linux/drivers/iommu/iommu.c

3832 lines
101 KiB
C
Raw Permalink Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2007-2008 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <jroedel@suse.de>
*/
#define pr_fmt(fmt) "iommu: " fmt
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
#include <linux/amba/bus.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/bits.h>
#include <linux/bug.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/export.h>
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/host1x_context_bus.h>
#include <linux/iommu.h>
#include <linux/iommufd.h>
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
#include <linux/idr.h>
#include <linux/err.h>
#include <linux/pci.h>
#include <linux/pci-ats.h>
#include <linux/bitops.h>
#include <linux/platform_device.h>
#include <linux/property.h>
#include <linux/fsl/mc.h>
#include <linux/module.h>
#include <linux/cc_platform.h>
#include <linux/cdx/cdx_bus.h>
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
#include <linux/msi.h>
#include <uapi/linux/iommufd.h>
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
#include "dma-iommu.h"
iommu: Introduce a new iommu_group_replace_domain() API qemu has a need to replace the translations associated with a domain when the guest does large-scale operations like switching between an IDENTITY domain and, say, dma-iommu.c. Currently, it does this by replacing all the mappings in a single domain, but this is very inefficient and means that domains have to be per-device rather than per-translation. Provide a high-level API to allow replacements of one domain with another. This is similar to a detach/attach cycle except it doesn't force the group to go to the blocking domain in-between. By removing this forced blocking domain the iommu driver has the opportunity to implement a non-disruptive replacement of the domain to the greatest extent its hardware allows. This allows the qemu emulation of the vIOMMU to be more complete, as real hardware often has a non-distruptive replacement capability. It could be possible to address this by simply removing the protection from the iommu_attach_group(), but it is not so clear if that is safe for the few users. Thus, add a new API to serve this new purpose. All drivers are already required to support changing between active UNMANAGED domains when using their attach_dev ops. This API is expected to be used only by IOMMUFD, so add to the iommu-priv header and mark it as IOMMUFD_INTERNAL. Link: https://lore.kernel.org/r/13-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-17 15:12:09 -03:00
#include "iommu-priv.h"
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
static DEFINE_IDA(iommu_global_pasid_ida);
static unsigned int iommu_def_domain_type __read_mostly;
static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
static u32 iommu_cmd_line __read_mostly;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
/* Tags used with xa_tag_pointer() in group->pasid_array */
enum { IOMMU_PASID_ARRAY_DOMAIN = 0, IOMMU_PASID_ARRAY_HANDLE = 1 };
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
struct xarray pasid_array;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
char *name;
int id;
struct iommu_domain *default_domain;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
struct iommu_domain *blocking_domain;
struct iommu_domain *domain;
struct list_head entry;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
unsigned int owner_cnt;
void *owner;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
};
struct group_device {
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct list_head list;
struct device *dev;
char *name;
};
/* Iterate over each struct group_device in a struct iommu_group */
#define for_each_group_device(group, pos) \
list_for_each_entry(pos, &(group)->devices, list)
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct iommu_group_attribute {
struct attribute attr;
ssize_t (*show)(struct iommu_group *group, char *buf);
ssize_t (*store)(struct iommu_group *group,
const char *buf, size_t count);
};
static const char * const iommu_group_resv_type_string[] = {
iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-03 08:53:35 +02:00
[IOMMU_RESV_DIRECT] = "direct",
[IOMMU_RESV_DIRECT_RELAXABLE] = "direct-relaxable",
[IOMMU_RESV_RESERVED] = "reserved",
[IOMMU_RESV_MSI] = "msi",
[IOMMU_RESV_SW_MSI] = "msi",
};
#define IOMMU_CMD_LINE_DMA_API BIT(0)
#define IOMMU_CMD_LINE_STRICT BIT(1)
static int bus_iommu_probe(const struct bus_type *bus);
static int iommu_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data);
static void iommu_release_device(struct device *dev);
static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct device *dev,
unsigned int type,
unsigned int flags);
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
enum {
IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
};
static int __iommu_device_set_domain(struct iommu_group *group,
struct device *dev,
struct iommu_domain *new_domain,
unsigned int flags);
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
static int __iommu_group_set_domain_internal(struct iommu_group *group,
struct iommu_domain *new_domain,
unsigned int flags);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
static int __iommu_group_set_domain(struct iommu_group *group,
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
struct iommu_domain *new_domain)
{
return __iommu_group_set_domain_internal(group, new_domain, 0);
}
static void __iommu_group_set_domain_nofail(struct iommu_group *group,
struct iommu_domain *new_domain)
{
WARN_ON(__iommu_group_set_domain_internal(
group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
}
static int iommu_setup_default_domain(struct iommu_group *group,
int target_type);
static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
struct device *dev);
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
static ssize_t iommu_group_store_type(struct iommu_group *group,
const char *buf, size_t count);
static struct group_device *iommu_group_alloc_device(struct iommu_group *group,
struct device *dev);
static void __iommu_group_free_device(struct iommu_group *group,
struct group_device *grp_dev);
static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
const struct iommu_ops *ops);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
#define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \
struct iommu_group_attribute iommu_group_attr_##_name = \
__ATTR(_name, _mode, _show, _store)
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
#define to_iommu_group_attr(_attr) \
container_of(_attr, struct iommu_group_attribute, attr)
#define to_iommu_group(_kobj) \
container_of(_kobj, struct iommu_group, kobj)
static LIST_HEAD(iommu_device_list);
static DEFINE_SPINLOCK(iommu_device_lock);
static const struct bus_type * const iommu_buses[] = {
&platform_bus_type,
#ifdef CONFIG_PCI
&pci_bus_type,
#endif
#ifdef CONFIG_ARM_AMBA
&amba_bustype,
#endif
#ifdef CONFIG_FSL_MC_BUS
&fsl_mc_bus_type,
#endif
#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
&host1x_context_device_bus_type,
#endif
#ifdef CONFIG_CDX_BUS
&cdx_bus_type,
#endif
};
/*
* Use a function instead of an array here because the domain-type is a
* bit-field, so an array would waste memory.
*/
static const char *iommu_domain_type_str(unsigned int t)
{
switch (t) {
case IOMMU_DOMAIN_BLOCKED:
return "Blocked";
case IOMMU_DOMAIN_IDENTITY:
return "Passthrough";
case IOMMU_DOMAIN_UNMANAGED:
return "Unmanaged";
case IOMMU_DOMAIN_DMA:
case IOMMU_DOMAIN_DMA_FQ:
return "Translated";
case IOMMU_DOMAIN_PLATFORM:
return "Platform";
default:
return "Unknown";
}
}
static int __init iommu_subsys_init(void)
{
struct notifier_block *nb;
if (!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API)) {
if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH))
iommu_set_default_passthrough(false);
else
iommu_set_default_translated(false);
if (iommu_default_passthrough() && cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
pr_info("Memory encryption detected - Disabling default IOMMU Passthrough\n");
iommu_set_default_translated(false);
}
}
if (!iommu_default_passthrough() && !iommu_dma_strict)
iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ;
pr_info("Default domain type: %s%s\n",
iommu_domain_type_str(iommu_def_domain_type),
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
" (set via kernel command line)" : "");
if (!iommu_default_passthrough())
pr_info("DMA domain TLB invalidation policy: %s mode%s\n",
iommu_dma_strict ? "strict" : "lazy",
(iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
" (set via kernel command line)" : "");
nb = kcalloc(ARRAY_SIZE(iommu_buses), sizeof(*nb), GFP_KERNEL);
if (!nb)
return -ENOMEM;
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
nb[i].notifier_call = iommu_bus_notifier;
bus_register_notifier(iommu_buses[i], &nb[i]);
}
return 0;
}
subsys_initcall(iommu_subsys_init);
static int remove_iommu_group(struct device *dev, void *data)
{
if (dev->iommu && dev->iommu->iommu_dev == data)
iommu_release_device(dev);
return 0;
}
/**
* iommu_device_register() - Register an IOMMU hardware instance
* @iommu: IOMMU handle for the instance
* @ops: IOMMU ops to associate with the instance
* @hwdev: (optional) actual instance device, used for fwnode lookup
*
* Return: 0 on success, or an error.
*/
int iommu_device_register(struct iommu_device *iommu,
const struct iommu_ops *ops, struct device *hwdev)
{
int err = 0;
/* We need to be able to take module references appropriately */
if (WARN_ON(is_module_address((unsigned long)ops) && !ops->owner))
return -EINVAL;
iommu->ops = ops;
if (hwdev)
iommu->fwnode = dev_fwnode(hwdev);
spin_lock(&iommu_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
spin_unlock(&iommu_device_lock);
for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
err = bus_iommu_probe(iommu_buses[i]);
if (err)
iommu_device_unregister(iommu);
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
else
WRITE_ONCE(iommu->ready, true);
return err;
}
EXPORT_SYMBOL_GPL(iommu_device_register);
void iommu_device_unregister(struct iommu_device *iommu)
{
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
bus_for_each_dev(iommu_buses[i], NULL, iommu, remove_iommu_group);
spin_lock(&iommu_device_lock);
list_del(&iommu->list);
spin_unlock(&iommu_device_lock);
/* Pairs with the alloc in generic_single_device_group() */
iommu_group_put(iommu->singleton_group);
iommu->singleton_group = NULL;
}
EXPORT_SYMBOL_GPL(iommu_device_unregister);
#if IS_ENABLED(CONFIG_IOMMUFD_TEST)
void iommu_device_unregister_bus(struct iommu_device *iommu,
const struct bus_type *bus,
struct notifier_block *nb)
{
bus_unregister_notifier(bus, nb);
iommu_device_unregister(iommu);
}
EXPORT_SYMBOL_GPL(iommu_device_unregister_bus);
/*
* Register an iommu driver against a single bus. This is only used by iommufd
* selftest to create a mock iommu driver. The caller must provide
* some memory to hold a notifier_block.
*/
int iommu_device_register_bus(struct iommu_device *iommu,
const struct iommu_ops *ops,
const struct bus_type *bus,
struct notifier_block *nb)
{
int err;
iommu->ops = ops;
nb->notifier_call = iommu_bus_notifier;
err = bus_register_notifier(bus, nb);
if (err)
return err;
spin_lock(&iommu_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
spin_unlock(&iommu_device_lock);
err = bus_iommu_probe(bus);
if (err) {
iommu_device_unregister_bus(iommu, bus, nb);
return err;
}
return 0;
}
EXPORT_SYMBOL_GPL(iommu_device_register_bus);
#endif
static struct dev_iommu *dev_iommu_get(struct device *dev)
{
struct dev_iommu *param = dev->iommu;
lockdep_assert_held(&iommu_probe_device_lock);
if (param)
return param;
param = kzalloc(sizeof(*param), GFP_KERNEL);
if (!param)
return NULL;
mutex_init(&param->lock);
dev->iommu = param;
return param;
}
void dev_iommu_free(struct device *dev)
{
iommu: Fix potential use-after-free during probe Kasan has reported the following use after free on dev->iommu. when a device probe fails and it is in process of freeing dev->iommu in dev_iommu_free function, a deferred_probe_work_func runs in parallel and tries to access dev->iommu->fwspec in of_iommu_configure path thus causing use after free. BUG: KASAN: use-after-free in of_iommu_configure+0xb4/0x4a4 Read of size 8 at addr ffffff87a2f1acb8 by task kworker/u16:2/153 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace+0x0/0x33c show_stack+0x18/0x24 dump_stack_lvl+0x16c/0x1e0 print_address_description+0x84/0x39c __kasan_report+0x184/0x308 kasan_report+0x50/0x78 __asan_load8+0xc0/0xc4 of_iommu_configure+0xb4/0x4a4 of_dma_configure_id+0x2fc/0x4d4 platform_dma_configure+0x40/0x5c really_probe+0x1b4/0xb74 driver_probe_device+0x11c/0x228 __device_attach_driver+0x14c/0x304 bus_for_each_drv+0x124/0x1b0 __device_attach+0x25c/0x334 device_initial_probe+0x24/0x34 bus_probe_device+0x78/0x134 deferred_probe_work_func+0x130/0x1a8 process_one_work+0x4c8/0x970 worker_thread+0x5c8/0xaec kthread+0x1f8/0x220 ret_from_fork+0x10/0x18 Allocated by task 1: ____kasan_kmalloc+0xd4/0x114 __kasan_kmalloc+0x10/0x1c kmem_cache_alloc_trace+0xe4/0x3d4 __iommu_probe_device+0x90/0x394 probe_iommu_group+0x70/0x9c bus_for_each_dev+0x11c/0x19c bus_iommu_probe+0xb8/0x7d4 bus_set_iommu+0xcc/0x13c arm_smmu_bus_init+0x44/0x130 [arm_smmu] arm_smmu_device_probe+0xb88/0xc54 [arm_smmu] platform_drv_probe+0xe4/0x13c really_probe+0x2c8/0xb74 driver_probe_device+0x11c/0x228 device_driver_attach+0xf0/0x16c __driver_attach+0x80/0x320 bus_for_each_dev+0x11c/0x19c driver_attach+0x38/0x48 bus_add_driver+0x1dc/0x3a4 driver_register+0x18c/0x244 __platform_driver_register+0x88/0x9c init_module+0x64/0xff4 [arm_smmu] do_one_initcall+0x17c/0x2f0 do_init_module+0xe8/0x378 load_module+0x3f80/0x4a40 __se_sys_finit_module+0x1a0/0x1e4 __arm64_sys_finit_module+0x44/0x58 el0_svc_common+0x100/0x264 do_el0_svc+0x38/0xa4 el0_svc+0x20/0x30 el0_sync_handler+0x68/0xac el0_sync+0x160/0x180 Freed by task 1: kasan_set_track+0x4c/0x84 kasan_set_free_info+0x28/0x4c ____kasan_slab_free+0x120/0x15c __kasan_slab_free+0x18/0x28 slab_free_freelist_hook+0x204/0x2fc kfree+0xfc/0x3a4 __iommu_probe_device+0x284/0x394 probe_iommu_group+0x70/0x9c bus_for_each_dev+0x11c/0x19c bus_iommu_probe+0xb8/0x7d4 bus_set_iommu+0xcc/0x13c arm_smmu_bus_init+0x44/0x130 [arm_smmu] arm_smmu_device_probe+0xb88/0xc54 [arm_smmu] platform_drv_probe+0xe4/0x13c really_probe+0x2c8/0xb74 driver_probe_device+0x11c/0x228 device_driver_attach+0xf0/0x16c __driver_attach+0x80/0x320 bus_for_each_dev+0x11c/0x19c driver_attach+0x38/0x48 bus_add_driver+0x1dc/0x3a4 driver_register+0x18c/0x244 __platform_driver_register+0x88/0x9c init_module+0x64/0xff4 [arm_smmu] do_one_initcall+0x17c/0x2f0 do_init_module+0xe8/0x378 load_module+0x3f80/0x4a40 __se_sys_finit_module+0x1a0/0x1e4 __arm64_sys_finit_module+0x44/0x58 el0_svc_common+0x100/0x264 do_el0_svc+0x38/0xa4 el0_svc+0x20/0x30 el0_sync_handler+0x68/0xac el0_sync+0x160/0x180 Fix this by setting dev->iommu to NULL first and then freeing dev_iommu structure in dev_iommu_free function. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com> Link: https://lore.kernel.org/r/1643613155-20215-1-git-send-email-quic_vjitta@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-01-31 12:42:35 +05:30
struct dev_iommu *param = dev->iommu;
dev->iommu = NULL;
iommu: Fix potential use-after-free during probe Kasan has reported the following use after free on dev->iommu. when a device probe fails and it is in process of freeing dev->iommu in dev_iommu_free function, a deferred_probe_work_func runs in parallel and tries to access dev->iommu->fwspec in of_iommu_configure path thus causing use after free. BUG: KASAN: use-after-free in of_iommu_configure+0xb4/0x4a4 Read of size 8 at addr ffffff87a2f1acb8 by task kworker/u16:2/153 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace+0x0/0x33c show_stack+0x18/0x24 dump_stack_lvl+0x16c/0x1e0 print_address_description+0x84/0x39c __kasan_report+0x184/0x308 kasan_report+0x50/0x78 __asan_load8+0xc0/0xc4 of_iommu_configure+0xb4/0x4a4 of_dma_configure_id+0x2fc/0x4d4 platform_dma_configure+0x40/0x5c really_probe+0x1b4/0xb74 driver_probe_device+0x11c/0x228 __device_attach_driver+0x14c/0x304 bus_for_each_drv+0x124/0x1b0 __device_attach+0x25c/0x334 device_initial_probe+0x24/0x34 bus_probe_device+0x78/0x134 deferred_probe_work_func+0x130/0x1a8 process_one_work+0x4c8/0x970 worker_thread+0x5c8/0xaec kthread+0x1f8/0x220 ret_from_fork+0x10/0x18 Allocated by task 1: ____kasan_kmalloc+0xd4/0x114 __kasan_kmalloc+0x10/0x1c kmem_cache_alloc_trace+0xe4/0x3d4 __iommu_probe_device+0x90/0x394 probe_iommu_group+0x70/0x9c bus_for_each_dev+0x11c/0x19c bus_iommu_probe+0xb8/0x7d4 bus_set_iommu+0xcc/0x13c arm_smmu_bus_init+0x44/0x130 [arm_smmu] arm_smmu_device_probe+0xb88/0xc54 [arm_smmu] platform_drv_probe+0xe4/0x13c really_probe+0x2c8/0xb74 driver_probe_device+0x11c/0x228 device_driver_attach+0xf0/0x16c __driver_attach+0x80/0x320 bus_for_each_dev+0x11c/0x19c driver_attach+0x38/0x48 bus_add_driver+0x1dc/0x3a4 driver_register+0x18c/0x244 __platform_driver_register+0x88/0x9c init_module+0x64/0xff4 [arm_smmu] do_one_initcall+0x17c/0x2f0 do_init_module+0xe8/0x378 load_module+0x3f80/0x4a40 __se_sys_finit_module+0x1a0/0x1e4 __arm64_sys_finit_module+0x44/0x58 el0_svc_common+0x100/0x264 do_el0_svc+0x38/0xa4 el0_svc+0x20/0x30 el0_sync_handler+0x68/0xac el0_sync+0x160/0x180 Freed by task 1: kasan_set_track+0x4c/0x84 kasan_set_free_info+0x28/0x4c ____kasan_slab_free+0x120/0x15c __kasan_slab_free+0x18/0x28 slab_free_freelist_hook+0x204/0x2fc kfree+0xfc/0x3a4 __iommu_probe_device+0x284/0x394 probe_iommu_group+0x70/0x9c bus_for_each_dev+0x11c/0x19c bus_iommu_probe+0xb8/0x7d4 bus_set_iommu+0xcc/0x13c arm_smmu_bus_init+0x44/0x130 [arm_smmu] arm_smmu_device_probe+0xb88/0xc54 [arm_smmu] platform_drv_probe+0xe4/0x13c really_probe+0x2c8/0xb74 driver_probe_device+0x11c/0x228 device_driver_attach+0xf0/0x16c __driver_attach+0x80/0x320 bus_for_each_dev+0x11c/0x19c driver_attach+0x38/0x48 bus_add_driver+0x1dc/0x3a4 driver_register+0x18c/0x244 __platform_driver_register+0x88/0x9c init_module+0x64/0xff4 [arm_smmu] do_one_initcall+0x17c/0x2f0 do_init_module+0xe8/0x378 load_module+0x3f80/0x4a40 __se_sys_finit_module+0x1a0/0x1e4 __arm64_sys_finit_module+0x44/0x58 el0_svc_common+0x100/0x264 do_el0_svc+0x38/0xa4 el0_svc+0x20/0x30 el0_sync_handler+0x68/0xac el0_sync+0x160/0x180 Fix this by setting dev->iommu to NULL first and then freeing dev_iommu structure in dev_iommu_free function. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com> Link: https://lore.kernel.org/r/1643613155-20215-1-git-send-email-quic_vjitta@quicinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-01-31 12:42:35 +05:30
if (param->fwspec) {
fwnode_handle_put(param->fwspec->iommu_fwnode);
kfree(param->fwspec);
}
kfree(param);
}
/*
* Internal equivalent of device_iommu_mapped() for when we care that a device
* actually has API ops, and don't want false positives from VFIO-only groups.
*/
static bool dev_has_iommu(struct device *dev)
{
return dev->iommu && dev->iommu->iommu_dev;
}
static u32 dev_iommu_get_max_pasids(struct device *dev)
{
u32 max_pasids = 0, bits = 0;
int ret;
if (dev_is_pci(dev)) {
ret = pci_max_pasids(to_pci_dev(dev));
if (ret > 0)
max_pasids = ret;
} else {
ret = device_property_read_u32(dev, "pasid-num-bits", &bits);
if (!ret)
max_pasids = 1UL << bits;
}
return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
}
void dev_iommu_priv_set(struct device *dev, void *priv)
{
/* FSL_PAMU does something weird */
if (!IS_ENABLED(CONFIG_FSL_PAMU))
lockdep_assert_held(&iommu_probe_device_lock);
dev->iommu->priv = priv;
}
EXPORT_SYMBOL_GPL(dev_iommu_priv_set);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
/*
* Init the dev->iommu and dev->iommu_group in the struct device and get the
* driver probed
*/
static int iommu_init_device(struct device *dev)
{
const struct iommu_ops *ops;
struct iommu_device *iommu_dev;
struct iommu_group *group;
int ret;
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
if (!dev_iommu_get(dev))
return -ENOMEM;
/*
iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should *not* have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 15:46:33 +00:00
* For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
* is buried in the bus dma_configure path. Properly unpicking that is
* still a big job, so for now just invoke the whole thing. The device
* already having a driver bound means dma_configure has already run and
* found no IOMMU to wait for, so there's no point calling it again.
iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should *not* have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 15:46:33 +00:00
*/
if (!dev->iommu->fwspec && !dev->driver && dev->bus->dma_configure) {
iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should *not* have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 15:46:33 +00:00
mutex_unlock(&iommu_probe_device_lock);
dev->bus->dma_configure(dev);
mutex_lock(&iommu_probe_device_lock);
/* If another instance finished the job for us, skip it */
if (!dev->iommu || dev->iommu_group)
return -ENODEV;
iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should *not* have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 15:46:33 +00:00
}
/*
* At this point, relevant devices either now have a fwspec which will
* match ops registered with a non-NULL fwnode, or we can reasonably
* assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
* be present, and that any of their registered instances has suitable
* ops for probing, and thus cheekily co-opt the same mechanism.
*/
ops = iommu_fwspec_ops(dev->iommu->fwspec);
if (!ops) {
ret = -ENODEV;
goto err_free;
}
if (!try_module_get(ops->owner)) {
ret = -EINVAL;
goto err_free;
}
iommu_dev = ops->probe_device(dev);
if (IS_ERR(iommu_dev)) {
ret = PTR_ERR(iommu_dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
goto err_module_put;
}
dev->iommu->iommu_dev = iommu_dev;
ret = iommu_device_link(iommu_dev, dev);
if (ret)
goto err_release;
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
group = ops->device_group(dev);
if (WARN_ON_ONCE(group == NULL))
group = ERR_PTR(-EINVAL);
if (IS_ERR(group)) {
ret = PTR_ERR(group);
goto err_unlink;
}
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
dev->iommu_group = group;
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
if (ops->is_attach_deferred)
dev->iommu->attach_deferred = ops->is_attach_deferred(dev);
return 0;
err_unlink:
iommu_device_unlink(iommu_dev, dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
err_release:
if (ops->release_device)
ops->release_device(dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
err_module_put:
module_put(ops->owner);
err_free:
dev->iommu->iommu_dev = NULL;
dev_iommu_free(dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
return ret;
}
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
static void iommu_deinit_device(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
const struct iommu_ops *ops = dev_iommu_ops(dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
lockdep_assert_held(&group->mutex);
iommu_device_unlink(dev->iommu->iommu_dev, dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
/*
* release_device() must stop using any attached domain on the device.
iommu: Add static iommu_ops->release_domain The current device_release callback for individual iommu drivers does the following: 1) Silent IOMMU DMA translation: It detaches any existing domain from the device and puts it into a blocking state (some drivers might use the identity state). 2) Resource release: It releases resources allocated during the device_probe callback and restores the device to its pre-probe state. Step 1 is challenging for individual iommu drivers because each must check if a domain is already attached to the device. Additionally, if a deferred attach never occurred, the device_release should avoid modifying hardware configuration regardless of the reason for its call. To simplify this process, introduce a static release_domain within the iommu_ops structure. It can be either a blocking or identity domain depending on the iommu hardware. The iommu core will decide whether to attach this domain before the device_release callback, eliminating the need for repetitive code in various drivers. Consequently, the device_release callback can focus solely on the opposite operations of device_probe, including releasing all resources allocated during that callback. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240305013305.204605-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-05 20:21:17 +08:00
* If there are still other devices in the group, they are not affected
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
* by this callback.
*
iommu: Add static iommu_ops->release_domain The current device_release callback for individual iommu drivers does the following: 1) Silent IOMMU DMA translation: It detaches any existing domain from the device and puts it into a blocking state (some drivers might use the identity state). 2) Resource release: It releases resources allocated during the device_probe callback and restores the device to its pre-probe state. Step 1 is challenging for individual iommu drivers because each must check if a domain is already attached to the device. Additionally, if a deferred attach never occurred, the device_release should avoid modifying hardware configuration regardless of the reason for its call. To simplify this process, introduce a static release_domain within the iommu_ops structure. It can be either a blocking or identity domain depending on the iommu hardware. The iommu core will decide whether to attach this domain before the device_release callback, eliminating the need for repetitive code in various drivers. Consequently, the device_release callback can focus solely on the opposite operations of device_probe, including releasing all resources allocated during that callback. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240305013305.204605-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-05 20:21:17 +08:00
* If the iommu driver provides release_domain, the core code ensures
* that domain is attached prior to calling release_device. Drivers can
* use this to enforce a translation on the idle iommu. Typically, the
* global static blocked_domain is a good choice.
*
* Otherwise, the iommu driver must set the device to either an identity
* or a blocking translation in release_device() and stop using any
* domain pointer, as it is going to be freed.
*
* Regardless, if a delayed attach never occurred, then the release
* should still avoid touching any hardware configuration either.
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
*/
iommu: Add static iommu_ops->release_domain The current device_release callback for individual iommu drivers does the following: 1) Silent IOMMU DMA translation: It detaches any existing domain from the device and puts it into a blocking state (some drivers might use the identity state). 2) Resource release: It releases resources allocated during the device_probe callback and restores the device to its pre-probe state. Step 1 is challenging for individual iommu drivers because each must check if a domain is already attached to the device. Additionally, if a deferred attach never occurred, the device_release should avoid modifying hardware configuration regardless of the reason for its call. To simplify this process, introduce a static release_domain within the iommu_ops structure. It can be either a blocking or identity domain depending on the iommu hardware. The iommu core will decide whether to attach this domain before the device_release callback, eliminating the need for repetitive code in various drivers. Consequently, the device_release callback can focus solely on the opposite operations of device_probe, including releasing all resources allocated during that callback. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240305013305.204605-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-05 20:21:17 +08:00
if (!dev->iommu->attach_deferred && ops->release_domain)
ops->release_domain->ops->attach_dev(ops->release_domain, dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
if (ops->release_device)
ops->release_device(dev);
/*
* If this is the last driver to use the group then we must free the
* domains before we do the module_put().
*/
if (list_empty(&group->devices)) {
if (group->default_domain) {
iommu_domain_free(group->default_domain);
group->default_domain = NULL;
}
if (group->blocking_domain) {
iommu_domain_free(group->blocking_domain);
group->blocking_domain = NULL;
}
group->domain = NULL;
}
/* Caller must put iommu_group */
dev->iommu_group = NULL;
module_put(ops->owner);
dev_iommu_free(dev);
#ifdef CONFIG_IOMMU_DMA
dev->dma_iommu = false;
#endif
}
static struct iommu_domain *pasid_array_entry_to_domain(void *entry)
{
if (xa_pointer_tag(entry) == IOMMU_PASID_ARRAY_DOMAIN)
return xa_untag_pointer(entry);
return ((struct iommu_attach_handle *)xa_untag_pointer(entry))->domain;
}
DEFINE_MUTEX(iommu_probe_device_lock);
static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
struct iommu_group *group;
struct group_device *gdev;
int ret;
/*
* Serialise to avoid races between IOMMU drivers registering in
* parallel and/or the "replay" calls from ACPI/OF code via client
* driver probe. Once the latter have been cleaned up we should
* probably be able to use device_lock() here to minimise the scope,
* but for now enforcing a simple global ordering is fine.
*/
lockdep_assert_held(&iommu_probe_device_lock);
iommu: Have __iommu_probe_device() check for already probed devices This is a step toward making __iommu_probe_device() self contained. It should, under proper locking, check if the device is already associated with an iommu driver and resolve parallel probes. All but one of the callers open code this test using two different means, but they all rely on dev->iommu_group. Currently the bus_iommu_probe()/probe_iommu_group() and probe_acpi_namespace_devices() rejects already probed devices with an unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use device_iommu_mapped() which is the same read without the pointless refcount. Move this test into __iommu_probe_device() and put it under the iommu_probe_device_lock. The store to dev->iommu_group is in iommu_group_add_device() which is also called under this lock for iommu driver devices, making it properly locked. The only path that didn't have this check is the hotplug path triggered by BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned outside the probe path is via iommu_group_add_device(). Today the only caller is VFIO no-iommu which never associates with an iommu driver. Thus adding this additional check is safe. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:39 -03:00
/* Device is probed already if in a group */
if (dev->iommu_group)
return 0;
ret = iommu_init_device(dev);
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
if (ret)
return ret;
/*
* And if we do now see any replay calls, they would indicate someone
* misusing the dma_configure path outside bus code.
*/
if (dev->driver)
dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n");
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
group = dev->iommu_group;
gdev = iommu_group_alloc_device(group, dev);
mutex_lock(&group->mutex);
if (IS_ERR(gdev)) {
ret = PTR_ERR(gdev);
goto err_put_group;
}
/*
* The gdev must be in the list before calling
* iommu_setup_default_domain()
*/
list_add_tail(&gdev->list, &group->devices);
WARN_ON(group->default_domain && !group->domain);
if (group->default_domain)
iommu_create_device_direct_mappings(group->default_domain, dev);
if (group->domain) {
ret = __iommu_device_set_domain(group, dev, group->domain, 0);
if (ret)
goto err_remove_gdev;
} else if (!group->default_domain && !group_list) {
ret = iommu_setup_default_domain(group, 0);
if (ret)
goto err_remove_gdev;
} else if (!group->default_domain) {
/*
* With a group_list argument we defer the default_domain setup
* to the caller by providing a de-duplicated list of groups
* that need further setup.
*/
if (list_empty(&group->entry))
list_add_tail(&group->entry, group_list);
}
iommu/dma: Centralise iommu_setup_dma_ops() It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only ever call iommu_setup_dma_ops() after a successful iommu_probe_device(), which means there should be no harm in achieving the same order of operations by running it off the back of iommu_probe_device() itself. This then puts it in line with the x86 and s390 .probe_finalize bodges, letting us pull it all into the main flow properly. As a bonus this lets us fold in and de-scope the PCI workaround setup as well. At this point we can also then pull the call up inside the group mutex, and avoid having to think about whether iommu_group_store_type() could theoretically race and free the domain if iommu_setup_dma_ops() ran just *before* iommu_device_use_default_domain() claims it... Furthermore we replace one .probe_finalize call completely, since the only remaining implementations are now one which only needs to run once for the initial boot-time probe, and two which themselves render that path unreachable. This leaves us a big step closer to realistically being able to unpick the variety of different things that iommu_setup_dma_ops() has been muddling together, and further streamline iommu-dma into core API flows in future. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/bebea331c1d688b34d9862eefd5ede47503961b8.1713523152.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-19 17:54:45 +01:00
if (group->default_domain)
iommu_setup_dma_ops(dev);
mutex_unlock(&group->mutex);
return 0;
err_remove_gdev:
list_del(&gdev->list);
__iommu_group_free_device(group, gdev);
err_put_group:
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
iommu_deinit_device(dev);
mutex_unlock(&group->mutex);
iommu_group_put(group);
return ret;
}
int iommu_probe_device(struct device *dev)
{
const struct iommu_ops *ops;
int ret;
mutex_lock(&iommu_probe_device_lock);
ret = __iommu_probe_device(dev, NULL);
mutex_unlock(&iommu_probe_device_lock);
if (ret)
return ret;
ops = dev_iommu_ops(dev);
if (ops->probe_finalize)
ops->probe_finalize(dev);
return 0;
}
static void __iommu_group_free_device(struct iommu_group *group,
struct group_device *grp_dev)
{
struct device *dev = grp_dev->dev;
sysfs_remove_link(group->devices_kobj, grp_dev->name);
sysfs_remove_link(&dev->kobj, "iommu_group");
trace_remove_device_from_group(group->id, dev);
/*
* If the group has become empty then ownership must have been
* released, and the current domain must be set back to NULL or
* the default domain.
*/
if (list_empty(&group->devices))
WARN_ON(group->owner_cnt ||
group->domain != group->default_domain);
kfree(grp_dev->name);
kfree(grp_dev);
}
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
/* Remove the iommu_group from the struct device. */
static void __iommu_group_remove_device(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
struct group_device *device;
mutex_lock(&group->mutex);
for_each_group_device(group, device) {
if (device->dev != dev)
continue;
list_del(&device->list);
__iommu_group_free_device(group, device);
if (dev_has_iommu(dev))
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
iommu_deinit_device(dev);
else
dev->iommu_group = NULL;
break;
}
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
mutex_unlock(&group->mutex);
/*
* Pairs with the get in iommu_init_device() or
* iommu_group_add_device()
*/
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
iommu_group_put(group);
}
static void iommu_release_device(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
if (group)
__iommu_group_remove_device(dev);
/* Free any fwspec if no iommu_driver was ever attached */
if (dev->iommu)
dev_iommu_free(dev);
}
static int __init iommu_set_def_domain_type(char *str)
{
bool pt;
int ret;
ret = kstrtobool(str, &pt);
if (ret)
return ret;
if (pt)
iommu_set_default_passthrough(true);
else
iommu_set_default_translated(true);
return 0;
}
early_param("iommu.passthrough", iommu_set_def_domain_type);
static int __init iommu_dma_setup(char *str)
{
int ret = kstrtobool(str, &iommu_dma_strict);
if (!ret)
iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
return ret;
}
early_param("iommu.strict", iommu_dma_setup);
void iommu_set_dma_strict(void)
{
iommu_dma_strict = true;
if (iommu_def_domain_type == IOMMU_DOMAIN_DMA_FQ)
iommu_def_domain_type = IOMMU_DOMAIN_DMA;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static ssize_t iommu_group_attr_show(struct kobject *kobj,
struct attribute *__attr, char *buf)
{
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
struct iommu_group *group = to_iommu_group(kobj);
ssize_t ret = -EIO;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
if (attr->show)
ret = attr->show(group, buf);
return ret;
}
static ssize_t iommu_group_attr_store(struct kobject *kobj,
struct attribute *__attr,
const char *buf, size_t count)
{
struct iommu_group_attribute *attr = to_iommu_group_attr(__attr);
struct iommu_group *group = to_iommu_group(kobj);
ssize_t ret = -EIO;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
if (attr->store)
ret = attr->store(group, buf, count);
return ret;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static const struct sysfs_ops iommu_group_sysfs_ops = {
.show = iommu_group_attr_show,
.store = iommu_group_attr_store,
};
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static int iommu_group_create_file(struct iommu_group *group,
struct iommu_group_attribute *attr)
{
return sysfs_create_file(&group->kobj, &attr->attr);
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static void iommu_group_remove_file(struct iommu_group *group,
struct iommu_group_attribute *attr)
{
sysfs_remove_file(&group->kobj, &attr->attr);
}
static ssize_t iommu_group_show_name(struct iommu_group *group, char *buf)
{
return sysfs_emit(buf, "%s\n", group->name);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
/**
* iommu_insert_resv_region - Insert a new region in the
* list of reserved regions.
* @new: new region to insert
* @regions: list of regions
*
* Elements are sorted by start address and overlapping segments
* of the same type are merged.
*/
static int iommu_insert_resv_region(struct iommu_resv_region *new,
struct list_head *regions)
{
struct iommu_resv_region *iter, *tmp, *nr, *top;
LIST_HEAD(stack);
nr = iommu_alloc_resv_region(new->start, new->length,
new->prot, new->type, GFP_KERNEL);
if (!nr)
return -ENOMEM;
/* First add the new element based on start address sorting */
list_for_each_entry(iter, regions, list) {
if (nr->start < iter->start ||
(nr->start == iter->start && nr->type <= iter->type))
break;
}
list_add_tail(&nr->list, &iter->list);
/* Merge overlapping segments of type nr->type in @regions, if any */
list_for_each_entry_safe(iter, tmp, regions, list) {
phys_addr_t top_end, iter_end = iter->start + iter->length - 1;
/* no merge needed on elements of different types than @new */
if (iter->type != new->type) {
list_move_tail(&iter->list, &stack);
continue;
}
/* look for the last stack element of same type as @iter */
list_for_each_entry_reverse(top, &stack, list)
if (top->type == iter->type)
goto check_overlap;
list_move_tail(&iter->list, &stack);
continue;
check_overlap:
top_end = top->start + top->length - 1;
if (iter->start > top_end + 1) {
list_move_tail(&iter->list, &stack);
} else {
top->length = max(top_end, iter_end) - top->start + 1;
list_del(&iter->list);
kfree(iter);
}
}
list_splice(&stack, regions);
return 0;
}
static int
iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
struct list_head *group_resv_regions)
{
struct iommu_resv_region *entry;
int ret = 0;
list_for_each_entry(entry, dev_resv_regions, list) {
ret = iommu_insert_resv_region(entry, group_resv_regions);
if (ret)
break;
}
return ret;
}
int iommu_get_group_resv_regions(struct iommu_group *group,
struct list_head *head)
{
struct group_device *device;
int ret = 0;
mutex_lock(&group->mutex);
for_each_group_device(group, device) {
struct list_head dev_resv_regions;
/*
* Non-API groups still expose reserved_regions in sysfs,
* so filter out calls that get here that way.
*/
if (!dev_has_iommu(device->dev))
break;
INIT_LIST_HEAD(&dev_resv_regions);
iommu_get_resv_regions(device->dev, &dev_resv_regions);
ret = iommu_insert_device_resv_regions(&dev_resv_regions, head);
iommu_put_resv_regions(device->dev, &dev_resv_regions);
if (ret)
break;
}
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
static ssize_t iommu_group_show_resv_regions(struct iommu_group *group,
char *buf)
{
struct iommu_resv_region *region, *next;
struct list_head group_resv_regions;
int offset = 0;
INIT_LIST_HEAD(&group_resv_regions);
iommu_get_group_resv_regions(group, &group_resv_regions);
list_for_each_entry_safe(region, next, &group_resv_regions, list) {
offset += sysfs_emit_at(buf, offset, "0x%016llx 0x%016llx %s\n",
(long long)region->start,
(long long)(region->start +
region->length - 1),
iommu_group_resv_type_string[region->type]);
kfree(region);
}
return offset;
}
static ssize_t iommu_group_show_type(struct iommu_group *group,
char *buf)
{
char *type = "unknown";
mutex_lock(&group->mutex);
if (group->default_domain) {
switch (group->default_domain->type) {
case IOMMU_DOMAIN_BLOCKED:
type = "blocked";
break;
case IOMMU_DOMAIN_IDENTITY:
type = "identity";
break;
case IOMMU_DOMAIN_UNMANAGED:
type = "unmanaged";
break;
case IOMMU_DOMAIN_DMA:
type = "DMA";
break;
case IOMMU_DOMAIN_DMA_FQ:
type = "DMA-FQ";
break;
}
}
mutex_unlock(&group->mutex);
return sysfs_emit(buf, "%s\n", type);
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
static IOMMU_GROUP_ATTR(reserved_regions, 0444,
iommu_group_show_resv_regions, NULL);
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
iommu_group_store_type);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static void iommu_group_release(struct kobject *kobj)
{
struct iommu_group *group = to_iommu_group(kobj);
pr_debug("Releasing group %d\n", group->id);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
if (group->iommu_data_release)
group->iommu_data_release(group->iommu_data);
ida_free(&iommu_group_ida, group->id);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
iommu: Add iommu_init/deinit_device() paired functions Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-05 21:59:43 -03:00
/* Domains are free'd by iommu_deinit_device() */
WARN_ON(group->default_domain);
WARN_ON(group->blocking_domain);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
kfree(group->name);
kfree(group);
}
static const struct kobj_type iommu_group_ktype = {
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
.sysfs_ops = &iommu_group_sysfs_ops,
.release = iommu_group_release,
};
/**
* iommu_group_alloc - Allocate a new group
*
* This function is called by an iommu driver to allocate a new iommu
* group. The iommu group represents the minimum granularity of the iommu.
* Upon successful return, the caller holds a reference to the supplied
* group in order to hold the group until devices are added. Use
* iommu_group_put() to release this extra reference count, allowing the
* group to be automatically reclaimed once it has no devices or external
* references.
*/
struct iommu_group *iommu_group_alloc(void)
{
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
struct iommu_group *group;
int ret;
group = kzalloc(sizeof(*group), GFP_KERNEL);
if (!group)
return ERR_PTR(-ENOMEM);
group->kobj.kset = iommu_group_kset;
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
xa_init(&group->pasid_array);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
kfree(group);
return ERR_PTR(ret);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
group->id = ret;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
NULL, "%d", group->id);
if (ret) {
kobject_put(&group->kobj);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
return ERR_PTR(ret);
}
group->devices_kobj = kobject_create_and_add("devices", &group->kobj);
if (!group->devices_kobj) {
kobject_put(&group->kobj); /* triggers .release & free */
return ERR_PTR(-ENOMEM);
}
/*
* The devices_kobj holds a reference on the group kobject, so
* as long as that exists so will the group. We can therefore
* use the devices_kobj for reference counting.
*/
kobject_put(&group->kobj);
ret = iommu_group_create_file(group,
&iommu_group_attr_reserved_regions);
if (ret) {
kobject_put(group->devices_kobj);
return ERR_PTR(ret);
}
ret = iommu_group_create_file(group, &iommu_group_attr_type);
if (ret) {
kobject_put(group->devices_kobj);
return ERR_PTR(ret);
}
pr_debug("Allocated group %d\n", group->id);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
return group;
}
EXPORT_SYMBOL_GPL(iommu_group_alloc);
/**
* iommu_group_get_iommudata - retrieve iommu_data registered for a group
* @group: the group
*
* iommu drivers can store data in the group for use when doing iommu
* operations. This function provides a way to retrieve it. Caller
* should hold a group reference.
*/
void *iommu_group_get_iommudata(struct iommu_group *group)
{
return group->iommu_data;
}
EXPORT_SYMBOL_GPL(iommu_group_get_iommudata);
/**
* iommu_group_set_iommudata - set iommu_data for a group
* @group: the group
* @iommu_data: new data
* @release: release function for iommu_data
*
* iommu drivers can store data in the group for use when doing iommu
* operations. This function provides a way to set the data after
* the group has been allocated. Caller should hold a group reference.
*/
void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
void (*release)(void *iommu_data))
{
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
group->iommu_data = iommu_data;
group->iommu_data_release = release;
}
EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
/**
* iommu_group_set_name - set name for a group
* @group: the group
* @name: name
*
* Allow iommu driver to set a name for a group. When set it will
* appear in a name attribute file under the group in sysfs.
*/
int iommu_group_set_name(struct iommu_group *group, const char *name)
{
int ret;
if (group->name) {
iommu_group_remove_file(group, &iommu_group_attr_name);
kfree(group->name);
group->name = NULL;
if (!name)
return 0;
}
group->name = kstrdup(name, GFP_KERNEL);
if (!group->name)
return -ENOMEM;
ret = iommu_group_create_file(group, &iommu_group_attr_name);
if (ret) {
kfree(group->name);
group->name = NULL;
return ret;
}
return 0;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
EXPORT_SYMBOL_GPL(iommu_group_set_name);
static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
struct device *dev)
{
struct iommu_resv_region *entry;
struct list_head mappings;
unsigned long pg_size;
int ret = 0;
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
pg_size = domain->pgsize_bitmap ? 1UL << __ffs(domain->pgsize_bitmap) : 0;
INIT_LIST_HEAD(&mappings);
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
if (WARN_ON_ONCE(iommu_is_dma_domain(domain) && !pg_size))
return -EINVAL;
iommu_get_resv_regions(dev, &mappings);
/* We need to consider overlapping regions for different devices */
list_for_each_entry(entry, &mappings, list) {
dma_addr_t start, end, addr;
size_t map_size = 0;
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
if (entry->type == IOMMU_RESV_DIRECT)
dev->iommu->require_direct = 1;
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
if ((entry->type != IOMMU_RESV_DIRECT &&
entry->type != IOMMU_RESV_DIRECT_RELAXABLE) ||
!iommu_is_dma_domain(domain))
continue;
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
start = ALIGN(entry->start, pg_size);
end = ALIGN(entry->start + entry->length, pg_size);
for (addr = start; addr <= end; addr += pg_size) {
phys_addr_t phys_addr;
if (addr == end)
goto map_end;
phys_addr = iommu_iova_to_phys(domain, addr);
if (!phys_addr) {
map_size += pg_size;
continue;
}
map_end:
if (map_size) {
ret = iommu_map(domain, addr - map_size,
addr - map_size, map_size,
entry->prot, GFP_KERNEL);
if (ret)
goto out;
map_size = 0;
}
}
}
out:
iommu_put_resv_regions(dev, &mappings);
return ret;
}
/* This is undone by __iommu_group_free_device() */
static struct group_device *iommu_group_alloc_device(struct iommu_group *group,
struct device *dev)
{
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
int ret, i = 0;
struct group_device *device;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
device = kzalloc(sizeof(*device), GFP_KERNEL);
if (!device)
return ERR_PTR(-ENOMEM);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
device->dev = dev;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
if (ret)
goto err_free_device;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
device->name = kasprintf(GFP_KERNEL, "%s", kobject_name(&dev->kobj));
rename:
if (!device->name) {
ret = -ENOMEM;
goto err_remove_link;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
ret = sysfs_create_link_nowarn(group->devices_kobj,
&dev->kobj, device->name);
if (ret) {
if (ret == -EEXIST && i >= 0) {
/*
* Account for the slim chance of collision
* and append an instance to the name.
*/
kfree(device->name);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
device->name = kasprintf(GFP_KERNEL, "%s.%d",
kobject_name(&dev->kobj), i++);
goto rename;
}
goto err_free_name;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
trace_add_device_to_group(group->id, dev);
dev_info(dev, "Adding to iommu group %d\n", group->id);
return device;
err_free_name:
kfree(device->name);
err_remove_link:
sysfs_remove_link(&dev->kobj, "iommu_group");
err_free_device:
kfree(device);
dev_err(dev, "Failed to add to iommu group %d: %d\n", group->id, ret);
return ERR_PTR(ret);
}
/**
* iommu_group_add_device - add a device to an iommu group
* @group: the group into which to add the device (reference should be held)
* @dev: the device
*
* This function is called by an iommu driver to add a device into a
* group. Adding a device increments the group reference count.
*/
int iommu_group_add_device(struct iommu_group *group, struct device *dev)
{
struct group_device *gdev;
gdev = iommu_group_alloc_device(group, dev);
if (IS_ERR(gdev))
return PTR_ERR(gdev);
iommu_group_ref_get(group);
dev->iommu_group = group;
mutex_lock(&group->mutex);
list_add_tail(&gdev->list, &group->devices);
mutex_unlock(&group->mutex);
return 0;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
EXPORT_SYMBOL_GPL(iommu_group_add_device);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
/**
* iommu_group_remove_device - remove a device from it's current group
* @dev: device to be removed
*
* This function is called by an iommu driver to remove the device from
* it's current group. This decrements the iommu group reference count.
*/
void iommu_group_remove_device(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
if (!group)
return;
dev_info(dev, "Removing from iommu group %d\n", group->id);
__iommu_group_remove_device(dev);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
EXPORT_SYMBOL_GPL(iommu_group_remove_device);
#if IS_ENABLED(CONFIG_LOCKDEP) && IS_ENABLED(CONFIG_IOMMU_API)
/**
* iommu_group_mutex_assert - Check device group mutex lock
* @dev: the device that has group param set
*
* This function is called by an iommu driver to check whether it holds
* group mutex lock for the given device or not.
*
* Note that this function must be called after device group param is set.
*/
void iommu_group_mutex_assert(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
lockdep_assert_held(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_group_mutex_assert);
#endif
static struct device *iommu_group_first_dev(struct iommu_group *group)
{
lockdep_assert_held(&group->mutex);
return list_first_entry(&group->devices, struct group_device, list)->dev;
}
/**
* iommu_group_for_each_dev - iterate over each device in the group
* @group: the group
* @data: caller opaque data to be passed to callback function
* @fn: caller supplied callback function
*
* This function is called by group users to iterate over group devices.
* Callers should hold a reference count to the group during callback.
* The group->mutex is held across callbacks, which will block calls to
* iommu_group_add/remove_device.
*/
int iommu_group_for_each_dev(struct iommu_group *group, void *data,
int (*fn)(struct device *, void *))
{
struct group_device *device;
int ret = 0;
mutex_lock(&group->mutex);
for_each_group_device(group, device) {
ret = fn(device->dev, data);
if (ret)
break;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
mutex_unlock(&group->mutex);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
return ret;
}
EXPORT_SYMBOL_GPL(iommu_group_for_each_dev);
/**
* iommu_group_get - Return the group for a device and increment reference
* @dev: get the group that this device belongs to
*
* This function is called by iommu drivers and users to get the group
* for the specified device. If found, the group is returned and the group
* reference in incremented, else NULL.
*/
struct iommu_group *iommu_group_get(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
if (group)
kobject_get(group->devices_kobj);
return group;
}
EXPORT_SYMBOL_GPL(iommu_group_get);
/**
* iommu_group_ref_get - Increment reference on a group
* @group: the group to use, must not be NULL
*
* This function is called by iommu drivers to take additional references on an
* existing group. Returns the given group for convenience.
*/
struct iommu_group *iommu_group_ref_get(struct iommu_group *group)
{
kobject_get(group->devices_kobj);
return group;
}
EXPORT_SYMBOL_GPL(iommu_group_ref_get);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
/**
* iommu_group_put - Decrement group reference
* @group: the group to use
*
* This function is called by iommu drivers and users to release the
* iommu group. Once the reference count is zero, the group is released.
*/
void iommu_group_put(struct iommu_group *group)
{
if (group)
kobject_put(group->devices_kobj);
}
EXPORT_SYMBOL_GPL(iommu_group_put);
/**
* iommu_group_id - Return ID for a group
* @group: the group to ID
*
* Return the unique ID for the group matching the sysfs group number.
*/
int iommu_group_id(struct iommu_group *group)
{
return group->id;
}
EXPORT_SYMBOL_GPL(iommu_group_id);
static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
unsigned long *devfns);
/*
* To consider a PCI device isolated, we require ACS to support Source
* Validation, Request Redirection, Completer Redirection, and Upstream
* Forwarding. This effectively means that devices cannot spoof their
* requester ID, requests and completions cannot be redirected, and all
* transactions are forwarded upstream, even as it passes through a
* bridge where the target device is downstream.
*/
#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
/*
* For multifunction devices which are not isolated from each other, find
* all the other non-isolated functions and look for existing groups. For
* each function, we also need to look for aliases to or from other devices
* that may already have a group.
*/
static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
unsigned long *devfns)
{
struct pci_dev *tmp = NULL;
struct iommu_group *group;
if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
return NULL;
for_each_pci_dev(tmp) {
if (tmp == pdev || tmp->bus != pdev->bus ||
PCI_SLOT(tmp->devfn) != PCI_SLOT(pdev->devfn) ||
pci_acs_enabled(tmp, REQ_ACS_FLAGS))
continue;
group = get_pci_alias_group(tmp, devfns);
if (group) {
pci_dev_put(tmp);
return group;
}
}
return NULL;
}
/*
PCI: Add support for multiple DMA aliases Solve IOMMU support issues with PCIe non-transparent bridges that use Requester ID look-up tables (RID-LUT), e.g., the PEX8733. The NTB connects devices in two independent PCI domains. Devices separated by the NTB are not able to discover each other. A PCI packet being forwared from one domain to another has to have its RID modified so it appears on correct bus and completions are forwarded back to the original domain through the NTB. The RID is translated using a preprogrammed table (LUT) and the PCI packet propagates upstream away from the NTB. If the destination system has IOMMU enabled, the packet will be discarded because the new RID is unknown to the IOMMU. Adding a DMA alias for the new RID allows IOMMU to properly recognize the packet. Each device behind the NTB has a unique RID assigned in the RID-LUT. The current DMA alias implementation supports only a single alias, so it's not possible to support mutiple devices behind the NTB when IOMMU is enabled. Enable all possible aliases on a given bus (256) that are stored in a bitset. Alias devfn is directly translated to a bit number. The bitset is not allocated for devices that have no need for DMA aliases. More details can be found in the following article: http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de>
2016-03-03 15:38:02 +01:00
* Look for aliases to or from the given device for existing groups. DMA
* aliases are only supported on the same bus, therefore the search
* space is quite small (especially since we're really only looking at pcie
* device, and therefore only expect multiple slots on the root complex or
* downstream switch ports). It's conceivable though that a pair of
* multifunction devices could have aliases between them that would cause a
* loop. To prevent this, we use a bitmap to track where we've been.
*/
static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
unsigned long *devfns)
{
struct pci_dev *tmp = NULL;
struct iommu_group *group;
if (test_and_set_bit(pdev->devfn & 0xff, devfns))
return NULL;
group = iommu_group_get(&pdev->dev);
if (group)
return group;
for_each_pci_dev(tmp) {
if (tmp == pdev || tmp->bus != pdev->bus)
continue;
/* We alias them or they alias us */
PCI: Add support for multiple DMA aliases Solve IOMMU support issues with PCIe non-transparent bridges that use Requester ID look-up tables (RID-LUT), e.g., the PEX8733. The NTB connects devices in two independent PCI domains. Devices separated by the NTB are not able to discover each other. A PCI packet being forwared from one domain to another has to have its RID modified so it appears on correct bus and completions are forwarded back to the original domain through the NTB. The RID is translated using a preprogrammed table (LUT) and the PCI packet propagates upstream away from the NTB. If the destination system has IOMMU enabled, the packet will be discarded because the new RID is unknown to the IOMMU. Adding a DMA alias for the new RID allows IOMMU to properly recognize the packet. Each device behind the NTB has a unique RID assigned in the RID-LUT. The current DMA alias implementation supports only a single alias, so it's not possible to support mutiple devices behind the NTB when IOMMU is enabled. Enable all possible aliases on a given bus (256) that are stored in a bitset. Alias devfn is directly translated to a bit number. The bitset is not allocated for devices that have no need for DMA aliases. More details can be found in the following article: http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de>
2016-03-03 15:38:02 +01:00
if (pci_devs_are_dma_aliases(pdev, tmp)) {
group = get_pci_alias_group(tmp, devfns);
if (group) {
pci_dev_put(tmp);
return group;
}
group = get_pci_function_alias_group(tmp, devfns);
if (group) {
pci_dev_put(tmp);
return group;
}
}
}
return NULL;
}
struct group_for_pci_data {
struct pci_dev *pdev;
struct iommu_group *group;
};
/*
* DMA alias iterator callback, return the last seen device. Stop and return
* the IOMMU group if we find one along the way.
*/
static int get_pci_alias_or_group(struct pci_dev *pdev, u16 alias, void *opaque)
{
struct group_for_pci_data *data = opaque;
data->pdev = pdev;
data->group = iommu_group_get(&pdev->dev);
return data->group != NULL;
}
/*
* Generic device_group call-back function. It just allocates one
* iommu-group per device.
*/
struct iommu_group *generic_device_group(struct device *dev)
{
return iommu_group_alloc();
}
EXPORT_SYMBOL_GPL(generic_device_group);
/*
* Generic device_group call-back function. It just allocates one
* iommu-group per iommu driver instance shared by every device
* probed by that iommu driver.
*/
struct iommu_group *generic_single_device_group(struct device *dev)
{
struct iommu_device *iommu = dev->iommu->iommu_dev;
if (!iommu->singleton_group) {
struct iommu_group *group;
group = iommu_group_alloc();
if (IS_ERR(group))
return group;
iommu->singleton_group = group;
}
return iommu_group_ref_get(iommu->singleton_group);
}
EXPORT_SYMBOL_GPL(generic_single_device_group);
/*
* Use standard PCI bus topology, isolation features, and DMA alias quirks
* to find or create an IOMMU group for a device.
*/
struct iommu_group *pci_device_group(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct group_for_pci_data data;
struct pci_bus *bus;
struct iommu_group *group = NULL;
u64 devfns[4] = { 0 };
if (WARN_ON(!dev_is_pci(dev)))
return ERR_PTR(-EINVAL);
/*
* Find the upstream DMA alias for the device. A device must not
* be aliased due to topology in order to have its own IOMMU group.
* If we find an alias along the way that already belongs to a
* group, use it.
*/
if (pci_for_each_dma_alias(pdev, get_pci_alias_or_group, &data))
return data.group;
pdev = data.pdev;
/*
* Continue upstream from the point of minimum IOMMU granularity
* due to aliases to the point where devices are protected from
* peer-to-peer DMA by PCI ACS. Again, if we find an existing
* group, use it.
*/
for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
if (!bus->self)
continue;
if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
break;
pdev = bus->self;
group = iommu_group_get(&pdev->dev);
if (group)
return group;
}
/*
* Look for existing groups on device aliases. If we alias another
* device or another device aliases us, use the same group.
*/
group = get_pci_alias_group(pdev, (unsigned long *)devfns);
if (group)
return group;
/*
* Look for existing groups on non-isolated functions on the same
* slot and aliases of those funcions, if any. No need to clear
* the search bitmap, the tested devfns are still valid.
*/
group = get_pci_function_alias_group(pdev, (unsigned long *)devfns);
if (group)
return group;
/* No shared group found, allocate new */
return iommu_group_alloc();
}
EXPORT_SYMBOL_GPL(pci_device_group);
/* Get the IOMMU group for device on fsl-mc bus */
struct iommu_group *fsl_mc_device_group(struct device *dev)
{
struct device *cont_dev = fsl_mc_cont_dev(dev);
struct iommu_group *group;
group = iommu_group_get(cont_dev);
if (!group)
group = iommu_group_alloc();
return group;
}
EXPORT_SYMBOL_GPL(fsl_mc_device_group);
static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *domain;
if (ops->identity_domain)
return ops->identity_domain;
if (ops->domain_alloc_identity) {
domain = ops->domain_alloc_identity(dev);
if (IS_ERR(domain))
return domain;
} else {
return ERR_PTR(-EOPNOTSUPP);
}
iommu_domain_init(domain, IOMMU_DOMAIN_IDENTITY, ops);
return domain;
}
static struct iommu_domain *
__iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
{
struct device *dev = iommu_group_first_dev(group);
struct iommu_domain *dom;
if (group->default_domain && group->default_domain->type == req_type)
return group->default_domain;
/*
* When allocating the DMA API domain assume that the driver is going to
* use PASID and make sure the RID's domain is PASID compatible.
*/
if (req_type & __IOMMU_DOMAIN_PAGING) {
dom = __iommu_paging_domain_alloc_flags(dev, req_type,
dev->iommu->max_pasids ? IOMMU_HWPT_ALLOC_PASID : 0);
/*
* If driver does not support PASID feature then
* try to allocate non-PASID domain
*/
if (PTR_ERR(dom) == -EOPNOTSUPP)
dom = __iommu_paging_domain_alloc_flags(dev, req_type, 0);
return dom;
}
if (req_type == IOMMU_DOMAIN_IDENTITY)
return __iommu_alloc_identity_domain(dev);
return ERR_PTR(-EINVAL);
}
/*
* req_type of 0 means "auto" which means to select a domain based on
* iommu_def_domain_type or what the driver actually supports.
*/
static struct iommu_domain *
iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
{
const struct iommu_ops *ops = dev_iommu_ops(iommu_group_first_dev(group));
struct iommu_domain *dom;
lockdep_assert_held(&group->mutex);
/*
* Allow legacy drivers to specify the domain that will be the default
* domain. This should always be either an IDENTITY/BLOCKED/PLATFORM
* domain. Do not use in new drivers.
*/
if (ops->default_domain) {
iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA The ops->default_domain flow used a 0 req_type to select the default domain and this was enforced by iommu_group_alloc_default_domain(). When !CONFIG_IOMMU_DMA started forcing the old ARM32 drivers into IDENTITY it also overroad the 0 req_type of the ops->default_domain drivers to IDENTITY which ends up causing failures during device probe. Make iommu_group_alloc_default_domain() accept a req_type that matches the ops->default_domain and have iommu_group_alloc_default_domain() generate a req_type that matches the default_domain. This way the req_type always describes what kind of domain should be attached and ops->default_domain overrides all other mechanisms to choose the default domain. Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops") Fixes: 0f6a90436a57 ("iommu: Do not use IOMMU_DOMAIN_DMA if CONFIG_IOMMU_DMA is not enabled") Reported-by: Ovidiu Panait <ovidiu.panait@windriver.com> Closes: https://lore.kernel.org/linux-iommu/20240123165829.630276-1-ovidiu.panait@windriver.com/ Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Closes: https://lore.kernel.org/linux-iommu/170618452753.3805.4425669653666211728.stgit@ltcd48-lp2.aus.stglab.ibm.com/ Tested-by: Ovidiu Panait <ovidiu.panait@windriver.com> Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-755bd21c4a64+525b8-iommu_def_dom_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-01-30 12:12:53 -04:00
if (req_type != ops->default_domain->type)
return ERR_PTR(-EINVAL);
return ops->default_domain;
}
if (req_type)
return __iommu_group_alloc_default_domain(group, req_type);
/* The driver gave no guidance on what type to use, try the default */
dom = __iommu_group_alloc_default_domain(group, iommu_def_domain_type);
if (!IS_ERR(dom))
return dom;
/* Otherwise IDENTITY and DMA_FQ defaults will try DMA */
if (iommu_def_domain_type == IOMMU_DOMAIN_DMA)
return ERR_PTR(-EINVAL);
dom = __iommu_group_alloc_default_domain(group, IOMMU_DOMAIN_DMA);
if (IS_ERR(dom))
return dom;
pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
iommu_def_domain_type, group->name);
return dom;
}
struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
{
return group->default_domain;
}
static int probe_iommu_group(struct device *dev, void *data)
{
struct list_head *group_list = data;
int ret;
mutex_lock(&iommu_probe_device_lock);
ret = __iommu_probe_device(dev, group_list);
mutex_unlock(&iommu_probe_device_lock);
if (ret == -ENODEV)
ret = 0;
return ret;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static int iommu_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
struct device *dev = data;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
if (action == BUS_NOTIFY_ADD_DEVICE) {
int ret;
ret = iommu_probe_device(dev);
return (ret) ? NOTIFY_DONE : NOTIFY_OK;
} else if (action == BUS_NOTIFY_REMOVED_DEVICE) {
iommu_release_device(dev);
return NOTIFY_OK;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
return 0;
}
/*
* Combine the driver's chosen def_domain_type across all the devices in a
* group. Drivers must give a consistent result.
*/
static int iommu_get_def_domain_type(struct iommu_group *group,
struct device *dev, int cur_type)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
int type;
iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA The ops->default_domain flow used a 0 req_type to select the default domain and this was enforced by iommu_group_alloc_default_domain(). When !CONFIG_IOMMU_DMA started forcing the old ARM32 drivers into IDENTITY it also overroad the 0 req_type of the ops->default_domain drivers to IDENTITY which ends up causing failures during device probe. Make iommu_group_alloc_default_domain() accept a req_type that matches the ops->default_domain and have iommu_group_alloc_default_domain() generate a req_type that matches the default_domain. This way the req_type always describes what kind of domain should be attached and ops->default_domain overrides all other mechanisms to choose the default domain. Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops") Fixes: 0f6a90436a57 ("iommu: Do not use IOMMU_DOMAIN_DMA if CONFIG_IOMMU_DMA is not enabled") Reported-by: Ovidiu Panait <ovidiu.panait@windriver.com> Closes: https://lore.kernel.org/linux-iommu/20240123165829.630276-1-ovidiu.panait@windriver.com/ Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Closes: https://lore.kernel.org/linux-iommu/170618452753.3805.4425669653666211728.stgit@ltcd48-lp2.aus.stglab.ibm.com/ Tested-by: Ovidiu Panait <ovidiu.panait@windriver.com> Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-755bd21c4a64+525b8-iommu_def_dom_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-01-30 12:12:53 -04:00
if (ops->default_domain) {
/*
* Drivers that declare a global static default_domain will
* always choose that.
*/
type = ops->default_domain->type;
} else {
if (ops->def_domain_type)
type = ops->def_domain_type(dev);
else
return cur_type;
}
if (!type || cur_type == type)
return cur_type;
if (!cur_type)
return type;
dev_err_ratelimited(
dev,
"IOMMU driver error, requesting conflicting def_domain_type, %s and %s, for devices in group %u.\n",
iommu_domain_type_str(cur_type), iommu_domain_type_str(type),
group->id);
/*
* Try to recover, drivers are allowed to force IDENTITY or DMA, IDENTITY
* takes precedence.
*/
if (type == IOMMU_DOMAIN_IDENTITY)
return type;
return cur_type;
}
/*
* A target_type of 0 will select the best domain type. 0 can be returned in
* this case meaning the global default should be used.
*/
static int iommu_get_default_domain_type(struct iommu_group *group,
int target_type)
{
struct device *untrusted = NULL;
struct group_device *gdev;
int driver_type = 0;
lockdep_assert_held(&group->mutex);
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
/*
* ARM32 drivers supporting CONFIG_ARM_DMA_USE_IOMMU can declare an
* identity_domain and it will automatically become their default
* domain. Later on ARM_DMA_USE_IOMMU will install its UNMANAGED domain.
iommu: Require a default_domain for all iommu drivers At this point every iommu driver will cause a default_domain to be selected, so we can finally remove this gap from the core code. The following table explains what each driver supports and what the resulting default_domain will be: ops->defaut_domain IDENTITY DMA PLATFORM v ARM32 dma-iommu ARCH amd/iommu.c Y Y N/A either apple-dart.c Y Y N/A either arm-smmu.c Y Y IDENTITY either qcom_iommu.c G Y IDENTITY either arm-smmu-v3.c Y Y N/A either exynos-iommu.c G Y IDENTITY either fsl_pamu_domain.c Y Y N/A N/A PLATFORM intel/iommu.c Y Y N/A either ipmmu-vmsa.c G Y IDENTITY either msm_iommu.c G IDENTITY N/A mtk_iommu.c G Y IDENTITY either mtk_iommu_v1.c G IDENTITY N/A omap-iommu.c G IDENTITY N/A rockchip-iommu.c G Y IDENTITY either s390-iommu.c Y Y N/A N/A PLATFORM sprd-iommu.c Y N/A DMA sun50i-iommu.c G Y IDENTITY either tegra-smmu.c G Y IDENTITY IDENTITY virtio-iommu.c Y Y N/A either spapr Y Y N/A N/A PLATFORM * G means ops->identity_domain is used * N/A means the driver will not compile in this configuration ARM32 drivers select an IDENTITY default domain through either the ops->identity_domain or directly requesting an IDENTIY domain through alloc_domain(). In ARM64 mode tegra-smmu will still block the use of dma-iommu.c and forces an IDENTITY domain. S390 uses a PLATFORM domain to represent when the dma_ops are set to the s390 iommu code. fsl_pamu uses an PLATFORM domain. POWER SPAPR uses PLATFORM and blocking to enable its weird VFIO mode. The x86 drivers continue unchanged. After this patch group->default_domain is only NULL for a short period during bus iommu probing while all the groups are constituted. Otherwise it is always !NULL. This completes changing the iommu subsystem driver contract to a system where the current iommu_domain always represents some form of translation and the driver is continuously asserting a definable translation mode. It resolves the confusion that the original ops->detach_dev() caused around what translation, exactly, is the IOMMU performing after detach. There were at least three different answers to that question in the tree, they are all now clearly named with domain types. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:53 -03:00
* Override the selection to IDENTITY.
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
*/
iommu: Require a default_domain for all iommu drivers At this point every iommu driver will cause a default_domain to be selected, so we can finally remove this gap from the core code. The following table explains what each driver supports and what the resulting default_domain will be: ops->defaut_domain IDENTITY DMA PLATFORM v ARM32 dma-iommu ARCH amd/iommu.c Y Y N/A either apple-dart.c Y Y N/A either arm-smmu.c Y Y IDENTITY either qcom_iommu.c G Y IDENTITY either arm-smmu-v3.c Y Y N/A either exynos-iommu.c G Y IDENTITY either fsl_pamu_domain.c Y Y N/A N/A PLATFORM intel/iommu.c Y Y N/A either ipmmu-vmsa.c G Y IDENTITY either msm_iommu.c G IDENTITY N/A mtk_iommu.c G Y IDENTITY either mtk_iommu_v1.c G IDENTITY N/A omap-iommu.c G IDENTITY N/A rockchip-iommu.c G Y IDENTITY either s390-iommu.c Y Y N/A N/A PLATFORM sprd-iommu.c Y N/A DMA sun50i-iommu.c G Y IDENTITY either tegra-smmu.c G Y IDENTITY IDENTITY virtio-iommu.c Y Y N/A either spapr Y Y N/A N/A PLATFORM * G means ops->identity_domain is used * N/A means the driver will not compile in this configuration ARM32 drivers select an IDENTITY default domain through either the ops->identity_domain or directly requesting an IDENTIY domain through alloc_domain(). In ARM64 mode tegra-smmu will still block the use of dma-iommu.c and forces an IDENTITY domain. S390 uses a PLATFORM domain to represent when the dma_ops are set to the s390 iommu code. fsl_pamu uses an PLATFORM domain. POWER SPAPR uses PLATFORM and blocking to enable its weird VFIO mode. The x86 drivers continue unchanged. After this patch group->default_domain is only NULL for a short period during bus iommu probing while all the groups are constituted. Otherwise it is always !NULL. This completes changing the iommu subsystem driver contract to a system where the current iommu_domain always represents some form of translation and the driver is continuously asserting a definable translation mode. It resolves the confusion that the original ops->detach_dev() caused around what translation, exactly, is the IOMMU performing after detach. There were at least three different answers to that question in the tree, they are all now clearly named with domain types. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:53 -03:00
if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) {
static_assert(!(IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) &&
IS_ENABLED(CONFIG_IOMMU_DMA)));
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
driver_type = IOMMU_DOMAIN_IDENTITY;
iommu: Require a default_domain for all iommu drivers At this point every iommu driver will cause a default_domain to be selected, so we can finally remove this gap from the core code. The following table explains what each driver supports and what the resulting default_domain will be: ops->defaut_domain IDENTITY DMA PLATFORM v ARM32 dma-iommu ARCH amd/iommu.c Y Y N/A either apple-dart.c Y Y N/A either arm-smmu.c Y Y IDENTITY either qcom_iommu.c G Y IDENTITY either arm-smmu-v3.c Y Y N/A either exynos-iommu.c G Y IDENTITY either fsl_pamu_domain.c Y Y N/A N/A PLATFORM intel/iommu.c Y Y N/A either ipmmu-vmsa.c G Y IDENTITY either msm_iommu.c G IDENTITY N/A mtk_iommu.c G Y IDENTITY either mtk_iommu_v1.c G IDENTITY N/A omap-iommu.c G IDENTITY N/A rockchip-iommu.c G Y IDENTITY either s390-iommu.c Y Y N/A N/A PLATFORM sprd-iommu.c Y N/A DMA sun50i-iommu.c G Y IDENTITY either tegra-smmu.c G Y IDENTITY IDENTITY virtio-iommu.c Y Y N/A either spapr Y Y N/A N/A PLATFORM * G means ops->identity_domain is used * N/A means the driver will not compile in this configuration ARM32 drivers select an IDENTITY default domain through either the ops->identity_domain or directly requesting an IDENTIY domain through alloc_domain(). In ARM64 mode tegra-smmu will still block the use of dma-iommu.c and forces an IDENTITY domain. S390 uses a PLATFORM domain to represent when the dma_ops are set to the s390 iommu code. fsl_pamu uses an PLATFORM domain. POWER SPAPR uses PLATFORM and blocking to enable its weird VFIO mode. The x86 drivers continue unchanged. After this patch group->default_domain is only NULL for a short period during bus iommu probing while all the groups are constituted. Otherwise it is always !NULL. This completes changing the iommu subsystem driver contract to a system where the current iommu_domain always represents some form of translation and the driver is continuously asserting a definable translation mode. It resolves the confusion that the original ops->detach_dev() caused around what translation, exactly, is the IOMMU performing after detach. There were at least three different answers to that question in the tree, they are all now clearly named with domain types. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:53 -03:00
}
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
for_each_group_device(group, gdev) {
driver_type = iommu_get_def_domain_type(group, gdev->dev,
driver_type);
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted) {
/*
* No ARM32 using systems will set untrusted, it cannot
* work.
*/
if (WARN_ON(IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)))
return -1;
untrusted = gdev->dev;
iommu: Allow an IDENTITY domain as the default_domain in ARM32 Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the same stuff, the way they relate to the IOMMU core is quiet different. dma-iommu.c expects the core code to setup an UNMANAGED domain (of type IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This becomes the default_domain for the group. ARM_DMA_USE_IOMMU does not use the default_domain, instead it directly allocates an UNMANAGED domain and operates it just like an external driver. In this case group->default_domain is NULL. If the driver provides a global static identity_domain then automatically use it as the default_domain when in ARM_DMA_USE_IOMMU mode. This allows drivers that implemented default_domain == NULL as an IDENTITY translation to trivially get a properly labeled non-NULL default_domain on ARM32 configs. With this arrangment when ARM_DMA_USE_IOMMU wants to disconnect from the device the normal detach_domain flow will restore the IDENTITY domain as the default domain. Overall this makes attach_dev() of the IDENTITY domain called in the same places as detach_dev(). This effectively migrates these drivers to default_domain mode. For drivers that support ARM64 they will gain support for the IDENTITY translation mode for the dma_api and behave in a uniform way. Drivers use this by setting ops->identity_domain to a static singleton iommu_domain that implements the identity attach. If the core detects ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain during probe. Drivers can continue to prevent the use of DMA translation by returning IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU. This allows removing the set_platform_dma_ops() from every remaining driver. Remove the set_platform_dma_ops from rockchip and mkt_v1 as all it does is set an existing global static identity domain. mkt_v1 does not support IOMMU_DOMAIN_DMA and it does not compile on ARM64 so this transformation is safe. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-13 10:43:42 -03:00
}
}
/*
* If the common dma ops are not selected in kconfig then we cannot use
* IOMMU_DOMAIN_DMA at all. Force IDENTITY if nothing else has been
* selected.
*/
if (!IS_ENABLED(CONFIG_IOMMU_DMA)) {
if (WARN_ON(driver_type == IOMMU_DOMAIN_DMA))
return -1;
if (!driver_type)
driver_type = IOMMU_DOMAIN_IDENTITY;
}
if (untrusted) {
if (driver_type && driver_type != IOMMU_DOMAIN_DMA) {
dev_err_ratelimited(
untrusted,
"Device is not trusted, but driver is overriding group %u to %s, refusing to probe.\n",
group->id, iommu_domain_type_str(driver_type));
return -1;
}
driver_type = IOMMU_DOMAIN_DMA;
}
if (target_type) {
if (driver_type && target_type != driver_type)
return -1;
return target_type;
}
return driver_type;
}
static void iommu_group_do_probe_finalize(struct device *dev)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
if (ops->probe_finalize)
ops->probe_finalize(dev);
}
static int bus_iommu_probe(const struct bus_type *bus)
{
struct iommu_group *group, *next;
LIST_HEAD(group_list);
int ret;
ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
if (ret)
return ret;
list_for_each_entry_safe(group, next, &group_list, entry) {
struct group_device *gdev;
mutex_lock(&group->mutex);
/* Remove item from the list */
list_del_init(&group->entry);
/*
* We go to the trouble of deferred default domain creation so
* that the cross-group default domain type and the setup of the
* IOMMU_RESV_DIRECT will work correctly in non-hotpug scenarios.
*/
ret = iommu_setup_default_domain(group, 0);
if (ret) {
mutex_unlock(&group->mutex);
return ret;
}
iommu/dma: Centralise iommu_setup_dma_ops() It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only ever call iommu_setup_dma_ops() after a successful iommu_probe_device(), which means there should be no harm in achieving the same order of operations by running it off the back of iommu_probe_device() itself. This then puts it in line with the x86 and s390 .probe_finalize bodges, letting us pull it all into the main flow properly. As a bonus this lets us fold in and de-scope the PCI workaround setup as well. At this point we can also then pull the call up inside the group mutex, and avoid having to think about whether iommu_group_store_type() could theoretically race and free the domain if iommu_setup_dma_ops() ran just *before* iommu_device_use_default_domain() claims it... Furthermore we replace one .probe_finalize call completely, since the only remaining implementations are now one which only needs to run once for the initial boot-time probe, and two which themselves render that path unreachable. This leaves us a big step closer to realistically being able to unpick the variety of different things that iommu_setup_dma_ops() has been muddling together, and further streamline iommu-dma into core API flows in future. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/bebea331c1d688b34d9862eefd5ede47503961b8.1713523152.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-19 17:54:45 +01:00
for_each_group_device(group, gdev)
iommu_setup_dma_ops(gdev->dev);
mutex_unlock(&group->mutex);
/*
* FIXME: Mis-locked because the ops->probe_finalize() call-back
* of some IOMMU drivers calls arm_iommu_attach_device() which
* in-turn might call back into IOMMU core code, where it tries
* to take group->mutex, resulting in a deadlock.
*/
for_each_group_device(group, gdev)
iommu_group_do_probe_finalize(gdev->dev);
}
return 0;
}
/**
* device_iommu_capable() - check for a general IOMMU capability
* @dev: device to which the capability would be relevant, if available
* @cap: IOMMU capability
*
* Return: true if an IOMMU is present and supports the given capability
* for the given device, otherwise false.
*/
bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
{
const struct iommu_ops *ops;
if (!dev_has_iommu(dev))
return false;
ops = dev_iommu_ops(dev);
if (!ops->capable)
return false;
return ops->capable(dev, cap);
}
EXPORT_SYMBOL_GPL(device_iommu_capable);
/**
* iommu_group_has_isolated_msi() - Compute msi_device_has_isolated_msi()
* for a group
* @group: Group to query
*
* IOMMU groups should not have differing values of
* msi_device_has_isolated_msi() for devices in a group. However nothing
* directly prevents this, so ensure mistakes don't result in isolation failures
* by checking that all the devices are the same.
*/
bool iommu_group_has_isolated_msi(struct iommu_group *group)
{
struct group_device *group_dev;
bool ret = true;
mutex_lock(&group->mutex);
for_each_group_device(group, group_dev)
ret &= msi_device_has_isolated_msi(group_dev->dev);
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_group_has_isolated_msi);
/**
* iommu_set_fault_handler() - set a fault handler for an iommu domain
* @domain: iommu domain
* @handler: fault handler
* @token: user data, will be passed back to the fault handler
*
* This function should be used by IOMMU users which want to be notified
* whenever an IOMMU fault happens.
*
* The fault handler itself should return 0 on success, and an appropriate
* error code otherwise.
*/
void iommu_set_fault_handler(struct iommu_domain *domain,
iommu_fault_handler_t handler,
void *token)
{
iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit 46983fcd67ac ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which *might* be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-24 21:05:15 -07:00
if (WARN_ON(!domain || domain->cookie_type != IOMMU_COOKIE_NONE))
return;
iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit 46983fcd67ac ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which *might* be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-24 21:05:15 -07:00
domain->cookie_type = IOMMU_COOKIE_FAULT_HANDLER;
domain->handler = handler;
domain->handler_token = token;
}
EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
const struct iommu_ops *ops)
{
domain->type = type;
domain->owner = ops;
if (!domain->ops)
domain->ops = ops->default_domain_ops;
}
static struct iommu_domain *
__iommu_paging_domain_alloc_flags(struct device *dev, unsigned int type,
unsigned int flags)
{
const struct iommu_ops *ops;
struct iommu_domain *domain;
if (!dev_has_iommu(dev))
return ERR_PTR(-ENODEV);
ops = dev_iommu_ops(dev);
if (ops->domain_alloc_paging && !flags)
domain = ops->domain_alloc_paging(dev);
else if (ops->domain_alloc_paging_flags)
domain = ops->domain_alloc_paging_flags(dev, flags, NULL);
#if IS_ENABLED(CONFIG_FSL_PAMU)
else if (ops->domain_alloc && !flags)
domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
#endif
else
return ERR_PTR(-EOPNOTSUPP);
if (IS_ERR(domain))
return domain;
if (!domain)
return ERR_PTR(-ENOMEM);
iommu_domain_init(domain, type, ops);
return domain;
}
/**
* iommu_paging_domain_alloc_flags() - Allocate a paging domain
* @dev: device for which the domain is allocated
* @flags: Bitmap of iommufd_hwpt_alloc_flags
*
* Allocate a paging domain which will be managed by a kernel driver. Return
* allocated domain if successful, or an ERR pointer for failure.
*/
struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
unsigned int flags)
{
return __iommu_paging_domain_alloc_flags(dev,
IOMMU_DOMAIN_UNMANAGED, flags);
}
EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
void iommu_domain_free(struct iommu_domain *domain)
{
iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit 46983fcd67ac ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which *might* be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-24 21:05:15 -07:00
switch (domain->cookie_type) {
case IOMMU_COOKIE_DMA_IOVA:
iommu_put_dma_cookie(domain);
break;
case IOMMU_COOKIE_DMA_MSI:
iommu_put_msi_cookie(domain);
break;
case IOMMU_COOKIE_SVA:
mmdrop(domain->mm);
iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit 46983fcd67ac ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which *might* be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-24 21:05:15 -07:00
break;
default:
break;
}
if (domain->ops->free)
domain->ops->free(domain);
}
EXPORT_SYMBOL_GPL(iommu_domain_free);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
/*
* Put the group's domain back to the appropriate core-owned domain - either the
* standard kernel-mode DMA configuration or an all-DMA-blocked domain.
*/
static void __iommu_group_set_core_domain(struct iommu_group *group)
{
struct iommu_domain *new_domain;
if (group->owner)
new_domain = group->blocking_domain;
else
new_domain = group->default_domain;
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
__iommu_group_set_domain_nofail(group, new_domain);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
}
static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
int ret;
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;
ret = domain->ops->attach_dev(domain, dev);
if (ret)
return ret;
dev->iommu->attach_deferred = 0;
trace_attach_device_to_domain(dev);
return 0;
}
/**
* iommu_attach_device - Attach an IOMMU domain to a device
* @domain: IOMMU domain to attach
* @dev: Device that will be attached
*
* Returns 0 on success and error code on failure
*
* Note that EINVAL can be treated as a soft failure, indicating
* that certain configuration of the domain is incompatible with
* the device. In this case attaching a different domain to the
* device may succeed.
*/
int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
int ret;
if (!group)
return -ENODEV;
/*
* Lock the group to make sure the device-count doesn't
* change while we are attaching
*/
mutex_lock(&group->mutex);
ret = -EINVAL;
if (list_count_nodes(&group->devices) != 1)
goto out_unlock;
ret = __iommu_attach_group(domain, group);
out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device);
iommu: use the __iommu_attach_device() directly for deferred attach Currently, because domain attach allows to be deferred from iommu driver to device driver, and when iommu initializes, the devices on the bus will be scanned and the default groups will be allocated. Due to the above changes, some devices could be added to the same group as below: [ 3.859417] pci 0000:01:00.0: Adding to iommu group 16 [ 3.864572] pci 0000:01:00.1: Adding to iommu group 16 [ 3.869738] pci 0000:02:00.0: Adding to iommu group 17 [ 3.874892] pci 0000:02:00.1: Adding to iommu group 17 But when attaching these devices, it doesn't allow that a group has more than one device, otherwise it will return an error. This conflicts with the deferred attaching. Unfortunately, it has two devices in the same group for my side, for example: [ 9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0 [ 9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1 ... [ 10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0 [ 10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1 Finally, which caused the failure of tg3 driver when tg3 driver calls the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma(). [ 9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting [ 9.754085] tg3: probe of 0000:01:00.0 failed with error -12 [ 9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting [ 10.043053] tg3: probe of 0000:01:00.1 failed with error -12 [ 10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting [ 10.334070] tg3: probe of 0000:02:00.0 failed with error -12 [ 10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting [ 10.622629] tg3: probe of 0000:02:00.1 failed with error -12 In addition, the similar situations also occur in other drivers such as the bnxt_en driver. That can be reproduced easily in kdump kernel when SME is active. Let's move the handling currently in iommu_dma_deferred_attach() into the iommu core code so that it can call the __iommu_attach_device() directly instead of the iommu_attach_device(). The external interface iommu_attach_device() is not suitable for handling this situation. Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20210126115337.20068-3-lijiang@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-01-26 19:53:37 +08:00
int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
{
if (dev->iommu && dev->iommu->attach_deferred)
iommu: use the __iommu_attach_device() directly for deferred attach Currently, because domain attach allows to be deferred from iommu driver to device driver, and when iommu initializes, the devices on the bus will be scanned and the default groups will be allocated. Due to the above changes, some devices could be added to the same group as below: [ 3.859417] pci 0000:01:00.0: Adding to iommu group 16 [ 3.864572] pci 0000:01:00.1: Adding to iommu group 16 [ 3.869738] pci 0000:02:00.0: Adding to iommu group 17 [ 3.874892] pci 0000:02:00.1: Adding to iommu group 17 But when attaching these devices, it doesn't allow that a group has more than one device, otherwise it will return an error. This conflicts with the deferred attaching. Unfortunately, it has two devices in the same group for my side, for example: [ 9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0 [ 9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1 ... [ 10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0 [ 10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1 Finally, which caused the failure of tg3 driver when tg3 driver calls the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma(). [ 9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting [ 9.754085] tg3: probe of 0000:01:00.0 failed with error -12 [ 9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting [ 10.043053] tg3: probe of 0000:01:00.1 failed with error -12 [ 10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting [ 10.334070] tg3: probe of 0000:02:00.0 failed with error -12 [ 10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting [ 10.622629] tg3: probe of 0000:02:00.1 failed with error -12 In addition, the similar situations also occur in other drivers such as the bnxt_en driver. That can be reproduced easily in kdump kernel when SME is active. Let's move the handling currently in iommu_dma_deferred_attach() into the iommu core code so that it can call the __iommu_attach_device() directly instead of the iommu_attach_device(). The external interface iommu_attach_device() is not suitable for handling this situation. Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20210126115337.20068-3-lijiang@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-01-26 19:53:37 +08:00
return __iommu_attach_device(domain, dev);
return 0;
}
void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
if (!group)
return;
mutex_lock(&group->mutex);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
if (WARN_ON(domain != group->domain) ||
WARN_ON(list_count_nodes(&group->devices) != 1))
goto out_unlock;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
__iommu_group_set_core_domain(group);
out_unlock:
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_detach_device);
struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
if (!group)
return NULL;
return group->domain;
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
/*
* For IOMMU_DOMAIN_DMA implementations which already provide their own
* guarantees that the group and its default domain are valid and correct.
*/
struct iommu_domain *iommu_get_dma_domain(struct device *dev)
{
return dev->iommu_group->default_domain;
}
static void *iommu_make_pasid_array_entry(struct iommu_domain *domain,
struct iommu_attach_handle *handle)
{
if (handle) {
handle->domain = domain;
return xa_tag_pointer(handle, IOMMU_PASID_ARRAY_HANDLE);
}
return xa_tag_pointer(domain, IOMMU_PASID_ARRAY_DOMAIN);
}
iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner || pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 11:41:23 +08:00
static bool domain_iommu_ops_compatible(const struct iommu_ops *ops,
struct iommu_domain *domain)
{
if (domain->owner == ops)
return true;
/* For static domains, owner isn't set. */
if (domain == ops->blocked_domain || domain == ops->identity_domain)
return true;
return false;
}
static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group)
{
struct device *dev;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
if (group->domain && group->domain != group->default_domain &&
group->domain != group->blocking_domain)
return -EBUSY;
dev = iommu_group_first_dev(group);
iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner || pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 11:41:23 +08:00
if (!dev_has_iommu(dev) ||
!domain_iommu_ops_compatible(dev_iommu_ops(dev), domain))
return -EINVAL;
return __iommu_group_set_domain(group, domain);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
/**
* iommu_attach_group - Attach an IOMMU domain to an IOMMU group
* @domain: IOMMU domain to attach
* @group: IOMMU group that will be attached
*
* Returns 0 on success and error code on failure
*
* Note that EINVAL can be treated as a soft failure, indicating
* that certain configuration of the domain is incompatible with
* the group. In this case attaching a different domain to the
* group may succeed.
*/
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
{
int ret;
mutex_lock(&group->mutex);
ret = __iommu_attach_group(domain, group);
mutex_unlock(&group->mutex);
return ret;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
EXPORT_SYMBOL_GPL(iommu_attach_group);
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
static int __iommu_device_set_domain(struct iommu_group *group,
struct device *dev,
struct iommu_domain *new_domain,
unsigned int flags)
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
{
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
int ret;
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 20:48:02 +08:00
/*
* If the device requires IOMMU_RESV_DIRECT then we cannot allow
* the blocking domain to be attached as it does not contain the
* required 1:1 mapping. This test effectively excludes the device
* being used with iommu_group_claim_dma_owner() which will block
* vfio and iommufd as well.
*/
if (dev->iommu->require_direct &&
(new_domain->type == IOMMU_DOMAIN_BLOCKED ||
new_domain == group->blocking_domain)) {
dev_warn(dev,
"Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
return -EINVAL;
}
if (dev->iommu->attach_deferred) {
if (new_domain == group->default_domain)
return 0;
dev->iommu->attach_deferred = 0;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
ret = __iommu_attach_device(new_domain, dev);
if (ret) {
/*
* If we have a blocking domain then try to attach that in hopes
* of avoiding a UAF. Modern drivers should implement blocking
* domains as global statics that cannot fail.
*/
if ((flags & IOMMU_SET_DOMAIN_MUST_SUCCEED) &&
group->blocking_domain &&
group->blocking_domain != new_domain)
__iommu_attach_device(group->blocking_domain, dev);
return ret;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
return 0;
}
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
/*
* If 0 is returned the group's domain is new_domain. If an error is returned
* then the group's domain will be set back to the existing domain unless
* IOMMU_SET_DOMAIN_MUST_SUCCEED, otherwise an error is returned and the group's
* domains is left inconsistent. This is a driver bug to fail attach with a
* previously good domain. We try to avoid a kernel UAF because of this.
*
* IOMMU groups are really the natural working unit of the IOMMU, but the IOMMU
* API works on domains and devices. Bridge that gap by iterating over the
* devices in a group. Ideally we'd have a single device which represents the
* requestor ID of the group, but we also allow IOMMU drivers to create policy
* defined minimum sets, where the physical hardware may be able to distiguish
* members, but we wish to group them at a higher level (ex. untrusted
* multi-function PCI devices). Thus we attach each device.
*/
static int __iommu_group_set_domain_internal(struct iommu_group *group,
struct iommu_domain *new_domain,
unsigned int flags)
{
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
struct group_device *last_gdev;
struct group_device *gdev;
int result;
int ret;
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
lockdep_assert_held(&group->mutex);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
if (group->domain == new_domain)
return 0;
if (WARN_ON(!new_domain))
return -EINVAL;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
/*
* Changing the domain is done by calling attach_dev() on the new
* domain. This switch does not have to be atomic and DMA can be
* discarded during the transition. DMA must only be able to access
* either new_domain or group->domain, never something else.
*/
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
result = 0;
for_each_group_device(group, gdev) {
ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
flags);
if (ret) {
result = ret;
/*
* Keep trying the other devices in the group. If a
* driver fails attach to an otherwise good domain, and
* does not support blocking domains, it should at least
* drop its reference on the current domain so we don't
* UAF.
*/
if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
continue;
goto err_revert;
}
}
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
group->domain = new_domain;
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
return result;
err_revert:
/*
* This is called in error unwind paths. A well behaved driver should
* always allow us to attach to a domain that was already attached.
*/
last_gdev = gdev;
for_each_group_device(group, gdev) {
/*
* A NULL domain can happen only for first probe, in which case
* we leave group->domain as NULL and let release clean
* everything up.
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
*/
if (group->domain)
WARN_ON(__iommu_device_set_domain(
group, gdev->dev, group->domain,
IOMMU_SET_DOMAIN_MUST_SUCCEED));
if (gdev == last_gdev)
break;
}
return ret;
}
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
{
mutex_lock(&group->mutex);
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
__iommu_group_set_core_domain(group);
mutex_unlock(&group->mutex);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
}
EXPORT_SYMBOL_GPL(iommu_detach_group);
phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
{
if (domain->type == IOMMU_DOMAIN_IDENTITY)
return iova;
if (domain->type == IOMMU_DOMAIN_BLOCKED)
return 0;
return domain->ops->iova_to_phys(domain, iova);
}
EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, size_t *count)
{
unsigned int pgsize_idx, pgsize_idx_next;
unsigned long pgsizes;
size_t offset, pgsize, pgsize_next;
size_t offset_end;
unsigned long addr_merge = paddr | iova;
/* Page sizes supported by the hardware and small enough for @size */
pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
/* Constrain the page sizes further based on the maximum alignment */
if (likely(addr_merge))
pgsizes &= GENMASK(__ffs(addr_merge), 0);
/* Make sure we have at least one suitable page size */
BUG_ON(!pgsizes);
/* Pick the biggest page size remaining */
pgsize_idx = __fls(pgsizes);
pgsize = BIT(pgsize_idx);
if (!count)
return pgsize;
/* Find the next biggest support page size, if it exists */
pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
if (!pgsizes)
goto out_set_count;
pgsize_idx_next = __ffs(pgsizes);
pgsize_next = BIT(pgsize_idx_next);
/*
* There's no point trying a bigger page size unless the virtual
* and physical addresses are similarly offset within the larger page.
*/
if ((iova ^ paddr) & (pgsize_next - 1))
goto out_set_count;
/* Calculate the offset to the next page size alignment boundary */
offset = pgsize_next - (addr_merge & (pgsize_next - 1));
/*
* If size is big enough to accommodate the larger page, reduce
* the number of smaller pages.
*/
if (!check_add_overflow(offset, pgsize_next, &offset_end) &&
offset_end <= size)
size = offset;
out_set_count:
*count = size >> pgsize_idx;
return pgsize;
}
int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
const struct iommu_domain_ops *ops = domain->ops;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
unsigned long orig_iova = iova;
unsigned int min_pagesz;
size_t orig_size = size;
phys_addr_t orig_paddr = paddr;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
int ret = 0;
might_sleep_if(gfpflags_allow_blocking(gfp));
if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
return -EINVAL;
if (WARN_ON(!ops->map_pages || domain->pgsize_bitmap == 0UL))
return -ENODEV;
/* Discourage passing strange GFP flags */
if (WARN_ON_ONCE(gfp & (__GFP_COMP | __GFP_DMA | __GFP_DMA32 |
__GFP_HIGHMEM)))
return -EINVAL;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
/* find out the minimum page size supported */
min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
/*
* both the virtual address and the physical one, as well as
* the size of the mapping, must be aligned (at least) to the
* size of the smallest page supported by the hardware
*/
if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
iova, &paddr, size, min_pagesz);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
return -EINVAL;
}
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
while (size) {
size_t pgsize, count, mapped = 0;
pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %zu\n",
iova, &paddr, pgsize, count);
ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot,
gfp, &mapped);
/*
* Some pages may have been mapped, even if an error occurred,
* so we should account for those so they can be unmapped.
*/
size -= mapped;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
if (ret)
break;
iova += mapped;
paddr += mapped;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
}
/* unroll mapping in case something went wrong */
if (ret)
iommu_unmap(domain, orig_iova, orig_size - size);
else
trace_map(orig_iova, orig_paddr, orig_size);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
return ret;
}
int iommu_sync_map(struct iommu_domain *domain, unsigned long iova, size_t size)
{
const struct iommu_domain_ops *ops = domain->ops;
if (!ops->iotlb_sync_map)
return 0;
return ops->iotlb_sync_map(domain, iova, size);
}
int iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
int ret;
ret = iommu_map_nosync(domain, iova, paddr, size, prot, gfp);
if (ret)
return ret;
ret = iommu_sync_map(domain, iova, size);
if (ret)
iommu_unmap(domain, iova, size);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_map);
static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
{
const struct iommu_domain_ops *ops = domain->ops;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
size_t unmapped_page, unmapped = 0;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
return 0;
if (WARN_ON(!ops->unmap_pages || domain->pgsize_bitmap == 0UL))
return 0;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
/* find out the minimum page size supported */
min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
/*
* The virtual address, as well as the size of the mapping, must be
* aligned (at least) to the size of the smallest page supported
* by the hardware
*/
if (!IS_ALIGNED(iova | size, min_pagesz)) {
pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
iova, size, min_pagesz);
return 0;
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
}
pr_debug("unmap this: iova 0x%lx size 0x%zx\n", iova, size);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
/*
* Keep iterating until we either unmap 'size' bytes (or more)
* or we hit an area that isn't mapped.
*/
while (unmapped < size) {
size_t pgsize, count;
pgsize = iommu_pgsize(domain, iova, iova, size - unmapped, &count);
unmapped_page = ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
if (!unmapped_page)
break;
pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
iova, unmapped_page);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
iova += unmapped_page;
unmapped += unmapped_page;
}
trace_unmap(orig_iova, size, unmapped);
iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-11-10 11:32:26 +02:00
return unmapped;
}
/**
* iommu_unmap() - Remove mappings from a range of IOVA
* @domain: Domain to manipulate
* @iova: IO virtual address to start
* @size: Length of the range starting from @iova
*
* iommu_unmap() will remove a translation created by iommu_map(). It cannot
* subdivide a mapping created by iommu_map(), so it should be called with IOVA
* ranges that match what was passed to iommu_map(). The range can aggregate
* contiguous iommu_map() calls so long as no individual range is split.
*
* Returns: Number of bytes of IOVA unmapped. iova + res will be the point
* unmapping stopped.
*/
size_t iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size)
{
struct iommu_iotlb_gather iotlb_gather;
size_t ret;
iommu_iotlb_gather_init(&iotlb_gather);
ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
iommu_iotlb_sync(domain, &iotlb_gather);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_unmap);
/**
* iommu_unmap_fast() - Remove mappings from a range of IOVA without IOTLB sync
* @domain: Domain to manipulate
* @iova: IO virtual address to start
* @size: Length of the range starting from @iova
* @iotlb_gather: range information for a pending IOTLB flush
*
* iommu_unmap_fast() will remove a translation created by iommu_map().
* It can't subdivide a mapping created by iommu_map(), so it should be
* called with IOVA ranges that match what was passed to iommu_map(). The
* range can aggregate contiguous iommu_map() calls so long as no individual
* range is split.
*
* Basically iommu_unmap_fast() is the same as iommu_unmap() but for callers
* which manage the IOTLB flushing externally to perform a batched sync.
*
* Returns: Number of bytes of IOVA unmapped. iova + res will be the point
* unmapping stopped.
*/
size_t iommu_unmap_fast(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
{
return __iommu_unmap(domain, iova, size, iotlb_gather);
}
EXPORT_SYMBOL_GPL(iommu_unmap_fast);
ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot,
gfp_t gfp)
{
size_t len = 0, mapped = 0;
phys_addr_t start;
unsigned int i = 0;
int ret;
while (i <= nents) {
phys_addr_t s_phys = sg_phys(sg);
if (len && s_phys != start + len) {
ret = iommu_map_nosync(domain, iova + mapped, start,
len, prot, gfp);
if (ret)
goto out_err;
mapped += len;
len = 0;
}
dma-mapping: name SG DMA flag helpers consistently sg_is_dma_bus_address() is inconsistent with the naming pattern of its corresponding setters and its own kerneldoc, so take the majority vote and rename it sg_dma_is_bus_address() (and fix up the missing underscores in the kerneldoc too). This gives us a nice clear pattern where SG DMA flags are SG_DMA_<NAME>, and the helpers for acting on them are sg_dma_<action>_<name>(). Link: https://lkml.kernel.org/r/20230612153201.554742-14-catalin.marinas@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 16:31:57 +01:00
if (sg_dma_is_bus_address(sg))
goto next;
if (len) {
len += sg->length;
} else {
len = sg->length;
start = s_phys;
}
next:
if (++i < nents)
sg = sg_next(sg);
}
ret = iommu_sync_map(domain, iova, mapped);
if (ret)
goto out_err;
return mapped;
out_err:
/* undo mappings already done */
iommu_unmap(domain, iova, mapped);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_map_sg);
/**
* report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
* @domain: the iommu domain where the fault has happened
* @dev: the device where the fault has happened
* @iova: the faulting address
* @flags: mmu fault flags (e.g. IOMMU_FAULT_READ/IOMMU_FAULT_WRITE/...)
*
* This function should be called by the low-level IOMMU implementations
* whenever IOMMU faults happen, to allow high-level users, that are
* interested in such events, to know about them.
*
* This event may be useful for several possible use cases:
* - mere logging of the event
* - dynamic TLB/PTE loading
* - if restarting of the faulting device is required
*
* Returns 0 on success and an appropriate error code otherwise (if dynamic
* PTE/TLB loading will one day be supported, implementations will be able
* to tell whether it succeeded or not according to this return value).
*
* Specifically, -ENOSYS is returned if a fault handler isn't installed
* (though fault handlers can also return -ENOSYS, in case they want to
* elicit the default behavior of the IOMMU drivers).
*/
int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
unsigned long iova, int flags)
{
int ret = -ENOSYS;
/*
* if upper layers showed interest and installed a fault handler,
* invoke it.
*/
iommu: Fix crash in report_iommu_fault() The following crash is observed while handling an IOMMU fault with a recent kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffff8c708299f700 PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3 Oops: Oops: 0011 [#1] SMP NOPTI CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy) Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024 RIP: 0010:0xffff8c708299f700 Call Trace: <TASK> ? report_iommu_fault+0x78/0xd3 ? amd_iommu_report_page_fault+0x91/0x150 ? amd_iommu_int_thread+0x77/0x180 ? __pfx_irq_thread_fn+0x10/0x10 ? irq_thread_fn+0x23/0x60 ? irq_thread+0xf9/0x1e0 ? __pfx_irq_thread_dtor+0x10/0x10 ? __pfx_irq_thread+0x10/0x10 ? kthread+0xfc/0x240 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ? ret_from_fork_asm+0x1a/0x30 </TASK> report_iommu_fault() checks for an installed handler comparing the corresponding field to NULL. It can (and could before) be called for a domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically. Cookie is represented as a union so we may end up with a garbage value treated there if this happens for a domain with another cookie type. Formerly there were two exclusive cookie types in the union. IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault(). Call the fault handler only if the passed domain has a required cookie type. Found by Linux Verification Center (linuxtesting.org). Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250408213342.285955-1-pchelkin@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-09 00:33:41 +03:00
if (domain->cookie_type == IOMMU_COOKIE_FAULT_HANDLER &&
domain->handler)
ret = domain->handler(domain, dev, iova, flags,
domain->handler_token);
trace_io_page_fault(dev, iova, flags);
return ret;
}
EXPORT_SYMBOL_GPL(report_iommu_fault);
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
static int __init iommu_init(void)
{
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
iommu_group_kset = kset_create_and_add("iommu_groups",
NULL, kernel_kobj);
BUG_ON(!iommu_group_kset);
iommu_debugfs_setup();
iommu: IOMMU Groups IOMMU device groups are currently a rather vague associative notion with assembly required by the user or user level driver provider to do anything useful. This patch intends to grow the IOMMU group concept into something a bit more consumable. To do this, we first create an object representing the group, struct iommu_group. This structure is allocated (iommu_group_alloc) and filled (iommu_group_add_device) by the iommu driver. The iommu driver is free to add devices to the group using it's own set of policies. This allows inclusion of devices based on physical hardware or topology limitations of the platform, as well as soft requirements, such as multi-function trust levels or peer-to-peer protection of the interconnects. Each device may only belong to a single iommu group, which is linked from struct device.iommu_group. IOMMU groups are maintained using kobject reference counting, allowing for automatic removal of empty, unreferenced groups. It is the responsibility of the iommu driver to remove devices from the group (iommu_group_remove_device). IOMMU groups also include a userspace representation in sysfs under /sys/kernel/iommu_groups. When allocated, each group is given a dynamically assign ID (int). The ID is managed by the core IOMMU group code to support multiple heterogeneous iommu drivers, which could potentially collide in group naming/numbering. This also keeps group IDs to small, easily managed values. A directory is created under /sys/kernel/iommu_groups for each group. A further subdirectory named "devices" contains links to each device within the group. The iommu_group file in the device's sysfs directory, which formerly contained a group number when read, is now a link to the iommu group. Example: $ ls -l /sys/kernel/iommu_groups/26/devices/ total 0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 -> ../../../../devices/pci0000:00/0000:00:1e.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 $ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group [truncating perms/owner/timestamp] /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group -> ../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group -> ../../../../kernel/iommu_groups/26 /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group -> ../../../../kernel/iommu_groups/26 Groups also include several exported functions for use by user level driver providers, for example VFIO. These include: iommu_group_get(): Acquires a reference to a group from a device iommu_group_put(): Releases reference iommu_group_for_each_dev(): Iterates over group devices using callback iommu_group_[un]register_notifier(): Allows notification of device add and remove operations relevant to the group iommu_group_id(): Return the group number This patch also extends the IOMMU API to allow attaching groups to domains. This is currently a simple wrapper for iterating through devices within a group, but it's expected that the IOMMU API may eventually make groups a more integral part of domains. Groups intentionally do not try to manage group ownership. A user level driver provider must independently acquire ownership for each device within a group before making use of the group as a whole. This may change in the future if group usage becomes more pervasive across both DMA and IOMMU ops. Groups intentionally do not provide a mechanism for driver locking or otherwise manipulating driver matching/probing of devices within the group. Such interfaces are generic to devices and beyond the scope of IOMMU groups. If implemented, user level providers have ready access via iommu_group_for_each_dev and group notifiers. iommu_device_group() is removed here as it has no users. The replacement is: group = iommu_group_get(dev); id = iommu_group_id(group); iommu_group_put(group); AMD-Vi & Intel VT-d support re-added in following patches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00
return 0;
}
core_initcall(iommu_init);
int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirk)
{
if (domain->type != IOMMU_DOMAIN_UNMANAGED)
return -EINVAL;
if (!domain->ops->set_pgtable_quirks)
return -EINVAL;
return domain->ops->set_pgtable_quirks(domain, quirk);
}
EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
/**
* iommu_get_resv_regions - get reserved regions
* @dev: device for which to get reserved regions
* @list: reserved region list for device
*
* This returns a list of reserved IOVA regions specific to this device.
* A domain user should not map IOVA in these ranges.
*/
void iommu_get_resv_regions(struct device *dev, struct list_head *list)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
if (ops->get_resv_regions)
ops->get_resv_regions(dev, list);
}
EXPORT_SYMBOL_GPL(iommu_get_resv_regions);
/**
* iommu_put_resv_regions - release reserved regions
* @dev: device for which to free reserved regions
* @list: reserved region list for device
*
* This releases a reserved region list acquired by iommu_get_resv_regions().
*/
void iommu_put_resv_regions(struct device *dev, struct list_head *list)
{
struct iommu_resv_region *entry, *next;
list_for_each_entry_safe(entry, next, list, list) {
if (entry->free)
entry->free(dev, entry);
else
kfree(entry);
}
}
EXPORT_SYMBOL(iommu_put_resv_regions);
struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
iommu: Disambiguate MSI region types The introduction of reserved regions has left a couple of rough edges which we could do with sorting out sooner rather than later. Since we are not yet addressing the potential dynamic aspect of software-managed reservations and presenting them at arbitrary fixed addresses, it is incongruous that we end up displaying hardware vs. software-managed MSI regions to userspace differently, especially since ARM-based systems may actually require one or the other, or even potentially both at once, (which iommu-dma currently has no hope of dealing with at all). Let's resolve the former user-visible inconsistency ASAP before the ABI has been baked into a kernel release, in a way that also lays the groundwork for the latter shortcoming to be addressed by follow-up patches. For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use IOMMU_RESV_MSI to describe the hardware type, and document everything a little bit. Since the x86 MSI remapping hardware falls squarely under this meaning of IOMMU_RESV_MSI, apply that type to their regions as well, so that we tell the same story to userspace across all platforms. Secondly, as the various region types require quite different handling, and it really makes little sense to ever try combining them, convert the bitfield-esque #defines to a plain enum in the process before anyone gets the wrong impression. Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region") Reviewed-by: Eric Auger <eric.auger@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: David Woodhouse <dwmw2@infradead.org> CC: kvm@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-16 17:00:16 +00:00
size_t length, int prot,
enum iommu_resv_type type,
gfp_t gfp)
{
struct iommu_resv_region *region;
region = kzalloc(sizeof(*region), gfp);
if (!region)
return NULL;
INIT_LIST_HEAD(&region->list);
region->start = start;
region->length = length;
region->prot = prot;
region->type = type;
return region;
}
EXPORT_SYMBOL_GPL(iommu_alloc_resv_region);
void iommu_set_default_passthrough(bool cmd_line)
{
if (cmd_line)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
}
void iommu_set_default_translated(bool cmd_line)
{
if (cmd_line)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
iommu_def_domain_type = IOMMU_DOMAIN_DMA;
}
bool iommu_default_passthrough(void)
{
return iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY;
}
EXPORT_SYMBOL_GPL(iommu_default_passthrough);
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
static const struct iommu_device *iommu_from_fwnode(const struct fwnode_handle *fwnode)
iommu: Make of_iommu_set/get_ops() DT agnostic The of_iommu_{set/get}_ops() API is used to associate a device tree node with a specific set of IOMMU operations. The same kernel interface is required on systems booting with ACPI, where devices are not associated with a device tree node, therefore the interface requires generalization. The struct device fwnode member represents the fwnode token associated with the device and the struct it points at is firmware specific; regardless, it is initialized on both ACPI and DT systems and makes an ideal candidate to use it to associate a set of IOMMU operations to a given device, through its struct device.fwnode member pointer, paving the way for representing per-device iommu_ops (ie an iommu instance associated with a device). Convert the DT specific of_iommu_{set/get}_ops() interface to use struct device.fwnode as a look-up token, making the interface usable on ACPI systems and rename the data structures and the registration API so that they are made to represent their usage more clearly. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tomasz Nowicki <tn@semihalf.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-21 10:01:36 +00:00
{
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
const struct iommu_device *iommu, *ret = NULL;
iommu: Make of_iommu_set/get_ops() DT agnostic The of_iommu_{set/get}_ops() API is used to associate a device tree node with a specific set of IOMMU operations. The same kernel interface is required on systems booting with ACPI, where devices are not associated with a device tree node, therefore the interface requires generalization. The struct device fwnode member represents the fwnode token associated with the device and the struct it points at is firmware specific; regardless, it is initialized on both ACPI and DT systems and makes an ideal candidate to use it to associate a set of IOMMU operations to a given device, through its struct device.fwnode member pointer, paving the way for representing per-device iommu_ops (ie an iommu instance associated with a device). Convert the DT specific of_iommu_{set/get}_ops() interface to use struct device.fwnode as a look-up token, making the interface usable on ACPI systems and rename the data structures and the registration API so that they are made to represent their usage more clearly. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tomasz Nowicki <tn@semihalf.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-21 10:01:36 +00:00
spin_lock(&iommu_device_lock);
list_for_each_entry(iommu, &iommu_device_list, list)
if (iommu->fwnode == fwnode) {
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
ret = iommu;
iommu: Make of_iommu_set/get_ops() DT agnostic The of_iommu_{set/get}_ops() API is used to associate a device tree node with a specific set of IOMMU operations. The same kernel interface is required on systems booting with ACPI, where devices are not associated with a device tree node, therefore the interface requires generalization. The struct device fwnode member represents the fwnode token associated with the device and the struct it points at is firmware specific; regardless, it is initialized on both ACPI and DT systems and makes an ideal candidate to use it to associate a set of IOMMU operations to a given device, through its struct device.fwnode member pointer, paving the way for representing per-device iommu_ops (ie an iommu instance associated with a device). Convert the DT specific of_iommu_{set/get}_ops() interface to use struct device.fwnode as a look-up token, making the interface usable on ACPI systems and rename the data structures and the registration API so that they are made to represent their usage more clearly. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tomasz Nowicki <tn@semihalf.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-21 10:01:36 +00:00
break;
}
spin_unlock(&iommu_device_lock);
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
return ret;
}
const struct iommu_ops *iommu_ops_from_fwnode(const struct fwnode_handle *fwnode)
{
const struct iommu_device *iommu = iommu_from_fwnode(fwnode);
return iommu ? iommu->ops : NULL;
iommu: Make of_iommu_set/get_ops() DT agnostic The of_iommu_{set/get}_ops() API is used to associate a device tree node with a specific set of IOMMU operations. The same kernel interface is required on systems booting with ACPI, where devices are not associated with a device tree node, therefore the interface requires generalization. The struct device fwnode member represents the fwnode token associated with the device and the struct it points at is firmware specific; regardless, it is initialized on both ACPI and DT systems and makes an ideal candidate to use it to associate a set of IOMMU operations to a given device, through its struct device.fwnode member pointer, paving the way for representing per-device iommu_ops (ie an iommu instance associated with a device). Convert the DT specific of_iommu_{set/get}_ops() interface to use struct device.fwnode as a look-up token, making the interface usable on ACPI systems and rename the data structures and the registration API so that they are made to represent their usage more clearly. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tomasz Nowicki <tn@semihalf.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-21 10:01:36 +00:00
}
int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode)
{
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
const struct iommu_device *iommu = iommu_from_fwnode(iommu_fwnode);
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
if (!iommu)
return driver_deferred_probe_check_state(dev);
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
if (!dev->iommu && !READ_ONCE(iommu->ready))
return -EPROBE_DEFER;
if (fwspec)
iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit b46064a18810 ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 18:41:28 +01:00
return iommu->ops == iommu_fwspec_ops(fwspec) ? 0 : -EINVAL;
if (!dev_iommu_get(dev))
return -ENOMEM;
/* Preallocate for the overwhelmingly common case of 1 ID */
fwspec = kzalloc(struct_size(fwspec, ids, 1), GFP_KERNEL);
if (!fwspec)
return -ENOMEM;
fwnode_handle_get(iommu_fwnode);
fwspec->iommu_fwnode = iommu_fwnode;
dev_iommu_fwspec_set(dev, fwspec);
return 0;
}
EXPORT_SYMBOL_GPL(iommu_fwspec_init);
void iommu_fwspec_free(struct device *dev)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
if (fwspec) {
fwnode_handle_put(fwspec->iommu_fwnode);
kfree(fwspec);
dev_iommu_fwspec_set(dev, NULL);
}
}
int iommu_fwspec_add_ids(struct device *dev, const u32 *ids, int num_ids)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
int i, new_num;
if (!fwspec)
return -EINVAL;
new_num = fwspec->num_ids + num_ids;
if (new_num > 1) {
fwspec = krealloc(fwspec, struct_size(fwspec, ids, new_num),
GFP_KERNEL);
if (!fwspec)
return -ENOMEM;
dev_iommu_fwspec_set(dev, fwspec);
}
for (i = 0; i < num_ids; i++)
fwspec->ids[fwspec->num_ids + i] = ids[i];
fwspec->num_ids = new_num;
return 0;
}
EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
iommu: Add APIs for multiple domains per device Sharing a physical PCI device in a finer-granularity way is becoming a consensus in the industry. IOMMU vendors are also engaging efforts to support such sharing as well as possible. Among the efforts, the capability of support finer-granularity DMA isolation is a common requirement due to the security consideration. With finer-granularity DMA isolation, subsets of a PCI function can be isolated from each others by the IOMMU. As a result, there is a request in software to attach multiple domains to a physical PCI device. One example of such use model is the Intel Scalable IOV [1] [2]. The Intel vt-d 3.0 spec [3] introduces the scalable mode which enables PASID granularity DMA isolation. This adds the APIs to support multiple domains per device. In order to ease the discussions, we call it 'a domain in auxiliary mode' or simply 'auxiliary domain' when multiple domains are attached to a physical device. The APIs include: * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) - Detect both IOMMU and PCI endpoint devices supporting the feature (aux-domain here) without the host driver dependency. * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) - Check the enabling status of the feature (aux-domain here). The aux-domain interfaces are available only if this returns true. * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX) - Enable/disable device specific aux-domain feature. * iommu_aux_attach_device(domain, dev) - Attaches @domain to @dev in the auxiliary mode. Multiple domains could be attached to a single device in the auxiliary mode with each domain representing an isolated address space for an assignable subset of the device. * iommu_aux_detach_device(domain, dev) - Detach @domain which has been attached to @dev in the auxiliary mode. * iommu_aux_get_pasid(domain, dev) - Return ID used for finer-granularity DMA translation. For the Intel Scalable IOV usage model, this will be a PASID. The device which supports Scalable IOV needs to write this ID to the device register so that DMA requests could be tagged with a right PASID prefix. This has been updated with the latest proposal from Joerg posted here [5]. Many people involved in discussions of this design. Kevin Tian <kevin.tian@intel.com> Liu Yi L <yi.l.liu@intel.com> Ashok Raj <ashok.raj@intel.com> Sanjay Kumar <sanjay.k.kumar@intel.com> Jacob Pan <jacob.jun.pan@linux.intel.com> Alex Williamson <alex.williamson@redhat.com> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Joerg Roedel <joro@8bytes.org> and some discussions can be found here [4] [5]. [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification [4] https://lkml.org/lkml/2018/7/26/4 [5] https://www.spinics.net/lists/iommu/msg31874.html Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-03-25 09:30:28 +08:00
/**
* iommu_setup_default_domain - Set the default_domain for the group
* @group: Group to change
* @target_type: Domain type to set as the default_domain
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
*
* Allocate a default domain and set it as the current domain on the group. If
* the group already has a default domain it will be changed to the target_type.
* When target_type is 0 the default domain is selected based on driver and
* system preferences.
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
*/
static int iommu_setup_default_domain(struct iommu_group *group,
int target_type)
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
{
struct iommu_domain *old_dom = group->default_domain;
struct group_device *gdev;
struct iommu_domain *dom;
bool direct_failed;
int req_type;
int ret;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
lockdep_assert_held(&group->mutex);
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
req_type = iommu_get_default_domain_type(group, target_type);
if (req_type < 0)
return -EINVAL;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
dom = iommu_group_alloc_default_domain(group, req_type);
if (IS_ERR(dom))
return PTR_ERR(dom);
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
if (group->default_domain == dom)
return 0;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
if (iommu_is_dma_domain(dom)) {
ret = iommu_get_dma_cookie(dom);
if (ret) {
iommu_domain_free(dom);
return ret;
}
}
/*
* IOMMU_RESV_DIRECT and IOMMU_RESV_DIRECT_RELAXABLE regions must be
* mapped before their device is attached, in order to guarantee
* continuity with any FW activity
*/
direct_failed = false;
for_each_group_device(group, gdev) {
if (iommu_create_device_direct_mappings(dom, gdev->dev)) {
direct_failed = true;
dev_warn_once(
gdev->dev->iommu->iommu_dev->dev,
"IOMMU driver was not able to establish FW requested direct mapping.");
}
}
/* We must set default_domain early for __iommu_device_set_domain */
group->default_domain = dom;
if (!group->domain) {
/*
* Drivers are not allowed to fail the first domain attach.
* The only way to recover from this is to fail attaching the
* iommu driver and call ops->release_device. Put the domain
* in group->default_domain so it is freed after.
*/
ret = __iommu_group_set_domain_internal(
group, dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
if (WARN_ON(ret))
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
goto out_free_old;
} else {
ret = __iommu_group_set_domain(group, dom);
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
if (ret)
goto err_restore_def_domain;
}
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
/*
* Drivers are supposed to allow mappings to be installed in a domain
* before device attachment, but some don't. Hack around this defect by
* trying again after attaching. If this happens it means the device
* will not continuously have the IOMMU_RESV_DIRECT map.
*/
if (direct_failed) {
for_each_group_device(group, gdev) {
ret = iommu_create_device_direct_mappings(dom, gdev->dev);
if (ret)
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
goto err_restore_domain;
}
}
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
out_free_old:
if (old_dom)
iommu_domain_free(old_dom);
return ret;
err_restore_domain:
if (old_dom)
__iommu_group_set_domain_internal(
group, old_dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
err_restore_def_domain:
if (old_dom) {
iommu_domain_free(dom);
iommu: Fix crash during syfs iommu_groups/N/type The err_restore_domain flow was accidently inserted into the success path in commit 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM"). It should only happen if iommu_create_device_direct_mappings() fails. This caused the domains the be wrongly changed and freed whenever the sysfs is used, resulting in an oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3417 Comm: avocado Not tainted 6.4.0-rc4-next-20230602 #3 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:__iommu_attach_device+0xc/0xa0 Code: c0 c3 cc cc cc cc 48 89 f0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 8b 47 08 <48> 8b 00 48 85 c0 74 74 48 89 f5 e8 64 12 49 00 41 89 c4 85 c0 74 RSP: 0018:ffffabae0220bd48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9ac04f70e410 RCX: 0000000000000001 RDX: ffff9ac044db20c0 RSI: ffff9ac044fa50d0 RDI: ffff9ac04f70e410 RBP: ffff9ac044fa50d0 R08: 1000000100209001 R09: 00000000000002dc R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ac043d54700 R13: ffff9ac043d54700 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f02e30ae000(0000) GS:ffff9afeb2440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012afca006 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? __iommu_queue_command_sync+0x80/0xc0 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? __iommu_attach_device+0xc/0xa0 ? __iommu_attach_device+0x1c/0xa0 __iommu_device_set_domain+0x42/0x80 __iommu_group_set_domain_internal+0x5d/0x160 iommu_setup_default_domain+0x318/0x400 iommu_group_store_type+0xb1/0x200 kernfs_fop_write_iter+0x12f/0x1c0 vfs_write+0x2a2/0x3b0 ksys_write+0x63/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f02e2f14a6f Reorganize the error flow so that the success branch and error branches are clearer. Fixes: 1000dccd5d13 ("iommu: Allow IOMMU_RESV_DIRECT to work on ARM") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-5bd8cc969d9e+1f1-iommu_set_def_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-26 12:13:11 -03:00
group->default_domain = old_dom;
}
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
return ret;
}
/*
* Changing the default domain through sysfs requires the users to unbind the
* drivers from the devices in the iommu group, except for a DMA -> DMA-FQ
* transition. Return failure if this isn't met.
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
*
* We need to consider the race between this and the device release path.
* group->mutex is used here to guarantee that the device release path
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
* will not be entered at the same time.
*/
static ssize_t iommu_group_store_type(struct iommu_group *group,
const char *buf, size_t count)
{
struct group_device *gdev;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
int ret, req_type;
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
return -EACCES;
if (WARN_ON(!group) || !group->default_domain)
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
return -EINVAL;
if (sysfs_streq(buf, "identity"))
req_type = IOMMU_DOMAIN_IDENTITY;
else if (sysfs_streq(buf, "DMA"))
req_type = IOMMU_DOMAIN_DMA;
else if (sysfs_streq(buf, "DMA-FQ"))
req_type = IOMMU_DOMAIN_DMA_FQ;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
else if (sysfs_streq(buf, "auto"))
req_type = 0;
else
return -EINVAL;
mutex_lock(&group->mutex);
/* We can bring up a flush queue without tearing down the domain. */
if (req_type == IOMMU_DOMAIN_DMA_FQ &&
group->default_domain->type == IOMMU_DOMAIN_DMA) {
ret = iommu_dma_init_fq(group->default_domain);
if (ret)
goto out_unlock;
group->default_domain->type = IOMMU_DOMAIN_DMA_FQ;
ret = count;
goto out_unlock;
}
/* Otherwise, ensure that device exists and no driver is bound. */
if (list_empty(&group->devices) || group->owner_cnt) {
ret = -EPERM;
goto out_unlock;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
}
ret = iommu_setup_default_domain(group, req_type);
if (ret)
goto out_unlock;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
/* Make sure dma_ops is appropriatley set */
for_each_group_device(group, gdev)
iommu/dma: Centralise iommu_setup_dma_ops() It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only ever call iommu_setup_dma_ops() after a successful iommu_probe_device(), which means there should be no harm in achieving the same order of operations by running it off the back of iommu_probe_device() itself. This then puts it in line with the x86 and s390 .probe_finalize bodges, letting us pull it all into the main flow properly. As a bonus this lets us fold in and de-scope the PCI workaround setup as well. At this point we can also then pull the call up inside the group mutex, and avoid having to think about whether iommu_group_store_type() could theoretically race and free the domain if iommu_setup_dma_ops() ran just *before* iommu_device_use_default_domain() claims it... Furthermore we replace one .probe_finalize call completely, since the only remaining implementations are now one which only needs to run once for the initial boot-time probe, and two which themselves render that path unreachable. This leaves us a big step closer to realistically being able to unpick the variety of different things that iommu_setup_dma_ops() has been muddling together, and further streamline iommu-dma into core API flows in future. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/bebea331c1d688b34d9862eefd5ede47503961b8.1713523152.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-19 17:54:45 +01:00
iommu_setup_dma_ops(gdev->dev);
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
out_unlock:
mutex_unlock(&group->mutex);
return ret ?: count;
iommu: Add support to change default domain of an iommu group Presently, the default domain of an iommu group is allocated during boot time and it cannot be changed later. So, the device would typically be either in identity (also known as pass_through) mode or the device would be in DMA mode as long as the machine is up and running. There is no way to change the default domain type dynamically i.e. after booting, a device cannot switch between identity mode and DMA mode. But, assume a use case wherein the user trusts the device and believes that the OS is secure enough and hence wants *only* this device to bypass IOMMU (so that it could be high performing) whereas all the other devices to go through IOMMU (so that the system is protected). Presently, this use case is not supported. It will be helpful if there is some way to change the default domain of an iommu group dynamically. Hence, add such support. A privileged user could request the kernel to change the default domain type of a iommu group by writing to "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values are supported 1. identity: all the DMA transactions from the device in this group are *not* translated by the iommu 2. DMA: all the DMA transactions from the device in this group are translated by the iommu 3. auto: change to the type the device was booted with Note: 1. Default domain of an iommu group with two or more devices cannot be changed. 2. The device in the iommu group shouldn't be bound to any driver. 3. The device shouldn't be assigned to user for direct access. 4. The change request will fail if any device in the group has a mandatory default domain type and the requested one conflicts with that. Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more information. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-24 21:06:02 +08:00
}
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
/**
* iommu_device_use_default_domain() - Device driver wants to handle device
* DMA through the kernel DMA API.
* @dev: The device.
*
* The device driver about to bind @dev wants to do DMA through the kernel
* DMA API. Return 0 if it is allowed, otherwise an error.
*/
int iommu_device_use_default_domain(struct device *dev)
{
/* Caller is the driver core during the pre-probe path */
struct iommu_group *group = dev->iommu_group;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
int ret = 0;
if (!group)
return 0;
mutex_lock(&group->mutex);
2025-02-28 15:46:30 +00:00
/* We may race against bus_iommu_probe() finalising groups here */
if (!group->default_domain) {
ret = -EPROBE_DEFER;
goto unlock_out;
}
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
if (group->owner_cnt) {
if (group->domain != group->default_domain || group->owner ||
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
!xa_empty(&group->pasid_array)) {
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
ret = -EBUSY;
goto unlock_out;
}
}
group->owner_cnt++;
unlock_out:
mutex_unlock(&group->mutex);
return ret;
}
/**
* iommu_device_unuse_default_domain() - Device driver stops handling device
* DMA through the kernel DMA API.
* @dev: The device.
*
* The device driver doesn't want to do DMA through kernel DMA API anymore.
* It must be called after iommu_device_use_default_domain().
*/
void iommu_device_unuse_default_domain(struct device *dev)
{
/* Caller is the driver core during the post-probe path */
struct iommu_group *group = dev->iommu_group;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
if (!group)
return;
mutex_lock(&group->mutex);
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
if (!WARN_ON(!group->owner_cnt || !xa_empty(&group->pasid_array)))
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
group->owner_cnt--;
mutex_unlock(&group->mutex);
}
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
{
struct device *dev = iommu_group_first_dev(group);
const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *domain;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
if (group->blocking_domain)
return 0;
if (ops->blocked_domain) {
group->blocking_domain = ops->blocked_domain;
return 0;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
}
/*
* For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED create an
* empty PAGING domain instead.
*/
domain = iommu_paging_domain_alloc(dev);
if (IS_ERR(domain))
return PTR_ERR(domain);
group->blocking_domain = domain;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
return 0;
}
static int __iommu_take_dma_ownership(struct iommu_group *group, void *owner)
{
int ret;
if ((group->domain && group->domain != group->default_domain) ||
!xa_empty(&group->pasid_array))
return -EBUSY;
ret = __iommu_group_alloc_blocking_domain(group);
if (ret)
return ret;
ret = __iommu_group_set_domain(group, group->blocking_domain);
if (ret)
return ret;
group->owner = owner;
group->owner_cnt++;
return 0;
}
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
/**
* iommu_group_claim_dma_owner() - Set DMA ownership of a group
* @group: The group.
* @owner: Caller specified pointer. Used for exclusive ownership.
*
* This is to support backward compatibility for vfio which manages the dma
* ownership in iommu_group level. New invocations on this interface should be
* prohibited. Only a single owner may exist for a group.
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
*/
int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
{
int ret = 0;
if (WARN_ON(!owner))
return -EINVAL;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
mutex_lock(&group->mutex);
if (group->owner_cnt) {
ret = -EPERM;
goto unlock_out;
}
ret = __iommu_take_dma_ownership(group, owner);
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
unlock_out:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
/**
* iommu_device_claim_dma_owner() - Set DMA ownership of a device
* @dev: The device.
* @owner: Caller specified pointer. Used for exclusive ownership.
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
*
* Claim the DMA ownership of a device. Multiple devices in the same group may
* concurrently claim ownership if they present the same owner value. Returns 0
* on success and error code on failure
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
*/
int iommu_device_claim_dma_owner(struct device *dev, void *owner)
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
int ret = 0;
if (WARN_ON(!owner))
return -EINVAL;
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
if (!group)
return -ENODEV;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
mutex_lock(&group->mutex);
if (group->owner_cnt) {
if (group->owner != owner) {
ret = -EPERM;
goto unlock_out;
}
group->owner_cnt++;
goto unlock_out;
}
ret = __iommu_take_dma_ownership(group, owner);
unlock_out:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
static void __iommu_release_dma_ownership(struct iommu_group *group)
{
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
if (WARN_ON(!group->owner_cnt || !group->owner ||
!xa_empty(&group->pasid_array)))
return;
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
group->owner_cnt = 0;
group->owner = NULL;
iommu: Make __iommu_group_set_domain() handle error unwind Let's try to have a consistent and clear strategy for error handling during domain attach failures. There are two broad categories, the first is callers doing destruction and trying to set the domain back to a previously good domain. These cases cannot handle failure during destruction flows and must succeed, or at least avoid a UAF on the current group->domain which is likely about to be freed. Many of the drivers are well behaved here and will not hit the WARN_ON's or a UAF, but some are doing hypercalls/etc that can fail unpredictably and don't meet the expectations. The second case is attaching a domain for the first time in a failable context, failure should restore the attachment back to group->domain using the above unfailable operation. Have __iommu_group_set_domain_internal() execute a common algorithm that tries to achieve this, and in the worst case, would leave a device "detached" or assigned to a global blocking domain. This relies on some existing common driver behaviors where attach failure will also do detatch and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever fail. Name the first case with __iommu_group_set_domain_nofail() to make it clear. Pull all the error handling and WARN_ON generation into __iommu_group_set_domain_internal(). Avoid the obfuscating use of __iommu_group_for_each_dev() and be more careful about what should happen during failures by only touching devices we've already touched. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-11 01:42:01 -03:00
__iommu_group_set_domain_nofail(group, group->default_domain);
}
iommu: iommu_group_claim_dma_owner() must always assign a domain Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-09 13:19:19 -03:00
/**
* iommu_group_release_dma_owner() - Release DMA ownership of a group
* @group: The group
*
* Release the DMA ownership claimed by iommu_group_claim_dma_owner().
*/
void iommu_group_release_dma_owner(struct iommu_group *group)
{
mutex_lock(&group->mutex);
__iommu_release_dma_ownership(group);
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
/**
* iommu_device_release_dma_owner() - Release DMA ownership of a device
* @dev: The device.
*
* Release the DMA ownership claimed by iommu_device_claim_dma_owner().
*/
void iommu_device_release_dma_owner(struct device *dev)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
mutex_lock(&group->mutex);
if (group->owner_cnt > 1)
group->owner_cnt--;
else
__iommu_release_dma_ownership(group);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:50 +08:00
/**
* iommu_group_dma_owner_claimed() - Query group dma ownership status
* @group: The group.
*
* This provides status query on a given group. It is racy and only for
* non-binding status reporting.
*/
bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
unsigned int user;
mutex_lock(&group->mutex);
user = group->owner_cnt;
mutex_unlock(&group->mutex);
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
struct iommu_domain *domain)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *blocked_domain = ops->blocked_domain;
WARN_ON(blocked_domain->ops->set_dev_pasid(blocked_domain,
dev, pasid, domain));
}
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
static int __iommu_set_group_pasid(struct iommu_domain *domain,
struct iommu_group *group, ioasid_t pasid,
struct iommu_domain *old)
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
{
struct group_device *device, *last_gdev;
int ret;
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
for_each_group_device(group, device) {
2025-05-19 18:19:37 -07:00
if (device->dev->iommu->max_pasids > 0) {
ret = domain->ops->set_dev_pasid(domain, device->dev,
pasid, old);
if (ret)
goto err_revert;
}
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
}
return 0;
err_revert:
last_gdev = device;
for_each_group_device(group, device) {
if (device == last_gdev)
break;
2025-05-19 18:19:37 -07:00
if (device->dev->iommu->max_pasids > 0) {
/*
* If no old domain, undo the succeeded devices/pasid.
* Otherwise, rollback the succeeded devices/pasid to
* the old domain. And it is a driver bug to fail
* attaching with a previously good domain.
*/
if (!old ||
WARN_ON(old->ops->set_dev_pasid(old, device->dev,
pasid, domain)))
2025-05-19 18:19:37 -07:00
iommu_remove_dev_pasid(device->dev, pasid, domain);
}
}
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
return ret;
}
static void __iommu_remove_group_pasid(struct iommu_group *group,
ioasid_t pasid,
struct iommu_domain *domain)
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
{
struct group_device *device;
2025-05-19 18:19:37 -07:00
for_each_group_device(group, device) {
if (device->dev->iommu->max_pasids > 0)
iommu_remove_dev_pasid(device->dev, pasid, domain);
}
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
}
/*
* iommu_attach_device_pasid() - Attach a domain to pasid of device
* @domain: the iommu domain.
* @dev: the attached device.
* @pasid: the pasid of the device.
* @handle: the attach handle.
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
*
* Caller should always provide a new handle to avoid race with the paths
* that have lockless reference to handle if it intends to pass a valid handle.
*
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
* Return: 0 on success, or an error.
*/
int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid,
struct iommu_attach_handle *handle)
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
struct group_device *device;
const struct iommu_ops *ops;
void *entry;
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
int ret;
if (!group)
return -ENODEV;
ops = dev_iommu_ops(dev);
if (!domain->ops->set_dev_pasid ||
!ops->blocked_domain ||
!ops->blocked_domain->ops->set_dev_pasid)
return -EOPNOTSUPP;
iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner || pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 11:41:23 +08:00
if (!domain_iommu_ops_compatible(ops, domain) ||
pasid == IOMMU_NO_PASID)
return -EINVAL;
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
mutex_lock(&group->mutex);
for_each_group_device(group, device) {
2025-05-19 18:19:37 -07:00
/*
* Skip PASID validation for devices without PASID support
* (max_pasids = 0). These devices cannot issue transactions
* with PASID, so they don't affect group's PASID usage.
*/
if ((device->dev->iommu->max_pasids > 0) &&
(pasid >= device->dev->iommu->max_pasids)) {
ret = -EINVAL;
goto out_unlock;
}
}
entry = iommu_make_pasid_array_entry(domain, handle);
/*
* Entry present is a failure case. Use xa_insert() instead of
* xa_reserve().
*/
ret = xa_insert(&group->pasid_array, pasid, XA_ZERO_ENTRY, GFP_KERNEL);
if (ret)
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
goto out_unlock;
ret = __iommu_set_group_pasid(domain, group, pasid, NULL);
if (ret) {
xa_release(&group->pasid_array, pasid);
goto out_unlock;
}
/*
* The xa_insert() above reserved the memory, and the group->mutex is
* held, this cannot fail. The new domain cannot be visible until the
* operation succeeds as we cannot tolerate PRIs becoming concurrently
* queued and then failing attach.
*/
WARN_ON(xa_is_err(xa_store(&group->pasid_array,
pasid, entry, GFP_KERNEL)));
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
/**
* iommu_replace_device_pasid - Replace the domain that a specific pasid
* of the device is attached to
* @domain: the new iommu domain
* @dev: the attached device.
* @pasid: the pasid of the device.
* @handle: the attach handle.
*
* This API allows the pasid to switch domains. The @pasid should have been
* attached. Otherwise, this fails. The pasid will keep the old configuration
* if replacement failed.
*
* Caller should always provide a new handle to avoid race with the paths
* that have lockless reference to handle if it intends to pass a valid handle.
*
* Return 0 on success, or an error.
*/
int iommu_replace_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid,
struct iommu_attach_handle *handle)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
struct iommu_attach_handle *entry;
struct iommu_domain *curr_domain;
void *curr;
int ret;
if (!group)
return -ENODEV;
if (!domain->ops->set_dev_pasid)
return -EOPNOTSUPP;
iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner || pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24 11:41:23 +08:00
if (!domain_iommu_ops_compatible(dev_iommu_ops(dev), domain) ||
pasid == IOMMU_NO_PASID || !handle)
return -EINVAL;
mutex_lock(&group->mutex);
entry = iommu_make_pasid_array_entry(domain, handle);
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
XA_ZERO_ENTRY, GFP_KERNEL);
if (xa_is_err(curr)) {
ret = xa_err(curr);
goto out_unlock;
}
/*
* No domain (with or without handle) attached, hence not
* a replace case.
*/
if (!curr) {
xa_release(&group->pasid_array, pasid);
ret = -EINVAL;
goto out_unlock;
}
/*
* Reusing handle is problematic as there are paths that refers
* the handle without lock. To avoid race, reject the callers that
* attempt it.
*/
if (curr == entry) {
WARN_ON(1);
ret = -EINVAL;
goto out_unlock;
}
curr_domain = pasid_array_entry_to_domain(curr);
ret = 0;
if (curr_domain != domain) {
ret = __iommu_set_group_pasid(domain, group,
pasid, curr_domain);
if (ret)
goto out_unlock;
}
/*
* The above xa_cmpxchg() reserved the memory, and the
* group->mutex is held, this cannot fail.
*/
WARN_ON(xa_is_err(xa_store(&group->pasid_array,
pasid, entry, GFP_KERNEL)));
out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
/*
* iommu_detach_device_pasid() - Detach the domain from pasid of device
* @domain: the iommu domain.
* @dev: the attached device.
* @pasid: the pasid of the device.
*
* The @domain must have been attached to @pasid of the @dev with
* iommu_attach_device_pasid().
*/
void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
mutex_lock(&group->mutex);
__iommu_remove_group_pasid(group, pasid, domain);
xa_erase(&group->pasid_array, pasid);
iommu: Add attach/detach_dev_pasid iommu interfaces Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-31 08:59:09 +08:00
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
ioasid_t iommu_alloc_global_pasid(struct device *dev)
{
int ret;
/* max_pasids == 0 means that the device does not support PASID */
if (!dev->iommu->max_pasids)
return IOMMU_PASID_INVALID;
/*
* max_pasids is set up by vendor driver based on number of PASID bits
* supported but the IDA allocation is inclusive.
*/
ret = ida_alloc_range(&iommu_global_pasid_ida, IOMMU_FIRST_GLOBAL_PASID,
dev->iommu->max_pasids - 1, GFP_KERNEL);
return ret < 0 ? IOMMU_PASID_INVALID : ret;
}
EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid);
void iommu_free_global_pasid(ioasid_t pasid)
{
if (WARN_ON(pasid == IOMMU_PASID_INVALID))
return;
ida_free(&iommu_global_pasid_ida, pasid);
}
EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
/**
* iommu_attach_handle_get - Return the attach handle
* @group: the iommu group that domain was attached to
* @pasid: the pasid within the group
* @type: matched domain type, 0 for any match
*
* Return handle or ERR_PTR(-ENOENT) on none, ERR_PTR(-EBUSY) on mismatch.
*
* Return the attach handle to the caller. The life cycle of an iommu attach
* handle is from the time when the domain is attached to the time when the
* domain is detached. Callers are required to synchronize the call of
* iommu_attach_handle_get() with domain attachment and detachment. The attach
* handle can only be used during its life cycle.
*/
struct iommu_attach_handle *
iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type)
{
struct iommu_attach_handle *handle;
void *entry;
xa_lock(&group->pasid_array);
entry = xa_load(&group->pasid_array, pasid);
if (!entry || xa_pointer_tag(entry) != IOMMU_PASID_ARRAY_HANDLE) {
handle = ERR_PTR(-ENOENT);
} else {
handle = xa_untag_pointer(entry);
if (type && handle->domain->type != type)
handle = ERR_PTR(-EBUSY);
}
xa_unlock(&group->pasid_array);
return handle;
}
EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, "IOMMUFD_INTERNAL");
/**
* iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
* @domain: IOMMU domain to attach
* @group: IOMMU group that will be attached
* @handle: attach handle
*
* Returns 0 on success and error code on failure.
*
* This is a variant of iommu_attach_group(). It allows the caller to provide
* an attach handle and use it when the domain is attached. This is currently
* used by IOMMUFD to deliver the I/O page faults.
*
* Caller should always provide a new handle to avoid race with the paths
* that have lockless reference to handle.
*/
int iommu_attach_group_handle(struct iommu_domain *domain,
struct iommu_group *group,
struct iommu_attach_handle *handle)
{
void *entry;
int ret;
if (!handle)
return -EINVAL;
mutex_lock(&group->mutex);
entry = iommu_make_pasid_array_entry(domain, handle);
ret = xa_insert(&group->pasid_array,
IOMMU_NO_PASID, XA_ZERO_ENTRY, GFP_KERNEL);
if (ret)
goto out_unlock;
ret = __iommu_attach_group(domain, group);
if (ret) {
xa_release(&group->pasid_array, IOMMU_NO_PASID);
goto out_unlock;
}
/*
* The xa_insert() above reserved the memory, and the group->mutex is
* held, this cannot fail. The new domain cannot be visible until the
* operation succeeds as we cannot tolerate PRIs becoming concurrently
* queued and then failing attach.
*/
WARN_ON(xa_is_err(xa_store(&group->pasid_array,
IOMMU_NO_PASID, entry, GFP_KERNEL)));
out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_NS_GPL(iommu_attach_group_handle, "IOMMUFD_INTERNAL");
/**
* iommu_detach_group_handle - Detach an IOMMU domain from an IOMMU group
* @domain: IOMMU domain to attach
* @group: IOMMU group that will be attached
*
* Detach the specified IOMMU domain from the specified IOMMU group.
* It must be used in conjunction with iommu_attach_group_handle().
*/
void iommu_detach_group_handle(struct iommu_domain *domain,
struct iommu_group *group)
{
mutex_lock(&group->mutex);
__iommu_group_set_core_domain(group);
xa_erase(&group->pasid_array, IOMMU_NO_PASID);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_NS_GPL(iommu_detach_group_handle, "IOMMUFD_INTERNAL");
/**
* iommu_replace_group_handle - replace the domain that a group is attached to
* @group: IOMMU group that will be attached to the new domain
* @new_domain: new IOMMU domain to replace with
* @handle: attach handle
*
* This API allows the group to switch domains without being forced to go to
* the blocking domain in-between. It allows the caller to provide an attach
* handle for the new domain and use it when the domain is attached.
*
* If the currently attached domain is a core domain (e.g. a default_domain),
* it will act just like the iommu_attach_group_handle().
*
* Caller should always provide a new handle to avoid race with the paths
* that have lockless reference to handle.
*/
int iommu_replace_group_handle(struct iommu_group *group,
struct iommu_domain *new_domain,
struct iommu_attach_handle *handle)
{
void *curr, *entry;
int ret;
if (!new_domain || !handle)
return -EINVAL;
mutex_lock(&group->mutex);
entry = iommu_make_pasid_array_entry(new_domain, handle);
ret = xa_reserve(&group->pasid_array, IOMMU_NO_PASID, GFP_KERNEL);
if (ret)
goto err_unlock;
ret = __iommu_group_set_domain(group, new_domain);
if (ret)
goto err_release;
curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, entry, GFP_KERNEL);
WARN_ON(xa_is_err(curr));
mutex_unlock(&group->mutex);
return 0;
err_release:
xa_release(&group->pasid_array, IOMMU_NO_PASID);
err_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
iommu: Make iommu_dma_prepare_msi() into a generic operation SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-19 17:31:38 -08:00
#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
/**
* iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
* @desc: MSI descriptor, will store the MSI page
* @msi_addr: MSI target address to be mapped
*
* The implementation of sw_msi() should take msi_addr and map it to
* an IOVA in the domain and call msi_desc_set_iommu_msi_iova() with the
* mapping information.
*
* Return: 0 on success or negative error code if the mapping failed.
*/
int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
{
struct device *dev = msi_desc_to_dev(desc);
struct iommu_group *group = dev->iommu_group;
int ret = 0;
if (!group)
return 0;
mutex_lock(&group->mutex);
/* An IDENTITY domain must pass through */
if (group->domain && group->domain->type != IOMMU_DOMAIN_IDENTITY) {
switch (group->domain->cookie_type) {
case IOMMU_COOKIE_DMA_MSI:
case IOMMU_COOKIE_DMA_IOVA:
ret = iommu_dma_sw_msi(group->domain, desc, msi_addr);
break;
case IOMMU_COOKIE_IOMMUFD:
ret = iommufd_sw_msi(group->domain, desc, msi_addr);
break;
default:
ret = -EOPNOTSUPP;
break;
}
}
iommu: Make iommu_dma_prepare_msi() into a generic operation SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-19 17:31:38 -08:00
mutex_unlock(&group->mutex);
return ret;
}
#endif /* CONFIG_IRQ_MSI_IOMMU */