linux/drivers/vfio
Alex Williamson c1d9dac0db vfio/pci: Align huge faults to order
The vfio-pci huge_fault handler doesn't make any attempt to insert a
mapping containing the faulting address; it only inserts mappings if the
faulting address and resulting pfn are aligned.  This works in a lot of
cases, particularly in conjunction with QEMU where DMA mappings linearly
fault the mmap.  However, there are configurations where we don't get
that linear faulting and pages are faulted on-demand.

The scenario reported in the bug below is such a case, where the physical
address width of the CPU is greater than that of the IOMMU, resulting in a
VM where guest firmware has mapped device MMIO beyond the address width of
the IOMMU.  In this configuration, the MMIO is faulted on demand and
tracing indicates that occasionally the faults generate a VM_FAULT_OOM.
Given the use case, this results in an "error: kvm run failed Bad address",
killing the VM.

The host is not under memory pressure in this test, therefore it's
suspected that VM_FAULT_OOM is actually the result of a NULL return from
__pte_offset_map_lock() in the get_locked_pte() path from insert_pfn().
This suggests a potential race inserting a pte concurrently with a pmd, and
may indicate a deficiency in how the mm layer handles such a case.

Nevertheless, Peter noted the inconsistency of vfio-pci's huge_fault
handler where our mapping granularity depends on the alignment of the
faulting address relative to the order rather than aligning the faulting
address to the order to more consistently insert huge mappings.  This
change not only uses the page tables more consistently and efficiently, but
as any fault to an aligned page results in the same mapping, the race
condition suspected in the VM_FAULT_OOM is avoided.
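
The alignment arithmetic described above can be sketched in userspace C.
This is an illustration only, not the vfio-pci code: the helper names
(align_down_to_order, huge_mapping_fits), the fixed 4 KiB PAGE_SHIFT, and
the flat vma_start/vma_end/base_pfn parameters are all assumptions made
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Round a faulting address down to a (PAGE_SIZE << order) boundary. */
static inline uint64_t align_down_to_order(uint64_t addr, unsigned int order)
{
	assert(order < 64 - PAGE_SHIFT); /* keep the shift well defined */
	uint64_t size = UINT64_C(1) << (PAGE_SHIFT + order);

	return addr & ~(size - 1);
}

/*
 * Hypothetical check: after aligning the fault to the order, can a
 * mapping of that order be inserted?  It must lie entirely inside
 * [vma_start, vma_end) and the resulting pfn must itself be aligned
 * to the order.  base_pfn is the pfn backing vma_start.
 */
static inline bool huge_mapping_fits(uint64_t fault_addr, unsigned int order,
				     uint64_t vma_start, uint64_t vma_end,
				     uint64_t base_pfn)
{
	uint64_t size = UINT64_C(1) << (PAGE_SHIFT + order);
	uint64_t addr = align_down_to_order(fault_addr, order);
	uint64_t pfn;

	if (addr < vma_start || addr + size > vma_end)
		return false; /* aligned range spills outside the vma */

	pfn = base_pfn + ((addr - vma_start) >> PAGE_SHIFT);

	return (pfn & ((UINT64_C(1) << order) - 1)) == 0;
}
```

With order 9 (2 MiB on 4 KiB pages), every fault inside the same 2 MiB
window yields the same aligned address and therefore the same candidate
mapping, which is the property the commit message relies on.  When the
check fails, a handler would be expected to fall back to a smaller order
rather than insert a partial mapping.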

Reported-by: Adolfo <adolfotregosa@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220057
Fixes: 09dfc8a5f2 ("vfio/pci: Fallback huge faults for unaligned pfn")
Cc: stable@vger.kernel.org
Tested-by: Adolfo <adolfotregosa@gmail.com>
Co-developed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20250502224035.3183451-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-06 12:59:12 -06:00
cdx module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
fsl-mc vfio/fsl-mc: Remove unused variable 'hwirq' 2024-09-03 08:42:06 -06:00
mdev drivers: core: remove device_link argument from class_compat_[create|remove]_link 2025-01-10 15:42:20 +01:00
pci vfio/pci: Align huge faults to order 2025-05-06 12:59:12 -06:00
platform vfio/platform: check the bounds of read/write syscalls 2025-01-23 13:13:27 -07:00
container.c
debugfs.c vfio/migration: Add debugfs to live migration driver 2023-12-04 14:29:08 -07:00
device_cdev.c vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid 2025-03-25 10:18:31 -03:00
group.c make use of anon_inode_getfile_fmode() 2025-02-21 10:25:31 +01:00
iommufd.c vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices 2025-03-25 10:18:31 -03:00
Kconfig vfio/migration: Add debugfs to live migration driver 2023-12-04 14:29:08 -07:00
Makefile vfio/migration: Add debugfs to live migration driver 2023-12-04 14:29:08 -07:00
vfio.h vfio: replace CONFIG_HAVE_KVM with IS_ENABLED(CONFIG_KVM) 2024-02-08 08:45:35 -05:00
vfio_iommu_spapr_tce.c vfio/spapr: Always clear TCEs before unsetting the window 2024-06-28 17:03:39 +10:00
vfio_iommu_type1.c vfio/type1: Use mapping page mask for pfnmaps 2025-02-27 11:55:54 -07:00
vfio_main.c module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
virqfd.c assorted variants of irqfd setup: convert to CLASS(fd) 2024-11-03 01:28:07 -05:00