Commit graph

344 commits

Author SHA1 Message Date
Linus Torvalds
d632ab86af dm-table: fix checking for request-based stackable devices
dm-flakey: fix corrupt_bio_byte setup checks
 
 dm raid: add support for resync w/o metadata devices
 
 dm, dm-mpath, vm-vdo, dm-raid: small code simplification
 
 dm-verity: remove support for asynchronous hashes
 
 dm-zoned-target: close smatch warning
 
 dm-thin: update the documentation
 
 dm-thin: enable inline-crypto passthrough
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCaJC5oRQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbX+RAP911rivQBhdJSiktvnSulBvSANzzNWa
 9pNIuHGCmX3RVwEA53xpdUBkWjvGbtrUwk2eu8SK728H9aOFrpaDImKdVws=
 =gUBT
 -----END PGP SIGNATURE-----

Merge tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - fix checking for request-based stackable devices (dm-table)

 - fix corrupt_bio_byte setup checks (dm-flakey)

 - add support for resync w/o metadata devices (dm raid)

 - small code simplification (dm, dm-mpath, vm-vdo, dm-raid)

 - remove support for asynchronous hashes (dm-verity)

 - close smatch warning (dm-zoned-target)

 - update the documentation and enable inline-crypto passthrough
   (dm-thin)

* tag 'for-6.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: set DM_TARGET_PASSES_CRYPTO feature for dm-thin
  dm-thin: update the documentation
  dm-raid: do not include dm-core.h
  vdo: omit need_resched() before cond_resched()
  md: dm-zoned-target: Initialize return variable r to avoid uninitialized use
  dm-verity: remove support for asynchronous hashes
  dm-mpath: don't print the "loaded" message if registering fails
  dm-mpath: make dm_unregister_path_selector return void
  dm: ima: avoid extra calls to strlen()
  dm: Simplify dm_io_complete()
  dm: Remove unnecessary return in dm_zone_endio()
  dm raid: add support for resync w/o metadata devices
  dm-flakey: Fix corrupt_bio_byte setup checks
  dm-table: fix checking for rq stackable devices
2025-08-04 08:58:53 -07:00
Zhang Yi
2c46eab8da dm: clear unmap write zeroes limits when disabling write zeroes
The unmap write zeroes limits have been set to the stacking queue limits
by default in blk_set_stacking_limits() and blk_stack_limits(), but it
should be cleared if any underlying device does not support it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/20250619111806.3546162-6-yi.zhang@huaweicloud.com
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 12:45:13 +02:00
Benjamin Marzinski
8ca719b819 dm-table: fix checking for rq stackable devices
Due to the semantics of iterate_devices(), the current code allows a
request-based dm table as long as it includes one request-stackable
device. It is supposed to only allow tables where there are no
non-request-stackable devices.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-06-23 11:56:51 +02:00
Linus Torvalds
3c727285f1 - dm: better error handling when reloading a table
- dm-delay: don't busy-wait in kthread
 
 - dm: use use generic disable_* functions instead of open coding them
 
 - dm: lock queue limits when reading them
 
 - dm-verity: use softirq context only when !need_resched()
 
 - dm-bufio: remove maximum age based eviction
 
 - dm: remove unneeded kvfree from alloc_targets
 
 - dm-flakey: various fixes
 
 - dm-mpath: interface for explicit probing of active paths
 
 - dm: fix BLK_FEAT_ATOMIC_WRITES
 
 - dm: pass through operations on wrapped inline crypto keys
 
 - dm vdo indexer: don't read request structure after enqueuing
 
 - dm-zone: Use bdev_*() helper functions where applicable
 
 - dm-mpath: replace spin_lock_irqsave with spin_lock_irq
 
 - dm-mirror: fix a tiny race condition
 
 - dm-verity: fix a memory leak if some arguments are specified multiple times
 
 - dm-stripe: small code cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCaD8vFBQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbYGFAQC/V62PzDUa326WSdvwhtYe6jphInlW
 ZSmh37L4MIcV2wD/S8UqIaC9GSakee6jEWBTRiDqNZNEOIWhbd7f6gnTOQ0=
 =eYUe
 -----END PGP SIGNATURE-----

Merge tag 'for-6.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - better error handling when reloading a table

 - use use generic disable_* functions instead of open coding them

 - lock queue limits when reading them

 - remove unneeded kvfree from alloc_targets

 - fix BLK_FEAT_ATOMIC_WRITES

 - pass through operations on wrapped inline crypto keys

 - dm-verity:
     - use softirq context only when !need_resched()
     - fix a memory leak if some arguments are specified multiple times

 - dm-mpath:
    - interface for explicit probing of active paths
    - replace spin_lock_irqsave with spin_lock_irq

 - dm-delay: don't busy-wait in kthread

 - dm-bufio: remove maximum age based eviction

 - dm-flakey: various fixes

 - vdo indexer: don't read request structure after enqueuing

 - dm-zone: Use bdev_*() helper functions where applicable

 - dm-mirror: fix a tiny race condition

 - dm-stripe: small code cleanup

* tag 'for-6.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm-stripe: small code cleanup
  dm-verity: fix a memory leak if some arguments are specified multiple times
  dm-mirror: fix a tiny race condition
  dm-table: check BLK_FEAT_ATOMIC_WRITES inside limits_lock
  dm mpath: replace spin_lock_irqsave with spin_lock_irq
  dm-mpath: Don't grab work_mutex while probing paths
  dm-zone: Use bdev_*() helper functions where applicable
  dm vdo indexer: don't read request structure after enqueuing
  dm: pass through operations on wrapped inline crypto keys
  blk-crypto: export wrapped key functions
  dm-table: Set BLK_FEAT_ATOMIC_WRITES for target queue limits
  dm mpath: Interface for explicit probing of active paths
  dm: Allow .prepare_ioctl to handle ioctls directly
  dm-flakey: make corrupting read bios work
  dm-flakey: remove useless ERROR_READS check in flakey_end_io
  dm-flakey: error all IOs when num_features is absent
  dm-flakey: Clean up parsing messages
  dm: remove unneeded kvfree from alloc_targets
  dm-bufio: remove maximum age based eviction
  dm-verity: use softirq context only when !need_resched()
  ...
2025-06-03 15:54:46 -07:00
Benjamin Marzinski
85f6d5b729 dm-table: check BLK_FEAT_ATOMIC_WRITES inside limits_lock
dm_set_device_limits() should check q->limits.features for
BLK_FEAT_ATOMIC_WRITES while holding q->limits_lock, like it does for
the rest of the queue limits.

Fixes: b7c18b17a1 ("dm-table: Set BLK_FEAT_ATOMIC_WRITES for target queue limits")
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-06-02 14:24:51 +02:00
Bart Van Assche
241b9b584d dm-zone: Use bdev_*() helper functions where applicable
Improve code readability by using bdev_is_zone_aligned() and
bdev_offset_from_zone_start() where applicable. No functionality
has been changed.

This patch is a reworked version of a patch from Pankaj Raghav.

See also https://lore.kernel.org/linux-block/20220923173618.6899-11-p.raghav@samsung.com/.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-15 15:55:07 +02:00
Eric Biggers
e93912786e dm: pass through operations on wrapped inline crypto keys
Make the device-mapper layer pass through the derive_sw_secret,
import_key, generate_key, and prepare_key blk-crypto operations when all
underlying devices support hardware-wrapped inline crypto keys and are
passing through inline crypto support.

Commit ebc4176551 ("blk-crypto: add basic hardware-wrapped key
support") already made BLK_CRYPTO_KEY_TYPE_HW_WRAPPED be passed through
in the same way that the other crypto capabilities are.  But the wrapped
key support also includes additional operations in blk_crypto_ll_ops,
and the dm layer needs to implement those to pass them through.
derive_sw_secret is needed by fscrypt, while the other operations are
needed for the new blk-crypto ioctls to work on device-mapper devices
and not just the raw partitions.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-06 19:08:20 +02:00
Tudor Ambarus
f1aff4bc19 dm: fix copying after src array boundaries
The blammed commit copied to argv the size of the reallocated argv,
instead of the size of the old_argv, thus reading and copying from
past the old_argv allocated memory.

Following BUG_ON was hit:
[    3.038929][    T1] kernel BUG at lib/string_helpers.c:1040!
[    3.039147][    T1] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
...
[    3.056489][    T1] Call trace:
[    3.056591][    T1]  __fortify_panic+0x10/0x18 (P)
[    3.056773][    T1]  dm_split_args+0x20c/0x210
[    3.056942][    T1]  dm_table_add_target+0x13c/0x360
[    3.057132][    T1]  table_load+0x110/0x3ac
[    3.057292][    T1]  dm_ctl_ioctl+0x424/0x56c
[    3.057457][    T1]  __arm64_sys_ioctl+0xa8/0xec
[    3.057634][    T1]  invoke_syscall+0x58/0x10c
[    3.057804][    T1]  el0_svc_common+0xa8/0xdc
[    3.057970][    T1]  do_el0_svc+0x1c/0x28
[    3.058123][    T1]  el0_svc+0x50/0xac
[    3.058266][    T1]  el0t_64_sync_handler+0x60/0xc4
[    3.058452][    T1]  el0t_64_sync+0x1b0/0x1b4
[    3.058620][    T1] Code: f800865e a9bf7bfd 910003fd 941f48aa (d4210000)
[    3.058897][    T1] ---[ end trace 0000000000000000 ]---
[    3.059083][    T1] Kernel panic - not syncing: Oops - BUG: Fatal exception

Fix it by copying the size of src, and not the size of dst, as it was.

Fixes: 5a2a6c4281 ("dm: always update the array size in realloc_argv on success")
Cc: stable@vger.kernel.org
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-06 14:06:59 +02:00
John Garry
b7c18b17a1 dm-table: Set BLK_FEAT_ATOMIC_WRITES for target queue limits
Feature flag BLK_FEAT_ATOMIC_WRITES is not being properly set for the
target queue limits, and this means that atomic writes are not being
enabled for any dm personalities.

When calling dm_set_device_limits() -> blk_stack_limits() ->
... -> blk_stack_atomic_writes_limits(), the bottom device limits
(which corresponds to intermediate target queue limits) does not have
BLK_FEAT_ATOMIC_WRITES set, and so atomic writes can never be enabled.

Typically such a flag would be inherited from the stacked device in
dm_set_device_limits() -> blk_stack_limits() via BLK_FEAT_INHERIT_MASK,
but BLK_FEAT_ATOMIC_WRITES is not inherited as it's preferred to manually
enable on a per-personality basis.

Set BLK_FEAT_ATOMIC_WRITES manually for the intermediate target queue
limits from the stacked device to get atomic writes working.

Fixes: 3194e36488 ("dm-table: atomic writes support")
Cc: stable@vger.kernel.org	# v6.14
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04 12:02:39 +02:00
Benjamin Marzinski
d90e7a500c dm: remove unneeded kvfree from alloc_targets
alloc_targets() is always called with a newly initialized table where
t->highs == NULL.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04 11:35:05 +02:00
Mikulas Patocka
abb4cf2f4c dm: lock limits when reading them
Lock queue limits when reading them, so that we don't read halfway
modified values.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2025-05-04 11:35:05 +02:00
Benjamin Marzinski
121218bef4 dm: limit swapping tables for devices with zone write plugs
dm_revalidate_zones() only allowed new or previously unzoned devices to
call blk_revalidate_disk_zones(). If the device was already zoned,
disk->nr_zones would always equal md->nr_zones, so dm_revalidate_zones()
returned without doing any work. This would make the zoned settings for
the device not match the new table. If the device had zone write plug
resources, it could run into errors like bdev_zone_is_seq() reading
invalid memory because disk->conv_zones_bitmap was the wrong size.

If the device doesn't have any zone write plug resources, calling
blk_revalidate_disk_zones() will always correctly update device.  If
blk_revalidate_disk_zones() fails, it can still overwrite or clear the
current disk->nr_zones value. In this case, DM must restore the previous
value of disk->nr_zones, so that the zoned settings will continue to
match the previous value that it fell back to.

If the device already has zone write plug resources,
blk_revalidate_disk_zones() will not correctly update them, if it is
called for arbitrary zoned device changes.  Since there is not much need
for this ability, the easiest solution is to disallow any table reloads
that change the zoned settings, for devices that already have zone plug
resources.  Specifically, if a device already has zone plug resources
allocated, it can only switch to another zoned table that also emulates
zone append.  Also, it cannot change the device size or the zone size. A
device can switch to an error target.

Fixes: bb37d77239 ("dm: introduce zone append emulation")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04 11:35:05 +02:00
Dan Carpenter
650266ac4c dm: add missing unlock on in dm_keyslot_evict()
We need to call dm_put_live_table() even if dm_get_live_table() returns
NULL.

Fixes: 9355a9eb21 ("dm: support key eviction from keyslot managers of underlying devices")
Cc: stable@vger.kernel.org	# v5.12+
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-04-30 18:17:43 +02:00
Benjamin Marzinski
5a2a6c4281 dm: always update the array size in realloc_argv on success
realloc_argv() was only updating the array size if it was called with
old_argv already allocated. The first time it was called to create an
argv array, it would allocate the array but return the array size as
zero. dm_split_args() would think that it couldn't store any arguments
in the array and would call realloc_argv() again, causing it to
reallocate the initial slots (this time using GPF_KERNEL) and finally
return a size. Aside from being wasteful, this could cause deadlocks on
targets that need to process messages without starting new IO. Instead,
realloc_argv should always update the allocated array size on success.

Fixes: a065192655 ("dm table: don't copy from a NULL pointer in realloc_argv()")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-04-28 13:11:33 +02:00
Benjamin Marzinski
4ea30ec6fb dm: handle failures in dm_table_set_restrictions
If dm_table_set_restrictions() fails while swapping tables,
device-mapper will continue using the previous table. It must be sure to
leave the mapped_device in it's previous state on failure.  Otherwise
device-mapper could end up using the old table with settings from the
unused table.

Do not update the mapped device in dm_set_zones_restrictions(). Wait
till after dm_table_set_restrictions() is sure to succeed to update the
md zoned settings. Do the same with the dax settings, and if
dm_revalidate_zones() fails, restore the original queue limits.

Fixes: 7f91ccd8a6 ("dm: Call dm_revalidate_zones() after setting the queue limits")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-04-11 13:38:53 +02:00
Andy Shevchenko
7bd47be161 dm table: Fix W=1 build warning when mempool_needs_integrity is unused
The mempool_needs_integrity is unused. This, in particular, prevents
kernel builds with Clang, `make W=1` and CONFIG_WERROR=y:

drivers/md/dm-table.c:1052:7: error: variable 'mempool_needs_integrity' set but not used [-Werror,-Wunused-but-set-variable]
 1052 |         bool mempool_needs_integrity = t->integrity_supported;
      |              ^

Fix this by removing the leftover.

Fixes: 105ca2a2c2 ("block: split struct bio_integrity_payload")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-04-09 15:56:59 +02:00
Linus Torvalds
5014bebee0 - dm-crypt: switch to using the crc32 library
- dm-verity, dm-integrity, dm-crypt: documentation improvement
 
 - dm-vdo fixes
 
 - dm-stripe: enable inline crypto passthrough
 
 - dm-integrity: set ti->error on memory allocation failure
 
 - dm-bufio: remove unused return value
 
 - dm-verity: do forward error correction on metadata I/O errors
 
 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH
 
 - dm cache: prevent BUG_ON by blocking retries on failed device resumes
 
 - dm cache: support shrinking the origin device
 
 - dm: restrict dm device size to 2^63-512 bytes
 
 - dm-delay: support zoned devices
 
 - dm-verity: support block number limits for different ioprio classes
 
 - dm-integrity: fix non-constant-time tag verification (security bug)
 
 - dm-verity, dm-ebs: fix prefetch-vs-suspend race
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZ+u7shQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbZ0JAQDVhbl77u9jjPWjxJvFodMAqw+KPXGC
 MNzkyzG0lu7oPAEA33vt5pHQtr7F3SJj/sDBuZ+rb5bvUtgxeGqpJOQpTAk=
 =tj00
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - dm-crypt: switch to using the crc32 library

 - dm-verity, dm-integrity, dm-crypt: documentation improvement

 - dm-vdo fixes

 - dm-stripe: enable inline crypto passthrough

 - dm-integrity: set ti->error on memory allocation failure

 - dm-bufio: remove unused return value

 - dm-verity: do forward error correction on metadata I/O errors

 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH

 - dm cache: prevent BUG_ON by blocking retries on failed device resumes

 - dm cache: support shrinking the origin device

 - dm: restrict dm device size to 2^63-512 bytes

 - dm-delay: support zoned devices

 - dm-verity: support block number limits for different ioprio classes

 - dm-integrity: fix non-constant-time tag verification (security bug)

 - dm-verity, dm-ebs: fix prefetch-vs-suspend race

* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm-ebs: fix prefetch-vs-suspend race
  dm-verity: fix prefetch-vs-suspend race
  dm-integrity: fix non-constant-time tag verification
  dm-verity: support block number limits for different ioprio classes
  dm-delay: support zoned devices
  dm: restrict dm device size to 2^63-512 bytes
  dm cache: support shrinking the origin device
  dm cache: prevent BUG_ON by blocking retries on failed device resumes
  dm vdo indexer: reorder uds_request to reduce padding
  dm: fix unconditional IO throttle caused by REQ_PREFLUSH
  dm vdo: rework processing of loaded refcount byte arrays
  dm vdo: remove remaining ring references
  dm-verity: do forward error correction on metadata I/O errors
  dm-bufio: remove unused return value
  dm-integrity: set ti->error on memory allocation failure
  dm: Enable inline crypto passthrough for striped target
  dm vdo slab-depot: read refcount blocks in large chunks at load time
  dm vdo vio-pool: allow variable-sized metadata vios
  dm vdo vio-pool: support pools with multiple data blocks per vio
  dm vdo vio-pool: add a pool pointer to pooled_vio
  ...
2025-04-02 21:27:59 -07:00
Mikulas Patocka
45fc728515 dm: restrict dm device size to 2^63-512 bytes
The devices with size >= 2^63 bytes can't be used reliably by userspace
because the type off_t is a signed 64-bit integer.

Therefore, we limit the maximum size of a device mapper device to
2^63-512 bytes.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-03-14 13:58:33 +01:00
Christoph Hellwig
105ca2a2c2 block: split struct bio_integrity_payload
Many of the fields in struct bio_integrity_payload are only needed for
the default integrity buffer in the block layer, and the variable
sized array at the end of the structure makes it very hard to embed
into caller allocated structures.

Reduce struct bio_integrity_payload to the minimal structure needed in
common code and create two separate containing structures for the
automatically generated payload and the caller allocated payload.
The latter is a simple wrapper for struct bio_integrity_payload and
the bvecs, while the former contains the additional fields moved out
of struct bio_integrity_payload.

Always use a dedicated mempool for automatic integrity metadata
instead of depending on bio_set that is submitter controlled and thus
often doesn't have the mempool initialized and stop using mempools for
the submitter buffers as they aren't in the NOIO I/O submission path
where we need to guarantee forward progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20250225154449.422989-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-03 11:17:52 -07:00
Eric Biggers
ebc4176551 blk-crypto: add basic hardware-wrapped key support
To prevent keys from being compromised if an attacker acquires read
access to kernel memory, some inline encryption hardware can accept keys
which are wrapped by a per-boot hardware-internal key.  This avoids
needing to keep the raw keys in kernel memory, without limiting the
number of keys that can be used.  Such hardware also supports deriving a
"software secret" for cryptographic tasks that can't be handled by
inline encryption; this is needed for fscrypt to work properly.

To support this hardware, allow struct blk_crypto_key to represent a
hardware-wrapped key as an alternative to a raw key, and make drivers
set flags in struct blk_crypto_profile to indicate which types of keys
they support.  Also add the ->derive_sw_secret() low-level operation,
which drivers supporting wrapped keys must implement.

For more information, see the detailed documentation which this patch
adds to Documentation/block/inline-encryption.rst.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650
Link: https://lore.kernel.org/r/20250204060041.409950-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-10 09:54:19 -07:00
John Garry
3194e36488 dm-table: atomic writes support
Support stacking atomic write limits for DM devices.

All the pre-existing code in blk_stack_atomic_writes_limits() already takes
care of finding the aggregrate limits from the bottom devices.

Feature flag DM_TARGET_ATOMIC_WRITES is introduced so that atomic writes
can be enabled on personalities selectively. This is to ensure that atomic
writes are only enabled when verified to be working properly (for a
specific personality). In addition, it just may not make sense to enable
atomic writes on some personalities (so this flag also helps there).

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-01-17 22:23:47 +01:00
Dr. David Alan Gilbert
ad9266118c dm: Remove unused dm_table_bio_based
dm_table_bio_based() is unused since commit
29dec90a0f ("dm: fix bio_set allocation")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-11-20 11:38:04 +01:00
Mikulas Patocka
617069741d dm: introduce the target flag mempool_needs_integrity
This commit introduces the dm target flag mempool_needs_integrity. When
the flag is set, device mapper will call bioset_integrity_create on it's
bio sets. The target can then call bio_integrity_alloc on the bios
allocated from the table's mempool.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-07-12 12:39:06 -04:00
Benjamin Marzinski
a21f9edb13 dm: factor out helper function from dm_get_device
Factor out a helper function, dm_devt_from_path(), from dm_get_device()
for use in dm targets.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-07-10 13:10:06 +02:00
Mikulas Patocka
aaa53168cb dm: optimize flushes
Device mapper sends flush bios to all the targets and the targets send it
to the underlying device. That may be inefficient, for example if a table
contains 10 linear targets pointing to the same physical device, then
device mapper would send 10 flush bios to that device - despite the fact
that only one bio would be sufficient.

This commit optimizes the flush behavior. It introduces a per-target
variable flush_bypasses_map - it is set when the target supports flush
optimization - currently, the dm-linear and dm-stripe targets support it.
When all the targets in a table have flush_bypasses_map,
flush_bypasses_map on the table is set. __send_empty_flush tests if the
table has flush_bypasses_map - and if it has, no flush bios are sent to
the targets via the "map" method and the list dm_table->devices is
iterated and the flush bios are sent to each member of the list.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Suggested-by: Yang Yang <yang.yang@vivo.com>
2024-06-26 11:32:39 -04:00
Jens Axboe
e821bcecdf Merge branch 'for-6.11/block-limits' into for-6.11/block
Merge in queue limits cleanups.

* for-6.11/block-limits:
  block: move the raid_partial_stripes_expensive flag into the features field
  block: remove the discard_alignment flag
  block: move the misaligned flag into the features field
  block: renumber and rename the cache disabled flag
  block: fix spelling and grammar for in writeback_cache_control.rst
  block: remove the unused blk_bounce enum
2024-06-20 06:55:20 -06:00
Christoph Hellwig
4cac3d3a71 block: remove the discard_alignment flag
queue_limits.discard_alignment is never read except in the places
where it is stacked into another limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240619154623.450048-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-20 06:53:14 -06:00
Jens Axboe
69c34f07e4 Merge branch 'for-6.11/block-limits' into for-6.11/block
Merge in last round of queue limits changes from Christoph.

* for-6.11/block-limits: (26 commits)
  block: move the bounce flag into the features field
  block: move the skip_tagset_quiesce flag to queue_limits
  block: move the pci_p2pdma flag to queue_limits
  block: move the zone_resetall flag to queue_limits
  block: move the zoned flag into the features field
  block: move the poll flag to queue_limits
  block: move the dax flag to queue_limits
  block: move the nowait flag to queue_limits
  block: move the synchronous flag to queue_limits
  block: move the stable_writes flag to queue_limits
  block: move the io_stat flag setting to queue_limits
  block: move the add_random flag to queue_limits
  block: move the nonrot flag to queue_limits
  block: move cache control settings out of queue->flags
  block: remove blk_flush_policy
  block: freeze the queue in queue_attr_store
  nbd: move setting the cache control flags to __nbd_set_size
  virtio_blk: remove virtblk_update_cache_mode
  loop: fold loop_update_rotational into loop_reconfigure_limits
  loop: also use the default block size from an underlying block device
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 08:14:49 -06:00
Christoph Hellwig
b1fc937a55 block: move the zoned flag into the features field
Move the zoned flags into the features field to reclaim a little
bit of space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
8023e144f9 block: move the poll flag to queue_limits
Move the poll flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the features is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
f467fee48d block: move the dax flag to queue_limits
Move the dax flag into the queue_limits feature field so that it can be
set atomically with the queue frozen.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
f76af42f8b block: move the nowait flag to queue_limits
Move the nowait flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the features is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
1a02f3a73f block: move the stable_writes flag to queue_limits
Move the stable_writes flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.

The flag is now inherited by blk_stack_limits, which greatly simplifies
the code in dm, and fixed md which previously did not pass on the flag
set on lower devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
cdb2497918 block: move the io_stat flag setting to queue_limits
Move the io_stat flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Simplify md and dm to set the flag unconditionally instead of avoiding
setting a simple flag for cases where it already is set by other means,
which is a bit pointless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
39a9f1c334 block: move the add_random flag to queue_limits
Move the add_random flag into the queue_limits feature field so that it
can be set atomically with the queue frozen.

Note that this also removes code from dm to clear the flag based on
the underlying devices, which can't be reached as dm devices will
always start out without the flag set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
bd4a633b6f block: move the nonrot flag to queue_limits
Move the nonrot flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.

For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
rotational flag is not set as they clearly are not rotational despite
this being a behavior change.  There are some other drivers that
unconditionally set the rotational flag to keep the existing behavior
as they arguably can be used on rotational devices even if that is
probably not their main use today (e.g. virtio_blk and drbd).

The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
1122c0c1cc block: move cache control settings out of queue->flags
Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.

Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer.  Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.

The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0.  The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Damien Le Moal
7f91ccd8a6 dm: Call dm_revalidate_zones() after setting the queue limits
dm_revalidate_zones() is called from dm_set_zone_restrictions() when the
mapped device queue limits are not yet set. However,
dm_revalidate_zones() calls blk_revalidate_disk_zones() and this
function consults and modifies the mapped device queue limits. Thus,
currently, blk_revalidate_disk_zones() operates on limits that are not
yet initialized.

Fix this by moving the call to dm_revalidate_zones() out of
dm_set_zone_restrictions() and into dm_table_set_restrictions() after
executing queue_limits_set().

To further cleanup dm_set_zones_restrictions(), the message about the
type of zone append (native or emulated) is also moved inside
dm_revalidate_zones().

Fixes: 1c0e720228 ("dm: use queue_limits_set")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240611023639.89277-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-15 20:42:20 -06:00
Christoph Hellwig
c6e56cf6b2 block: move integrity information into queue_limits
Move the integrity information into the queue limits so that it can be
set atomically with other queue limits, and that the sysfs changes to
the read_verify and write_generate flags are properly synchronized.
This also allows to provide a more useful helper to stack the integrity
fields, although it still is separate from the main stacking function
as not all stackable devices want to inherit the integrity settings.
Even with that it greatly simplifies the code in md and dm.

Note that the integrity field is moved as-is into the queue limits.
While there are good arguments for removing the separate blk_integrity
structure, this would cause a lot of churn and might better be done at a
later time if desired.  However the integrity field in the queue_limits
structure is now unconditional so that various ifdefs can be avoided or
replaced with IS_ENABLED().  Given that tiny size of it that seems like
a worthwhile trade off.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14 10:20:07 -06:00
Christoph Hellwig
c8c1f7012b dm: make dm_set_zones_restrictions work on the queue limits
Don't stuff the values directly into the queue without any
synchronization, but instead delay applying the queue limits in
the caller and let dm_set_zones_restrictions work on the limit
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:39 -06:00
Christoph Hellwig
d9780064b1 dm: move setting zoned_enabled to dm_table_set_restrictions
Keep it together with the rest of the zoned code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:39 -06:00
Linus Torvalds
4f8b6f25eb - Add a dm-crypt optional "high_priority" flag that enables the crypt
workqueues to use WQ_HIGHPRI.
 
 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.
 
 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).
 
 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.
 
 - Fix various issues with dm-delay target (ranging from a resource
   teardown fix, a fix for hung task when using kthread mode, and other
   improvements that followed from code inspection).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmZDiggACgkQxSPxCi2d
 A1oQrwf7BUHy7ehwCjRrVlFTteIlx0ULTpPxictakN/S+xZfcZUcbE20OjNVzdk9
 m5dx4Gn557rlMkiC4NDlHazVEVM5BbTpih27rvUgvX2nchUUdfHIT1OvU0isT4Yi
 h9g/o7i9DnBPjvyNjpXjP9YE7Xg8u2X9mxpv8DyU5M+QpFuofwzsfkCP7g14B0g2
 btGxT3AZ5Bo8A/csKeSqHq13Nbq/dcAZZ3IvjIg1xSXjm6CoQ04rfO0TN6SKfsFJ
 GXteBS2JT1MMcXf3oKweAeQduTE+psVFea7gt/8haKFldnV+DpPNg1gU/7rza5Os
 eL1+L1iPY5fuEJIkaqPjBVRGkxQqHg==
 =+OhH
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Add a dm-crypt optional "high_priority" flag that enables the crypt
   workqueues to use WQ_HIGHPRI.

 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.

 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).

 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.

 - Fix various issues with dm-delay target (ranging from a resource
   teardown fix, a fix for hung task when using kthread mode, and other
   improvements that followed from code inspection).

* tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-delay: remove timer_lock
  dm-delay: change locking to avoid contention
  dm-delay: fix max_delay calculations
  dm-delay: fix hung task introduced by kthread mode
  dm-delay: fix workqueue delay_timer race
  dm-crypt: don't set WQ_CPU_INTENSIVE for WQ_UNBOUND crypt_queue
  dm: use queue_limits_set
  dm-crypt: stop constraining max_segment_size to PAGE_SIZE
  dm-crypt: export sysfs of all workqueues
  dm-crypt: add the optional "high_priority" flag
2024-05-14 18:34:19 -07:00
Damien Le Moal
44cccb3027 dm: Check that a zoned table leads to a valid mapped device
Using targets such as dm-linear, a mapped device can be created to
contain only conventional zones. Such device should not be treated as
zoned as it does not contain any mandatory sequential write required
zone. Since such device can be randomly written, we can modify
dm_set_zones_restrictions() to set the mapped device zoned queue limit
to false to expose it as a regular block device. The function
dm_check_zoned() does this after counting the number of conventional
zones of the mapped device and comparing it to the total number of zones
reported. The special dm_check_zoned_cb() report zones callback function
is used to count conventional zones.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Link: https://lore.kernel.org/r/20240501110907.96950-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-01 08:08:42 -06:00
Christoph Hellwig
1c0e720228 dm: use queue_limits_set
Use queue_limits_set which validates the limits and takes care of
updating the readahead settings instead of directly assigning them to
the queue.  For that make sure all limits are actually updated before
the assignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-04-23 11:20:16 -04:00
Linus Torvalds
bff4b74625 Revert "dm: use queue_limits_set"
This reverts commit 8e0ef41286.

It's broken, and causes the boot to fail on encrypted volumes.

Reported-and-bisected-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/all/20240311235023.GA1205@cmpxchg.org/
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-11 17:11:28 -07:00
Christoph Hellwig
8e0ef41286 dm: use queue_limits_set
Use queue_limits_set which validates the limits and takes care of
updating the readahead settings instead of directly assigning them to
the queue.  For that make sure all limits are actually updated before
the assignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240228225653.947152-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01 08:54:42 -07:00
Mikulas Patocka
bd504bcfec dm: limit the number of targets and parameter size area
The kvmalloc function fails with a warning if the size is larger than
INT_MAX. The warning was triggered by a syscall testing robot.

In order to avoid the warning, this commit limits the number of targets to
1048576 and the size of the parameter area to 1073741824.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-01-30 14:05:02 -05:00
Christoph Hellwig
7437bb73f0 block: remove support for the host aware zone model
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):

 - host managed where non-conventional zones there is strict requirement
   to write at the write pointer, or else an error is returned
 - host aware where a write point is maintained if writes always happen
   at it, otherwise it is left in an under-defined state and the
   sequential write preferred zones behave like conventional zones
   (probably very badly performing ones, though)

Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it).  Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.

Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production.  Drop the support before it is too
late.  Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-19 20:17:43 -07:00
Damien Le Moal
a951104333 dm error: Add support for zoned block devices
dm-error is used in several test cases in the xfstests test suite to
check the handling of IO errors in file systems. However, with several
file systems getting native support for zoned block devices (e.g.
btrfs and f2fs), dm-error's lack of zoned block device support creates
problems as the file system attempts executing zone commands (e.g. a
zone append operation) against a dm-error non-zoned block device,
which causes various issues in the block layer (e.g. WARN_ON
triggers).

This commit adds supports for zoned block devices to dm-error, allowing
a DM device table containing an error target to be exposed as a zoned
block device (if all targets have a compatible zoned model support and
mapping). This is done as follows:
1) Allow passing 2 arguments to an error target, similar to dm-linear:
   a backing device and a start sector. These arguments are optional and
   dm-error retains its characteristics if the arguments are not
   specified.
2) Implement the iterate_devices method so that dm-core can normally
   check the zone support and restrictions (e.g. zone alignment of the
   targets). When the backing device arguments are not specified, the
   iterate_devices method never calls the fn() argument.
When no backing device is specified, as before, we assume that the DM
device is not zoned. When the backing device arguments are specified,
the zoned model of the DM device will depend on the backing device
type:
 - If the backing device is zoned and its model and mapping is
   compatible with other targets of the device, the resulting device
   will be zoned, with the dm-error mapped portion always returning
   errors (similar to the default non-zoned case).
 - If the backing device is not zoned, then the DM device will not be
   either.

This zone support for dm-error requires the definition of a functional
report_zones operation so that dm_revalidate_zones() can operate
correctly and resources for emulating zone append operations
initialized. This is necessary for cases where dm-error is used to
partially map a device and have an overall correct handling of zone
append. This means that dm-error does not fail report zones operations.

Two changes that are not obvious are included to avoid issues:
1) dm_table_supports_zoned_model() is changed to directly check if
   the backing device of a wildcard target (= dm-error target) is
   zoned. Otherwise, we wouldn't be able to catch the invalid setup of
   dm-error without a backing device (non zoned case) being combined
   with zoned targets.
2) dm_table_supports_dax() is modified to return false if the wildcard
   target is found. Otherwise, when dm-error is set without a backing
   device, we end up with a NULL pointer dereference in
   set_dax_synchronous (dax_dev is NULL). This is consistent with the
   current behavior because dm_table_supports_dax() always returned
   false for targets that do not define the iterate_devices method.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-10-31 11:06:21 -04:00
Mikulas Patocka
f6007dce0c dm: fix a race condition in retrieve_deps
There's a race condition in the multipath target when retrieve_deps
races with multipath_message calling dm_get_device and dm_put_device.
retrieve_deps walks the list of open devices without holding any lock
but multipath may add or remove devices to the list while it is
running. The end result may be memory corruption or use-after-free
memory access.

See this description of a UAF with multipath_message():
https://listman.redhat.com/archives/dm-devel/2022-October/052373.html

Fix this bug by introducing a new rw semaphore "devices_lock". We grab
devices_lock for read in retrieve_deps and we grab it for write in
dm_get_device and dm_put_device.

Reported-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-09-14 11:18:29 -04:00