linux/fs/btrfs
Filipe Manana 12a824dc67 btrfs: speedup checking for extent sharedness during fiemap
One of the most expensive tasks performed during fiemap is to check if
an extent is shared. This task has two major steps:

1) Check if the data extent is shared. This implies checking the extent
   item in the extent tree, checking delayed references, etc. If we
   find the data extent is directly shared, we terminate immediately;

2) If the data extent is not directly shared (its extent item has a
   refcount of 1), then it may be shared if we have snapshots that share
   subtrees of the inode's subvolume b+tree. So we check if the leaf
   containing the file extent item is shared, then its parent node, then
   the parent node of the parent node, etc, until we reach the root node
   or we find one of them is shared - in which case we stop immediately.

During fiemap we process the extents of a file from left to right, from
file offset 0 to EOF. This means that we iterate b+tree leaves from left
to right, and has the implication that we keep repeating that second step
above several times for the same b+tree path of the inode's subvolume
b+tree.

For example, if we have two file extent items in leaf X, and the path to
leaf X is A -> B -> C -> X, then when we try to determine if the data
extent referenced by the first extent item is shared, we check if the data
extent is shared - if it's not, then we check if leaf X is shared, if not,
then we check if node C is shared, if not, then check if node B is shared,
if not than check if node A is shared. When we move to the next file
extent item, after determining the data extent is not shared, we repeat
the checks for X, C, B and A - doing all the expensive searches in the
extent tree, delayed refs, etc. If we have thousands of tile extents, then
we keep repeating the sharedness checks for the same paths over and over.

On a file that has no shared extents or only a small portion, it's easy
to see that this scales terribly with the number of extents in the file
and the sizes of the extent and subvolume b+trees.

This change eliminates the repeated sharedness check on extent buffers
by caching the results of the last path used. The results can be used as
long as no snapshots were created since they were cached (for not shared
extent buffers) or no roots were dropped since they were cached (for
shared extent buffers). This greatly reduces the time spent by fiemap for
files with thousands of extents and/or large extent and subvolume b+trees.

Example performance test:

    $ cat fiemap-perf-test.sh
    #!/bin/bash

    DEV=/dev/sdi
    MNT=/mnt/sdi

    mkfs.btrfs -f $DEV
    mount -o compress=lzo $DEV $MNT

    # 40G gives 327680 128K file extents (due to compression).
    xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar

    umount $MNT
    mount -o compress=lzo $DEV $MNT

    start=$(date +%s%N)
    filefrag $MNT/foobar
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo "fiemap took $dur milliseconds (metadata not cached)"

    start=$(date +%s%N)
    filefrag $MNT/foobar
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo "fiemap took $dur milliseconds (metadata cached)"

    umount $MNT

Before this patch:

    $ ./fiemap-perf-test.sh
    (...)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 3597 milliseconds (metadata not cached)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 2107 milliseconds (metadata cached)

After this patch:

    $ ./fiemap-perf-test.sh
    (...)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 1646 milliseconds (metadata not cached)
    /mnt/sdi/foobar: 327680 extents found
    fiemap took 698 milliseconds (metadata cached)

That's about 2.2x faster when no metadata is cached, and about 3x faster
when all metadata is cached. On a real filesystem with many other files,
data, directories, etc, the b+trees will be 2 or 3 levels higher,
therefore this optimization will have a higher impact.

Several reports of a slow fiemap show up often, the two Link tags below
refer to two recent reports of such slowness. This patch, together with
the next ones in the series, is meant to address that.

Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26 12:28:01 +02:00
..
tests btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
acl.c btrfs: reserve correct number of items for inode creation 2022-05-16 17:03:08 +02:00
async-thread.c btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue 2022-05-16 17:03:15 +02:00
async-thread.h btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t 2022-07-25 17:45:36 +02:00
backref.c btrfs: speedup checking for extent sharedness during fiemap 2022-09-26 12:28:01 +02:00
backref.h btrfs: speedup checking for extent sharedness during fiemap 2022-09-26 12:28:01 +02:00
block-group.c btrfs: enhance unsupported compat RO flags handling 2022-09-26 12:28:00 +02:00
block-group.h btrfs: get rid of block group caching progress logic 2022-09-26 12:27:58 +02:00
block-rsv.c btrfs: don't save block group root into super block 2022-09-26 12:28:00 +02:00
block-rsv.h btrfs: use enum for btrfs_block_rsv::type 2022-07-25 17:45:40 +02:00
btrfs_inode.h btrfs: add optimized btrfs_ino() version for 64 bits systems 2022-07-25 17:45:41 +02:00
check-integrity.c fs/btrfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:32 -06:00
check-integrity.h btrfs: check-integrity: split submit_bio from btrfsic checking 2022-05-16 17:03:12 +02:00
compression.c btrfs: give struct btrfs_bio a real end_io handler 2022-09-26 12:27:59 +02:00
compression.h for-5.20-tag 2022-08-03 14:54:52 -07:00
ctree.c btrfs: fix lockdep splat with reloc root extent buffers 2022-08-17 16:19:12 +02:00
ctree.h btrfs: speedup checking for extent sharedness during fiemap 2022-09-26 12:28:01 +02:00
delalloc-space.c btrfs: convert count_max_extents() to use fs_info->max_extent_size 2022-07-25 17:45:41 +02:00
delalloc-space.h
delayed-inode.c btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-inode.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-ref.c btrfs: switch btrfs_block_rsv::full to bool 2022-07-25 17:45:40 +02:00
delayed-ref.h btrfs: remove btrfs_delayed_extent_op::is_data 2022-05-16 17:17:31 +02:00
dev-replace.c btrfs: don't take a bio_counter reference for cloned bios 2022-09-26 12:27:58 +02:00
dev-replace.h
dir-item.c btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item 2022-05-16 17:03:07 +02:00
discard.c
discard.h
disk-io.c btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2 2022-09-26 12:28:00 +02:00
disk-io.h btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2 2022-09-26 12:28:00 +02:00
export.c
export.h
extent-io-tree.h btrfs: move btrfs_bio allocation to volumes.c 2022-09-26 12:27:58 +02:00
extent-tree.c btrfs: speedup checking for extent sharedness during fiemap 2022-09-26 12:28:01 +02:00
extent_io.c btrfs: speedup checking for extent sharedness during fiemap 2022-09-26 12:28:01 +02:00
extent_io.h btrfs: move btrfs_bio allocation to volumes.c 2022-09-26 12:27:58 +02:00
extent_map.c btrfs: assert we have a write lock when removing and replacing extent maps 2022-03-14 13:13:50 +01:00
extent_map.h btrfs: defrag: don't use merged extent map for their generation check 2022-02-23 17:43:13 +01:00
file-item.c btrfs: rename btrfs_insert_file_extent() to btrfs_insert_hole_extent() 2022-09-26 12:27:54 +02:00
file.c btrfs: make hole and data seeking a lot more efficient 2022-09-26 12:28:00 +02:00
free-space-cache.c btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
free-space-cache.h btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
free-space-tree.c btrfs: get rid of block group caching progress logic 2022-09-26 12:27:58 +02:00
free-space-tree.h
inode-item.c
inode-item.h
inode.c btrfs: properly flush delalloc when entering fiemap 2022-09-26 12:28:00 +02:00
ioctl.c btrfs: use fs_info->max_extent_size in get_extent_max_capacity() 2022-07-25 17:45:41 +02:00
Kconfig btrfs: use generic Kconfig option for 256kB page size limit 2022-01-20 08:52:55 +02:00
locking.c btrfs: fix lockdep splat with reloc root extent buffers 2022-08-17 16:19:12 +02:00
locking.h btrfs: fix lockdep splat with reloc root extent buffers 2022-08-17 16:19:12 +02:00
lzo.c btrfs: replace kmap() with kmap_local_page() in lzo.c 2022-07-25 17:45:33 +02:00
Makefile Kbuild: add -Wno-shift-negative-value where -Wextra is used 2022-03-13 17:30:31 +09:00
misc.h
ordered-data.c btrfs: add lockdep annotations for the ordered extents wait event 2022-09-26 12:27:53 +02:00
ordered-data.h btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished 2022-07-25 17:45:37 +02:00
orphan.c
print-tree.c btrfs: unify the error handling pattern for read_tree_block() 2022-03-14 13:13:53 +01:00
print-tree.h
props.c btrfs: remove the unnecessary result variables 2022-09-26 12:28:00 +02:00
props.h btrfs: move common inode creation code into btrfs_create_new_inode() 2022-05-16 17:03:08 +02:00
qgroup.c btrfs: fix race between quota enable and quota rescan ioctl 2022-09-26 12:27:58 +02:00
qgroup.h btrfs: avoid blocking on space revervation when doing nowait dio writes 2022-05-16 17:03:10 +02:00
raid56.c btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
raid56.h btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
rcu-string.h
ref-verify.c
ref-verify.h
reflink.c btrfs: clean up chained assignments 2022-07-25 17:45:39 +02:00
reflink.h
relocation.c btrfs: fix lockdep splat with reloc root extent buffers 2022-08-17 16:19:12 +02:00
root-tree.c btrfs: simplify error handling at btrfs_del_root_ref() 2022-09-26 12:27:58 +02:00
scrub.c btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
send.c btrfs: send: fix failures when processing inodes with no links 2022-09-26 12:27:57 +02:00
send.h btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
space-info.c btrfs: dump all space infos if we abort transaction due to ENOSPC 2022-09-26 12:27:59 +02:00
space-info.h btrfs: dump all space infos if we abort transaction due to ENOSPC 2022-09-26 12:27:59 +02:00
struct-funcs.c btrfs: remove redundant check in up check_setget_bounds 2022-07-25 17:45:33 +02:00
subpage.c btrfs: remove extent writepage address space operation 2022-07-25 17:45:37 +02:00
subpage.h btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine 2022-05-16 17:03:11 +02:00
super.c btrfs: enhance unsupported compat RO flags handling 2022-09-26 12:28:00 +02:00
sysfs.c btrfs: remove the unnecessary result variables 2022-09-26 12:28:00 +02:00
sysfs.h
transaction.c btrfs: don't save block group root into super block 2022-09-26 12:28:00 +02:00
transaction.h btrfs: pass btrfs_fs_info for deleting snapshots and cleaner 2022-03-14 13:13:52 +01:00
tree-checker.c btrfs: tree-checker: check for overlapping extent items 2022-08-17 16:20:25 +02:00
tree-checker.h btrfs: tree-checker: check extent buffer owner against owner rootid 2022-05-16 17:03:09 +02:00
tree-defrag.c
tree-log.c btrfs: simplify adding and replacing references during log replay 2022-09-26 12:27:57 +02:00
tree-log.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
tree-mod-log.c
tree-mod-log.h
ulist.c
ulist.h
uuid-tree.c
verity.c btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
volumes.c btrfs: check superblock to ensure the fs was not modified at thaw time 2022-09-26 12:27:59 +02:00
volumes.h btrfs: give struct btrfs_bio a real end_io handler 2022-09-26 12:27:59 +02:00
xattr.c btrfs: check if root is readonly while setting security xattr 2022-08-22 18:06:30 +02:00
xattr.h
zlib.c btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio() 2022-07-25 17:45:41 +02:00
zoned.c btrfs: get rid of block group caching progress logic 2022-09-26 12:27:58 +02:00
zoned.h btrfs: zoned: activate metadata block group on flush_space 2022-07-25 17:45:42 +02:00
zstd.c btrfs: zstd: replace kmap() with kmap_local_page() 2022-07-25 17:45:40 +02:00