linux/fs/btrfs
Josef Bacik 59c7b566a3 btrfs: index free space entries on size
Currently we index free space on offset only, because usually we have a
hint from the allocator that we want to honor for locality reasons.
However if we fail to use this hint we have to go back to a brute force
search through the free space entries to find a large enough extent.

With sufficiently fragmented free space this becomes quite expensive, as
we have to linearly search all of the free space entries to find if we
have a part that's long enough.

To fix this add a cached rb tree to index based on free space entry
bytes.  This will allow us to quickly look up the largest chunk in the
free space tree for this block group, and stop searching once we've
found an entry that is too small to satisfy our allocation.  We simply
choose to use this tree if we're searching from the beginning of the
block group, as we know we do not care about locality at that point.

I wrote an allocator test that creates a 10TiB ram backed null block
device and then fallocates random files until the file system is full.
I think go through and delete all of the odd files.  Then I spawn 8
threads that fallocate 64MiB files (1/2 our extent size cap) until the
file system is full again.  I use bcc's funclatency to measure the
latency of find_free_extent.  The baseline results are

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 10356    |****                                    |
       512 -> 1023       : 58242    |*************************               |
      1024 -> 2047       : 74418    |********************************        |
      2048 -> 4095       : 90393    |****************************************|
      4096 -> 8191       : 79119    |***********************************     |
      8192 -> 16383      : 35614    |***************                         |
     16384 -> 32767      : 13418    |*****                                   |
     32768 -> 65535      : 12811    |*****                                   |
     65536 -> 131071     : 17090    |*******                                 |
    131072 -> 262143     : 26465    |***********                             |
    262144 -> 524287     : 40179    |*****************                       |
    524288 -> 1048575    : 55469    |************************                |
   1048576 -> 2097151    : 48807    |*********************                   |
   2097152 -> 4194303    : 26744    |***********                             |
   4194304 -> 8388607    : 35351    |***************                         |
   8388608 -> 16777215   : 13918    |******                                  |
  16777216 -> 33554431   : 21       |                                        |

avg = 908079 nsecs, total: 580889071441 nsecs, count: 639690

And the patch results are

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 6883     |**                                      |
       512 -> 1023       : 54346    |*********************                   |
      1024 -> 2047       : 79170    |********************************        |
      2048 -> 4095       : 98890    |****************************************|
      4096 -> 8191       : 81911    |*********************************       |
      8192 -> 16383      : 27075    |**********                              |
     16384 -> 32767      : 14668    |*****                                   |
     32768 -> 65535      : 13251    |*****                                   |
     65536 -> 131071     : 15340    |******                                  |
    131072 -> 262143     : 26715    |**********                              |
    262144 -> 524287     : 43274    |*****************                       |
    524288 -> 1048575    : 53870    |*********************                   |
   1048576 -> 2097151    : 55368    |**********************                  |
   2097152 -> 4194303    : 41036    |****************                        |
   4194304 -> 8388607    : 24927    |**********                              |
   8388608 -> 16777215   : 33       |                                        |
  16777216 -> 33554431   : 9        |                                        |

avg = 623599 nsecs, total: 397259314759 nsecs, count: 637042

There's a little variation in the amount of calls done because of timing
of the threads with metadata requirements, but the avg, total, and
count's are relatively consistent between runs (usually within 2-5% of
each other).  As you can see here we have around a 30% decrease in
average latency with a 30% decrease in overall time spent in
find_free_extent.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03 15:09:46 +01:00
..
tests btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
async-thread.c btrfs: fix memory ordering between normal and ordered work functions 2021-11-16 16:50:23 +01:00
async-thread.h
backref.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
backref.h btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
block-group.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
block-group.h btrfs: fix deadlock between chunk allocation and chunk btree modifications 2021-10-26 19:08:07 +02:00
block-rsv.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
block-rsv.h btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
btrfs_inode.h btrfs: only copy dir index keys when logging a directory 2022-01-03 15:09:42 +01:00
check-integrity.c btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state 2021-10-26 19:08:07 +02:00
check-integrity.h
compression.c for-5.16-tag 2021-11-01 12:48:25 -07:00
compression.h btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write 2021-10-26 19:08:04 +02:00
ctree.c btrfs: rename btrfs_item_end_nr to btrfs_item_data_end 2022-01-03 15:09:43 +01:00
ctree.h btrfs: get rid of root->orphan_cleanup_state 2022-01-03 15:09:45 +01:00
delalloc-space.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
delalloc-space.h
delayed-inode.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
delayed-inode.h
delayed-ref.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
delayed-ref.h btrfs: make btrfs_ref::real_root optional 2021-10-26 19:08:06 +02:00
dev-replace.c btrfs: zoned: cache reported zone during mount 2022-01-03 15:09:44 +01:00
dev-replace.h
dir-item.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
discard.c
discard.h
disk-io.c btrfs: get rid of root->orphan_cleanup_state 2022-01-03 15:09:45 +01:00
disk-io.h btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE 2021-10-26 19:08:07 +02:00
export.c
export.h
extent-io-tree.h
extent-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
extent_io.c btrfs: remove unnecessary @nr_written parameters 2022-01-03 15:09:45 +01:00
extent_io.h btrfs: cleanup for extent_write_locked_range() 2021-10-26 19:08:04 +02:00
extent_map.c btrfs: rename btrfs_bio to btrfs_io_context 2021-10-26 19:08:02 +02:00
extent_map.h
file-item.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
file.c btrfs: fix deadlock due to page faults during direct IO reads and writes 2021-11-09 13:46:07 +01:00
free-space-cache.c btrfs: index free space entries on size 2022-01-03 15:09:46 +01:00
free-space-cache.h btrfs: index free space entries on size 2022-01-03 15:09:46 +01:00
free-space-tree.c btrfs: fix invalid delayed ref after subvolume creation failure 2021-12-15 17:07:33 +01:00
free-space-tree.h
inode-item.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
inode.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
ioctl.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
Kconfig
locking.c
locking.h btrfs: assert that extent buffers are write locked instead of only locked 2021-10-26 19:08:02 +02:00
lzo.c for-5.16-rc2-tag 2021-11-26 11:24:32 -08:00
Makefile btrfs: initial fsverity support 2021-08-23 13:19:09 +02:00
misc.h btrfs: use correct header for div_u64 in misc.h 2021-09-07 14:29:50 +02:00
ordered-data.c btrfs: zoned: fix double counting of split ordered extent 2021-09-07 14:30:41 +02:00
ordered-data.h btrfs: remove uptodate parameter from btrfs_dec_test_first_ordered_pending 2021-08-23 13:19:02 +02:00
orphan.c
print-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
print-tree.h
props.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
props.h
qgroup.c btrfs: fix deadlock between quota enable and other quota operations 2022-01-03 15:09:41 +01:00
qgroup.h
raid56.c btrfs: remove btrfs_raid_bio::fs_info member 2021-10-26 19:08:03 +02:00
raid56.h btrfs: remove btrfs_raid_bio::fs_info member 2021-10-26 19:08:03 +02:00
rcu-string.h
reada.c btrfs: rename btrfs_bio to btrfs_io_context 2021-10-26 19:08:02 +02:00
ref-verify.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
ref-verify.h
reflink.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
reflink.h
relocation.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
root-tree.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
scrub.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
send.c btrfs: send: remove unused type parameter to iterate_inode_ref_t 2022-01-03 15:09:44 +01:00
send.h btrfs: send: prepare for v2 protocol 2021-10-29 12:38:43 +02:00
space-info.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
space-info.h btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
struct-funcs.c btrfs: add special case to setget helpers for 64k pages 2021-08-23 13:18:58 +02:00
subpage.c btrfs: handle page locking in btrfs_page_end_writer_lock with no writers 2021-10-26 19:08:05 +02:00
subpage.h btrfs: rework page locking in __extent_writepage() 2021-10-26 19:08:05 +02:00
super.c btrfs: add a BTRFS_FS_ERROR helper 2021-10-26 19:08:05 +02:00
sysfs.c btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit 2021-10-26 19:08:07 +02:00
sysfs.h
transaction.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
transaction.h
tree-checker.c btrfs: rename btrfs_item_end_nr to btrfs_item_data_end 2022-01-03 15:09:43 +01:00
tree-checker.h
tree-defrag.c
tree-log.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
tree-log.h btrfs: change error handling for btrfs_delete_*_in_log 2021-10-26 19:08:05 +02:00
tree-mod-log.c
tree-mod-log.h
ulist.c
ulist.h
uuid-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
verity.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
volumes.c btrfs: zoned: cache reported zone during mount 2022-01-03 15:09:44 +01:00
volumes.h btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls 2021-10-26 19:08:07 +02:00
xattr.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
xattr.h
zlib.c Revert "btrfs: compression: drop kmap/kunmap from zlib" 2021-10-29 13:03:05 +02:00
zoned.c btrfs: zoned: cache reported zone during mount 2022-01-03 15:09:44 +01:00
zoned.h btrfs: zoned: cache reported zone during mount 2022-01-03 15:09:44 +01:00
zstd.c lib: zstd: Add kernel-specific API 2021-11-08 16:55:21 -08:00