linux/fs/btrfs
Filipe Manana 8eaf40c0e2 Btrfs: fix race between block group removal and block group allocation
If a task is removing the block group that currently has the highest start
offset amongst all existing block groups, there is a short time window
where it races with a concurrent block group allocation, resulting in a
transaction abort with an error code of EEXIST.

The following diagram explains the race in detail:

      Task A                                                        Task B

 btrfs_remove_block_group(bg offset X)

   remove_extent_mapping(em offset X)
     -> removes extent map X from the
        tree of extent maps
        (fs_info->mapping_tree), so the
        next call to find_next_chunk()
        will return offset X

                                                   btrfs_alloc_chunk()
                                                     find_next_chunk()
                                                       --> returns offset X

                                                     __btrfs_alloc_chunk(offset X)
                                                       btrfs_make_block_group()
                                                         btrfs_create_block_group_cache()
                                                           --> creates btrfs_block_group_cache
                                                               object with a key corresponding
                                                               to the block group item in the
                                                               extent, the key is:
                                                               (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)

                                                         --> adds the btrfs_block_group_cache object
                                                             to the list new_bgs of the transaction
                                                             handle

                                                   btrfs_end_transaction(trans handle)
                                                     __btrfs_end_transaction()
                                                       btrfs_create_pending_block_groups()
                                                         --> sees the new btrfs_block_group_cache
                                                             in the new_bgs list of the transaction
                                                             handle
                                                         --> its call to btrfs_insert_item() fails
                                                             with -EEXIST when attempting to insert
                                                             the block group item key
                                                             (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)
                                                             because task A has not removed that key yet
                                                         --> aborts the running transaction with
                                                             error -EEXIST

   btrfs_del_item()
     -> removes the block group's key from
        the extent tree, key is
        (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)

A sample transaction abort trace:

  [78912.403537] ------------[ cut here ]------------
  [78912.403811] BTRFS: Transaction aborted (error -17)
  [78912.404082] WARNING: CPU: 2 PID: 20465 at fs/btrfs/extent-tree.c:10551 btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
  (...)
  [78912.405642] CPU: 2 PID: 20465 Comm: btrfs Tainted: G        W         5.0.0-btrfs-next-46 #1
  [78912.405941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [78912.406586] RIP: 0010:btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
  (...)
  [78912.407636] RSP: 0018:ffff9d3d4b7e3b08 EFLAGS: 00010282
  [78912.407997] RAX: 0000000000000000 RBX: ffff90959a3796f0 RCX: 0000000000000006
  [78912.408369] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff909636b16860
  [78912.408746] RBP: ffff909626758a58 R08: 0000000000000000 R09: 0000000000000000
  [78912.409144] R10: ffff9095ff462400 R11: 0000000000000000 R12: ffff90959a379588
  [78912.409521] R13: ffff909626758ab0 R14: ffff9095036c0000 R15: ffff9095299e1158
  [78912.409899] FS:  00007f387f16f700(0000) GS:ffff909636b00000(0000) knlGS:0000000000000000
  [78912.410285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [78912.410673] CR2: 00007f429fc87cbc CR3: 000000014440a004 CR4: 00000000003606e0
  [78912.411095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [78912.411496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [78912.411898] Call Trace:
  [78912.412318]  __btrfs_end_transaction+0x5b/0x1c0 [btrfs]
  [78912.412746]  btrfs_inc_block_group_ro+0xcf/0x160 [btrfs]
  [78912.413179]  scrub_enumerate_chunks+0x188/0x5b0 [btrfs]
  [78912.413622]  ? __mutex_unlock_slowpath+0x100/0x2a0
  [78912.414078]  btrfs_scrub_dev+0x2ef/0x720 [btrfs]
  [78912.414535]  ? __sb_start_write+0xd4/0x1c0
  [78912.414963]  ? mnt_want_write_file+0x24/0x50
  [78912.415403]  btrfs_ioctl+0x17fb/0x3120 [btrfs]
  [78912.415832]  ? lock_acquire+0xa6/0x190
  [78912.416256]  ? do_vfs_ioctl+0xa2/0x6f0
  [78912.416685]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [78912.417116]  do_vfs_ioctl+0xa2/0x6f0
  [78912.417534]  ? __fget+0x113/0x200
  [78912.417954]  ksys_ioctl+0x70/0x80
  [78912.418369]  __x64_sys_ioctl+0x16/0x20
  [78912.418812]  do_syscall_64+0x60/0x1b0
  [78912.419231]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [78912.419644] RIP: 0033:0x7f3880252dd7
  (...)
  [78912.420957] RSP: 002b:00007f387f16ed68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [78912.421426] RAX: ffffffffffffffda RBX: 000055f5becc1df0 RCX: 00007f3880252dd7
  [78912.421889] RDX: 000055f5becc1df0 RSI: 00000000c400941b RDI: 0000000000000003
  [78912.422354] RBP: 0000000000000000 R08: 00007f387f16f700 R09: 0000000000000000
  [78912.422790] R10: 00007f387f16f700 R11: 0000000000000246 R12: 0000000000000000
  [78912.423202] R13: 00007ffda49c266f R14: 0000000000000000 R15: 00007f388145e040
  [78912.425505] ---[ end trace eb9bfe7c426fc4d3 ]---

Fix this by calling remove_extent_mapping(), at btrfs_remove_block_group(),
only at the very end, after removing the block group item key from the
extent tree (and removing the free space tree entry if we are using the
free space tree feature).

Fixes: 04216820fe ("Btrfs: fix race between fs trimming and block group remove/allocation")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-06-12 15:55:42 +02:00
..
tests btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
acl.c btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
async-thread.c btrfs: simplify workqueue name when allocating 2019-02-25 14:13:24 +01:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c Btrfs: do not start a transaction during fiemap 2019-04-29 19:02:51 +02:00
backref.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
btrfs_inode.h Btrfs: improve performance on fsync of files with multiple hardlinks 2019-04-29 19:02:52 +02:00
check-integrity.c btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c btrfs: Check the compression level before getting a workspace 2019-05-03 18:21:25 +02:00
compression.h btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
ctree.c btrfs: ctree: Dump the leaf before BUG_ON in btrfs_set_item_key_safe 2019-04-29 19:02:52 +02:00
ctree.h btrfs: track DIO bytes in flight 2019-04-29 19:25:37 +02:00
dedupe.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delayed-inode.c btrfs: get fs_info from eb in btrfs_leaf_free_space 2019-04-29 19:02:30 +02:00
delayed-inode.h Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root 2018-10-15 17:23:33 +02:00
delayed-ref.c btrfs: remove unused parameter fs_info from btrfs_add_delayed_extent_op 2019-04-29 19:02:51 +02:00
delayed-ref.h btrfs: remove unused parameter fs_info from btrfs_add_delayed_extent_op 2019-04-29 19:02:51 +02:00
dev-replace.c btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-05-28 18:54:00 +02:00
dev-replace.h btrfs: get fs_info from trans in btrfs_run_dev_replace 2019-04-29 19:02:43 +02:00
dir-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
disk-io.c btrfs: track DIO bytes in flight 2019-04-29 19:25:37 +02:00
disk-io.h btrfs: get fs_info from trans in btrfs_create_tree 2019-04-29 19:02:41 +02:00
export.c btrfs: Remove 'objectid' member from struct btrfs_root 2018-10-15 17:23:25 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent-tree.c Btrfs: fix race between block group removal and block group allocation 2019-06-12 15:55:42 +02:00
extent_io.c btrfs: remove unused parameter fs_info from emit_last_fiemap_cache 2019-04-29 19:02:51 +02:00
extent_io.h btrfs: Remove bio_offset argument from submit_bio_hook 2019-04-29 19:02:47 +02:00
extent_map.c btrfs: Optimize unallocated chunks discard 2019-04-29 19:02:38 +02:00
extent_map.h btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
file-item.c btrfs: Document btrfs_csum_one_bio 2019-04-29 19:02:52 +02:00
file.c Btrfs: fix race between ranged fsync and writeback of adjacent ranges 2019-05-16 14:31:13 +02:00
free-space-cache.c btrfs: get fs_info from block group in btrfs_find_space_cluster 2019-04-29 19:02:46 +02:00
free-space-cache.h btrfs: get fs_info from block group in btrfs_find_space_cluster 2019-04-29 19:02:46 +02:00
free-space-tree.c btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
free-space-tree.h btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
inode-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
inode-map.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c Btrfs: fix wrong ctime and mtime of a directory after log replay 2019-05-28 19:16:16 +02:00
ioctl.c btrfs: drop local copy of inode i_mode 2019-04-29 19:02:53 +02:00
Kconfig btrfs: add SPDX header to Kconfig 2018-04-12 16:29:55 +02:00
locking.c btrfs: trace: Introduce trace events for all btrfs tree locking events 2019-04-29 19:02:43 +02:00
locking.h btrfs: merge btrfs_set_lock_blocking_rw with it's caller 2019-02-25 14:13:28 +01:00
lzo.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
Makefile
math.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ordered-data.c btrfs: track DIO bytes in flight 2019-04-29 19:25:37 +02:00
ordered-data.h btrfs: Remove redundant inode argument from btrfs_add_ordered_sum 2019-04-29 19:02:40 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: get fs_info from eb in btrfs_leaf_free_space 2019-04-29 19:02:30 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: use the existing reserved items for our first prop for inheritance 2019-05-09 11:18:14 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference 2019-05-28 18:54:10 +02:00
qgroup.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
raid56.c for-5.1-rc2-tag 2019-03-26 10:32:13 -07:00
raid56.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: dev-replace: open code trivial locking helpers 2018-12-17 14:51:45 +01:00
ref-verify.c btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
ref-verify.h btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
relocation.c btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() 2019-05-28 18:54:10 +02:00
root-tree.c Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path 2019-05-09 11:25:27 +02:00
scrub.c btrfs: get fs_info from device in btrfs_scrub_cancel_dev 2019-04-29 19:02:47 +02:00
send.c Btrfs: incremental send, fix emission of invalid clone operations 2019-05-28 18:54:10 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
struct-funcs.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
super.c btrfs: drop unused parameter in mount_subvol 2019-04-29 19:02:35 +02:00
sysfs.c btrfs: sysfs: don't leak memory when failing add fsid 2019-05-16 14:31:12 +02:00
sysfs.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
transaction.c Btrfs: remove no longer used member num_dirty_bgs from transaction 2019-04-29 19:02:43 +02:00
transaction.h Btrfs: remove no longer used member num_dirty_bgs from transaction 2019-04-29 19:02:43 +02:00
tree-checker.c Btrfs: tree-checker: detect file extent items with overlapping ranges 2019-05-16 14:33:51 +02:00
tree-checker.h btrfs: get fs_info from eb in btrfs_check_chunk_valid 2019-04-29 19:02:39 +02:00
tree-defrag.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.c Btrfs: fix race updating log root item during fsync 2019-05-28 19:26:46 +02:00
tree-log.h btrfs: get fs_info from trans in btrfs_set_log_full_commit 2019-04-29 19:02:41 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
volumes.c btrfs: get fs_info from device in btrfs_rm_dev_replace_free_srcdev 2019-04-29 19:02:48 +02:00
volumes.h btrfs: get fs_info from device in btrfs_rm_dev_replace_free_srcdev 2019-04-29 19:02:48 +02:00
xattr.c btrfs: start transaction in xattr_handler_set_prop 2019-04-29 19:02:54 +02:00
xattr.h btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
zlib.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
zstd.c btrfs: correct zstd workspace manager lock to use spin_lock_bh() 2019-05-28 18:54:09 +02:00