linux/fs
Qu Wenruo c28ea613fa btrfs: subpage: fix the false data csum mismatch error
[BUG]
When running fstresss, we can hit strange data csum mismatch where the
on-disk data is in fact correct (passes scrub).

With some extra debug info added, we have the following traces:

  0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192
  0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096
  0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192
  0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864
  0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096
  0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864
  0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192
  0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384
  1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096
  1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864
  1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!!
  7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0

[CAUSE]
Normally we expect all submitted bio reads to only touch the range we
specified, and under subpage context, it means we should only touch the
range specified in each bvec.

But in data read path, inside end_bio_extent_readpage(), we have page
zeroing which only takes regular page size into consideration.

This means for subpage if we have an inode whose content looks like below:

  0       16K     32K     48K     64K
  |///////|       |///////|       |

  |//| = data needs to be read from disk
  |  | = hole

And i_size is 64K initially.

Then the following race can happen:

		T1		|		T2
--------------------------------+--------------------------------
btrfs_do_readpage()		|
|- isize = 64K;			|
|  At this time, the isize is 	|
|  64K				|
|				|
|- submit_extent_page()		|
|  submit previous assembled bio|
|  assemble bio for [0, 16K)	|
|				|
|- submit_extent_page()		|
   submit read bio for [0, 16K) |
   assemble read bio for	|
   [32K, 48K)			|
 				|
				| btrfs_setsize()
				| |- i_size_write(, 16K);
				|    Now i_size is only 16K
end_io() for [0K, 16K)		|
|- end_bio_extent_readpage()	|
   |- btrfs_verify_data_csum()  |
   |  No csum error		|
   |- i_size = 16K;		|
   |- zero_user_segment(16K,	|
      PAGE_SIZE);		|
      !!! We zeroed range	|
      !!! [32K, 48K)		|
				| end_io for [32K, 48K)
				| |- end_bio_extent_readpage()
				|    |- btrfs_verify_data_csum()
				|       ! CSUM MISMATCH !
				|       ! As the range is zeroed now !

[FIX]
To fix the problem, make end_bio_extent_readpage() to only zero the
range of bvec.

The bug only affects subpage read-write support, as for full read-only
mount we can't change i_size thus won't hit the race condition.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:48:00 +01:00
..
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
adfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
affs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
afs rxrpc: Fix deadlock around release of dst cached on udp tunnel 2021-01-29 21:38:11 -08:00
autofs file: Replace ksys_close with close_fd 2020-12-10 12:42:59 -06:00
befs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
bfs bfs: don't use WARNING: string when it's just info. 2020-12-15 22:46:18 -08:00
btrfs btrfs: subpage: fix the false data csum mismatch error 2021-03-02 17:48:00 +01:00
cachefiles cachefiles: Drop superfluous readpages aops NULL check 2021-01-20 11:33:51 -08:00
ceph libceph, ceph: disambiguate ceph_connection_operations handlers 2021-01-04 17:31:32 +01:00
cifs cifs: report error instead of invalid when revalidating a dentry fails 2021-02-05 13:17:48 -06:00
coda
configfs configfs: fix kernel-doc markup issue 2020-11-14 10:22:45 +01:00
cramfs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
crypto f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
debugfs debugfs: remove return value of debugfs_create_devm_seqfile() 2020-10-30 08:37:39 +01:00
devpts
dlm fs: dlm: check on existing node address 2020-11-10 12:14:20 -06:00
ecryptfs ecryptfs: fix uid translation for setxattr on security.capability 2021-01-26 01:47:14 +00:00
efivarfs efivarfs: revert "fix memory leak in efivarfs_create()" 2020-11-25 16:55:02 +01:00
efs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
erofs erofs: avoid using generic_block_bmap 2020-12-10 11:07:40 +08:00
exfat exfat: Avoid allocating upcase table using kcalloc() 2020-12-22 12:31:17 +09:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 ext2: Fix fall-through warnings for Clang 2020-11-23 10:36:53 +01:00
ext4 A number of bug fixes for ext4: 2021-01-15 14:54:24 -08:00
f2fs f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
fat [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
freevxfs
fscache
fuse fuse: fix bad inode 2020-12-10 15:33:14 +01:00
gfs2 gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only 2020-12-03 17:04:41 +01:00
hfs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hfsplus fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hostfs fix hostfs_open() use of ->f_path.dentry 2020-12-21 21:42:29 -05:00
hpfs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
hugetlbfs mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page 2021-02-05 11:03:47 -08:00
iomap iomap: support REQ_OP_ZONE_APPEND 2021-02-09 00:52:19 +01:00
isofs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
jbd2 jbd2: add a helper to find out number of fast commit blocks 2020-12-17 13:30:45 -05:00
jffs2 jffs2: Fix NULL pointer dereference in rp_size fs option parsing 2020-12-13 21:57:21 +01:00
jfs jfs: Fix array index bounds check in dbAdjTree 2020-11-13 16:03:07 -06:00
kernfs kernfs: wire up ->splice_read and ->splice_write 2021-01-21 18:30:28 +01:00
lockd fs/lockd: convert comma to semicolon 2020-12-16 07:57:37 -05:00
minix [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
nfs pNFS/NFSv4: Improve rejection of out-of-order layouts 2021-01-24 20:52:31 -05:00
nfs_common nfs_common: need lock during iterate through the list 2020-12-09 09:38:34 -05:00
nfsd nfsd4: readdirplus shouldn't return parent of export 2021-01-12 08:54:14 -05:00
nilfs2 fs/nilfs2: remove some unused macros to tame gcc 2020-12-15 22:46:17 -08:00
nls
notify fanotify: Fix sys_fanotify_mark() on native x86-32 2020-12-28 11:58:59 +01:00
ntfs fs/ntfs: remove unused variable attr_len 2020-12-15 12:13:37 -08:00
ocfs2 ocfs2: ratelimit the 'max lookup times reached' notice 2020-12-15 12:13:37 -08:00
omfs fs: omfs: use kmemdup() rather than kmalloc+memcpy 2020-09-22 23:39:45 -04:00
openpromfs
orangefs orangefs: add splice file operations 2020-12-16 16:14:08 -05:00
overlayfs ovl: implement volatile-specific fsync error behaviour 2021-01-28 10:22:48 +01:00
proc proc_sysctl: fix oops caused by incorrect command parameters 2021-01-24 10:34:53 -08:00
pstore Tracing updates for 5.11 2020-12-17 13:22:17 -08:00
qnx4 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
qnx6 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
quota \n 2020-12-17 11:00:37 -08:00
ramfs ramfs: fix nommu mmap with gaps in the page cache 2020-10-16 11:11:22 -07:00
reiserfs reiserfs: add check for an invalid ih_entry_count 2020-11-26 16:57:28 +01:00
romfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
squashfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
sysfs sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output 2020-10-02 12:02:30 +02:00
sysv [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
tracefs
ubifs This pull request contains changes for JFFS2, UBI and UBIFS: 2020-12-17 17:46:34 -08:00
udf udf: fix the problem that the disc content is not displayed 2021-01-18 12:06:33 +01:00
ufs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
unicode unicode: Add utf8_casefold_hash 2020-09-10 14:03:31 -07:00
vboxsf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2020-10-15 15:11:56 -07:00
verity Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-12-14 12:18:19 -08:00
xfs New code for 5.11: 2020-12-18 12:50:18 -08:00
zonefs zonefs: select CONFIG_CRC32 2021-01-04 09:06:42 +09:00
aio.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
binfmt_elf_fdpic.c binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot 2020-10-16 11:11:21 -07:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: revert "binfmt_flat: don't offset the data start" 2020-08-24 08:49:13 +10:00
binfmt_misc.c
binfmt_script.c
block_dev.c Revert "block: simplify set_init_blocksize" to regain lost performance 2021-01-27 09:14:12 -07:00
buffer.c for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
char_dev.c
compat_binfmt_elf.c elf: Expose ELF header on arch_setup_additional_pages() 2020-10-26 13:46:47 +01:00
coredump.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
d_path.c fs: fix NULL dereference due to data race in prepend_path() 2020-10-14 14:54:45 -07:00
dax.c mm: simplify follow_pte{,pmd} 2020-12-15 22:46:19 -08:00
dcache.c fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set 2020-12-10 17:33:17 -05:00
dcookies.c
direct-io.c \n 2020-10-15 15:03:10 -07:00
drop_caches.c
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c epoll: add syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
exec.c Merge branch 'parisc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2020-12-16 12:10:40 -08:00
fcntl.c fcntl: Fix potential deadlock in send_sig{io, urg}() 2020-11-05 07:44:15 -05:00
fhandle.c
file.c kernel/io_uring: cancel io_uring before task works 2020-12-30 19:36:54 -07:00
file_table.c epoll: take epitem list out of struct file 2020-10-25 20:02:08 -04:00
filesystems.c
fs-writeback.c fs: fix lazytime expiration handling in __writeback_single_inode() 2021-01-13 17:26:21 +01:00
fs_context.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
fs_parser.c fs_parse: mark fs_param_bad_value() as static 2020-10-13 18:38:27 -07:00
fs_pin.c
fs_struct.c vfs: Use sequence counter with associated spinlock 2020-07-29 16:14:27 +02:00
fs_types.c
fsopen.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
init.c init: add an init_dup helper 2020-08-04 21:02:38 -04:00
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-12-25 10:54:29 -08:00
internal.h for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
io-wq.c io-wq: kill now unused io_wq_cancel_all() 2020-12-20 10:47:42 -07:00
io-wq.h io-wq: kill now unused io_wq_cancel_all() 2020-12-20 10:47:42 -07:00
io_uring.c io_uring: drop mm/files between task_work_submit 2021-02-04 12:42:58 -07:00
ioctl.c fs: remove ksys_ioctl 2020-07-31 08:16:01 +02:00
Kconfig tmpfs: support 64-bit inums per-sb 2020-08-07 11:33:24 -07:00
Kconfig.binfmt
kernel_read_file.c fs/kernel_file_read: Add "offset" arg for partial reads 2020-10-05 13:37:04 +02:00
libfs.c f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
locks.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
Makefile Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
mbcache.c
mount.h mnt: Use generic ns_common::count 2020-08-19 14:14:19 +02:00
mpage.c
namei.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-12-25 10:54:29 -08:00
namespace.c umount(2): move the flag validity checks first 2021-01-04 15:31:58 -05:00
no-block.c
nsfs.c
open.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
pipe.c fs/pipe: allow sendfile() to pipe again 2021-01-25 12:32:26 -08:00
pnode.c
pnode.h fs/namespace.c: WARN if mnt_count has become negative 2020-12-10 17:33:17 -05:00
posix_acl.c
proc_namespace.c proc mountinfo: make splice available again 2020-12-27 12:00:36 -08:00
read_write.c Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
readdir.c fs: remove ksys_getdents64 2020-07-31 08:16:00 +02:00
remap_range.c vfs: verify source area in vfs_dedupe_file_range_one() 2020-12-14 15:26:13 +01:00
select.c poll: fix performance regression due to out-of-line __put_user() 2021-01-08 11:06:29 -08:00
seq_file.c fix return values of seq_read_iter() 2020-11-15 22:12:53 -05:00
signalfd.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
splice.c io_uring-5.10-2020-10-24 2020-10-24 12:40:18 -07:00
stack.c
stat.c fs: remove KSTAT_QUERY_FLAGS 2020-09-26 22:55:05 -04:00
statfs.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
super.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob 2020-12-15 12:13:46 -08:00
utimes.c fs: expose utimes_common 2020-07-31 08:16:01 +02:00
xattr.c vfs: move cap_convert_nscap() call into vfs_setxattr() 2020-12-14 15:26:13 +01:00