Let's drop the inode from the donation list when there is no other
open file.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The new mount api will execute .parse_param, .init_fs_context, .get_tree
and will call .remount if remount happened. So we add the necessary
functions for the fs_context_operations. If .init_fs_context is added,
the old .mount should remove.
See Documentation/filesystems/mount_api.rst for more information.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: context modified]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The handle_mount_opt() helper is used to parse mount parameters,
and so we can rename this function to f2fs_parse_param() and set
it as .param_param in fs_context_operations.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The new mount api separates option parsing and super block setup
into two distinct steps and so we need to separate the options
parsing out of the parse_options().
In order to achieve this, here we handle the mount options with
three steps:
- Firstly, we move sb/sbi out of handle_mount_opt.
As the former patch introduced f2fs_fs_context, so we record
the changed mount options in this context. In handle_mount_opt,
sb/sbi is null, so we should move all relative code out of
handle_mount_opt (thus, some check case which use sb/sbi should
move out).
- Secondly, we introduce the some check helpers to keep the option
consistent.
During filling superblock period, sb/sbi are ready. So we check
the f2fs_fs_context which holds the mount options base on sb/sbi.
- Thirdly, we apply the new mount options to sb/sbi.
After checking the f2fs_fs_context, all changed on mount options
are valid. So we can apply them to sb/sbi directly.
After do these, option parsing and super block setting have been
decoupled. Also it should have retained the original execution
flow.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor fixes]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
At the parsing phase of mouont in the new mount api, options
value will be recorded with the context, and then it will be
used in fill_super and other helpers.
Note that, this is a temporary status, we want remove the sb
and sbi usages in handle_mount_opt. So here the f2fs_fs_context
only records the mount options, it will be copied in sb/sbi in
later process. (At this point in the series, mount options are
temporarily not set during mount.)
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor cleanup]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
At the parsing phase of the new mount api, sbi will not be
available. So here allows sbi to be NULL in f2fs log helpers
and use that in handle_mount_opt().
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In handle_mount_opt, we use fs_parameter to parse each option.
However we're still using the old API to get the options string.
Using fsparams parse_options allows us to remove many of the Opt_
enums, so remove them.
The checkpoint disable cap (or percent) involves rather complex
parsing; we retain the old match_table mechanism for this, which
handles it well.
There are some changes about parsing options:
1. For `active_logs`, `inline_xattr_size` and `fault_injection`,
we use s32 type according the internal structure to record the
option's value.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor cleanup]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use an array of `fs_parameter_spec` called f2fs_param_specs to
hold the mount option specifications for the new mount api.
Add constant_table structures for several options to facilitate
parsing.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates, more fsparam_enum]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes minor typos in comments in f2fs.
Signed-off-by: Swarna Prabhu <s.prabhu@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces /sys/fs/f2fs/<dev>/reserved_pin_section for tuning
@needed parameter of has_not_enough_free_secs(), if we configure it w/
zero, it can avoid f2fs_gc() as much as possible while fallocating on
pinned file.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: wangzijie <wangzijie1@honor.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce a new fault type FAULT_VMALLOC to simulate no memory error in
f2fs_vmalloc().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
mapping_read_folio_gfp() will return a folio, it should always be
uptodate, let's check folio uptodate status to detect any potenial
bug.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since __f2fs_crc32() now calls crc32() directly, it no longer uses its
sbi argument. Remove that, and simplify its callers accordingly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This fixes an analogus bug that was fixed in modern filesystems:
a) xfs in commit 4b8d867ca6 ("xfs: don't over-report free space or
inodes in statvfs")
b) ext4 in commit f87d3af741 ("ext4: don't over-report free space
or inodes in statvfs")
where statfs can report misleading / incorrect information where
project quota is enabled, and the free space is less than the
remaining quota.
This commit will resolve a test failure in generic/762 which tests
for this bug.
generic/762 - output mismatch (see /share/git/fstests/results//generic/762.out.bad)
--- tests/generic/762.out 2025-04-15 10:21:53.371067071 +0800
+++ /share/git/fstests/results//generic/762.out.bad 2025-05-13 16:13:37.000000000 +0800
@@ -6,8 +6,10 @@
root blocks2 is in range
dir blocks2 is in range
root bavail2 is in range
-dir bavail2 is in range
+dir bavail2 has value of 1539066
+dir bavail2 is NOT in range 304734.87 .. 310891.13
root blocks3 is in range
...
(Run 'diff -u /share/git/fstests/tests/generic/762.out /share/git/fstests/results//generic/762.out.bad' to see the entire diff)
HINT: You _MAY_ be missing kernel fix:
XXXXXXXXXXXXXX xfs: don't over-report free space or inodes in statvfs
Cc: stable@kernel.org
Fixes: ddc34e328d ("f2fs: introduce f2fs_statfs_project")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
folio_index is only needed for mixed usage of page cache and swap
cache, for pure page cache usage, the caller can just use
folio->index instead.
It can't be a swap cache folio here. Swap mapping may only call into fs
through `swap_rw` but f2fs does not use that method for swap.
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: Chao Yu <chao@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: linux-f2fs-devel@lists.sourceforge.net (open list:F2FS FILE SYSTEM)
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support to inject a timeout fault into function, currently it only
support to inject timeout to commit_atomic_write flow to reproduce
inconsistent bug, like the bug fixed by commit f098aeba04 ("f2fs:
fix to avoid atomicity corruption of atomic file").
By default, the new type fault will inject 1000ms timeout, and the
timeout process can be interrupted by SIGKILL.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A zoned device can has both conventional zones and sequential zones,
so we should not treat first segment of zoned device as first_zoned_segno,
instead, we need to check zone type for each zone during traversing zoned
device to find first_zoned_segno.
Otherwise, for below case, first_zoned_segno will be 0, which could be
wrong.
create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0
Testcase:
export SCRIPTS_PATH=/share/git/scripts
test multiple devices w/ zoned device
for ((i=0;i<8;i++)) do {
zonesize=$((2<<$i))
conzone=$((4096/$zonesize))
seqzone=$((4096/$zonesize))
$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0
mount /dev/vdb /mnt/f2fs
touch /mnt/f2fs/file
f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
stat /mnt/f2fs/file
df
cat /proc/fs/f2fs/vdb/segment_info
umount /mnt/f2fs
$SCRIPTS_PATH/nullblk_remove.sh 0
} done
test single zoned device
for ((i=0;i<8;i++)) do {
zonesize=$((2<<$i))
conzone=$((4096/$zonesize))
seqzone=$((4096/$zonesize))
$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
mkfs.f2fs -f -m /dev/nullb0
mount /dev/nullb0 /mnt/f2fs
touch /mnt/f2fs/file
f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
stat /mnt/f2fs/file
df
cat /proc/fs/f2fs/nullb0/segment_info
umount /mnt/f2fs
$SCRIPTS_PATH/nullblk_remove.sh 0
} done
Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
w/ below testcase, resize will generate a corrupted image which
contains inconsistent metadata, so when mounting such image, it
will trigger kernel panic:
touch img
truncate -s $((512*1024*1024*1024)) img
mkfs.f2fs -f img $((256*1024*1024))
resize.f2fs -s -i img -t $((1024*1024*1024))
mount img /mnt/f2fs
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.h:863!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 11 UID: 0 PID: 3922 Comm: mount Not tainted 6.15.0-rc1+ #191 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_ra_meta_pages+0x47c/0x490
Call Trace:
f2fs_build_segment_manager+0x11c3/0x2600
f2fs_fill_super+0xe97/0x2840
mount_bdev+0xf4/0x140
legacy_get_tree+0x2b/0x50
vfs_get_tree+0x29/0xd0
path_mount+0x487/0xaf0
__x64_sys_mount+0x116/0x150
do_syscall_64+0x82/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdbfde1bcfe
The reaseon is:
sit_i->bitmap_size is 192, so size of sit bitmap is 192*8=1536, at maximum
there are 1536 sit blocks, however MAIN_SEGS is 261893, so that sit_blk_cnt
is 4762, build_sit_entries() -> current_sit_addr() tries to access
out-of-boundary in sit_bitmap at offset from [1536, 4762), once sit_bitmap
and sit_bitmap_mirror is not the same, it will trigger f2fs_bug_on().
Let's add sanity check in f2fs_sanity_check_ckpt() to avoid panic.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support arbitrary size folios and remove a few hidden calls to
compound_head(). Also remove an unnecessary test of the uptodaate flag;
if mapping_read_folio_gfp() cannot bring the folio uptodate, it will
return an error.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.
2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.
Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When we update inject type via sysfs, it shows wrong rate value as
below, there is a same problem when we update inject rate, fix it.
Before:
F2FS-fs (vdd): build fault injection attr: rate: 0, type: 0xffff
F2FS-fs (vdd): build fault injection attr: rate: 1, type: 0x0
After:
F2FS-fs (vdd): build fault injection type: 0x1
F2FS-fs (vdd): build fault injection rate: 1
Meanwhile, let's avoid turning on all fault types when we enable fault
injection via fault_injection mount option, it will lead to shutdown
filesystem or fail the mount() easily.
mount -o fault_injection=4 /dev/vdd /mnt/f2fs
F2FS-fs (vdd): build fault injection attr: rate: 4, type: 0x7fffff
F2FS-fs (vdd): inject kmalloc in f2fs_kmalloc of f2fs_fill_super+0xbdf/0x27c0
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the case of the following call stack for an atomic file,
FI_DIRTY_INODE is set, but FI_ATOMIC_DIRTIED is not subsequently set.
f2fs_file_write_iter
f2fs_map_blocks
f2fs_reserve_new_blocks
inc_valid_block_count
__mark_inode_dirty(dquot)
f2fs_dirty_inode
If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption
due to a mismatch between old file size and new data.
To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when
FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was
previously cleared by the Writeback thread during the commit atomic, is
set and i_size is updated.
Cc: <stable@vger.kernel.org>
Fixes: fccaa81de8 ("f2fs: prevent atomic file from being dirtied before commit")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With the new mount API the sb will not be available during initial option
parsing, which will happen before fill_super reads sb from disk.
Now that the sb is no longer directly referenced in parse_options, switch
it to use sbi.
(Note that all calls to f2fs_sb_has_* originating from parse_options will
need to be deferred to later before we can use the new mount API.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With the new mount api we will not have the superblock available during
option parsing. Prepare for this by passing *sbi rather than *sb.
For now, we are parsing after fill_super has been done, so sbi->sb will
exist. Under the new mount API this will require more care, but do the
simple change for now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Defer the readonly-vs-norecovery check until after option parsing is done
so that option parsing does not require an active superblock for the test.
Add a helpful message, while we're at it.
(I think could be moved back into parsing after we switch to the new mount
API if desired, as the fs context will have RO state available.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This removes another sb instance from parse_options()
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Set LAZYTIME into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.
(Note: While lazytime is normally handled via mount flag in the vfs,
some f2fs users do expect to be able to use it as an explicit mount
option string via the mount syscall, so this option must remain.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Set INLINECRYPT into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The current options parsing function both parses options and validates
them - factor the validation out to reduce the size of the function and
make transition to the new mount API possible, because under the new mount
API, options are parsed one at a time, and cannot all be tested at the end
of the parsing function.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When certain build-time options are disabled, some mount options are not
accepted. For quota and compression, all related options are dismissed
with a single error message. For xattr, acl, and fault injection, each
option is handled individually. In addition, inline_xattr_size was missed
when CONFIG_F2FS_FS_XATTR was disabled.
Collapse xattr, acl, and fault injection errors into a single string, for
simplicity, and handle the missing inline_xattr_size case.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rather than using F2FS_HAS_FEATURE directly, use f2fs_sb_has_device_alias
macro during option parsing for consistency.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For several zoned storage devices, vendors will provide extra space
which was used for device level GC than specs and F2FS can use this
space for filesystem level GC. To do that, we can reserve the space
using reserved_blocks. However, it is not enough, since this extra
space should not be shown to users. So, with this new sysfs node,
we can hide the space by substracting reserved_blocks from total
bytes.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce a new mount option "nat_bits" to control nat_bits feature,
by default nat_bits feature is disabled.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_recover_quota_begin() and f2fs_recover_quota_end() should be called
in pair, there is some cases we may skip calling f2fs_recover_quota_end(),
fix it.
Fixes: e1bb7d3d9c ("f2fs: fix to recover quota data correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If checkpoint was disabled, we missed to fix the write pointers.
Cc: <stable@vger.kernel.org>
Fixes: 1015035609 ("f2fs: fix changing cursegs if recovery fails on zoned device")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To show call stack, so that we can see who causes critical error, note
that it won't call dump_stack() for shutdown path.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
F2FS-fs (dm-59): checkpoint=enable has some unwritten data.
------------[ cut here ]------------
WARNING: CPU: 6 PID: 8013 at fs/quota/dquot.c:691 dquot_writeback_dquots+0x2fc/0x308
pc : dquot_writeback_dquots+0x2fc/0x308
lr : f2fs_quota_sync+0xcc/0x1c4
Call trace:
dquot_writeback_dquots+0x2fc/0x308
f2fs_quota_sync+0xcc/0x1c4
f2fs_write_checkpoint+0x3d4/0x9b0
f2fs_issue_checkpoint+0x1bc/0x2c0
f2fs_sync_fs+0x54/0x150
f2fs_do_sync_file+0x2f8/0x814
__f2fs_ioctl+0x1960/0x3244
f2fs_ioctl+0x54/0xe0
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x114
checkpoint and f2fs_remount may race as below, resulting triggering warning
in dquot_writeback_dquots().
atomic write remount
- do_remount
- down_write(&sb->s_umount);
- f2fs_remount
- ioctl
- f2fs_do_sync_file
- f2fs_sync_fs
- f2fs_write_checkpoint
- block_operations
- locked = down_read_trylock(&sbi->sb->s_umount)
: fail to lock due to the write lock was held by remount
- up_write(&sb->s_umount);
- f2fs_quota_sync
- dquot_writeback_dquots
- WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount))
: trigger warning because s_umount lock was unlocked by remount
If checkpoint comes from mount/umount/remount/freeze/quotactl, caller of
checkpoint has already held s_umount lock, calling dquot_writeback_dquots()
in the context should be safe.
So let's record task to sbi->umount_lock_holder, so that checkpoint can
know whether the lock has held in the context or not by checking current
w/ it.
In addition, in order to not misrepresent caller of checkpoint, we should
not allow to trigger async checkpoint for those callers: mount/umount/remount/
freeze/quotactl.
Fixes: af033b2aa8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now that the crc32() library function takes advantage of
architecture-specific optimizations, it is unnecessary to go through the
crypto API. Just use crc32(). This is much simpler, and it improves
performance due to eliminating the crypto API overhead.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-19-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Piergiorgio reported a bug in bugzilla as below:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
__issue_discard_cmd+0x1ca/0x350 [f2fs]
issue_discard_thread+0x191/0x480 [f2fs]
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs
The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.
This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().
Cc: stable@vger.kernel.org
Fixes: 35ec7d5748 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
SBI_POR_DOING can be cleared after recovery is completed, so that
changes made before recovery can be persistent, and subsequent
errors can be recorded into cp/sb.
Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fsync data recovery attempts to check and fix write pointer consistency
of cursegs and all other zones. If the write pointers of cursegs are
unaligned, cursegs are changed to new sections.
If recovery fails, zone write pointers are still checked and fixed,
but the latest checkpoint cannot be written back. Additionally, retry-
mount skips recovery and rolls back to reuse the old cursegs whose
zones are already finished. This can lead to unaligned write later.
This patch addresses the issue by leaving writer pointers untouched if
recovery fails. When retry-mount is performed, cursegs and other zones
are checked and fixed after skipping recovery.
Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The unusable cap value must be adjusted before checking whether
checkpoint=disable is feasible.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It missed to cast variable to unsigned long long type before
bit shift, which will cause overflow, fix it.
Fixes: f7ef9b83b5 ("f2fs: introduce macros to convert bytes and blocks in f2fs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
strcpy is deprecated. Kernel docs recommend replacing strcpy with
strscpy. The function strcpy() return value isn't used so there
shouldn't be an issue replacing with the safer alternative strscpy.
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>