939 commits

Author SHA1 Message Date
Linus Torvalds
0974f486f3 f2fs-for-6.17-rc1
In this round, we've mainly updated three parts: 1) folio conversion by Matthew,
 2) switch to a new mount API by Hongbo and Eric, and 3) several sysfs entries
 to tune GCs for ZUFS with finer granularity by Daeho. There are also patches
 to address bugs and issues in the existing features such as GCs, file pinning,
 write-while-dio-read, contiguous block allocation, and memory access violations.
 
 Enhancement:
  - switch to new mount API and folio conversion
  - add sysfs nodes to control F2FS GCs for ZUFS
  - improve performance on the nat entry cache
  - drop inode from the donation list when the last file is closed
  - avoid splitting bio when reading multiple pages
 
 Bug fix:
  - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
  - make sure zoned device GC to use FG_GC in shortage of free section
  - fix to calculate dirty data during has_not_enough_free_secs()
  - fix to update upper_p in __get_secs_required() correctly
  - wait for inflight dio completion, excluding pinned files read using dio
  - don't break allocation when crossing contiguous sections
  - vm_unmap_ram() may be called from an invalid context
  - fix to avoid out-of-boundary access in dnode page
  - fix to avoid panic in f2fs_evict_inode
  - fix to avoid UAF in f2fs_sync_inode_meta()
  - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
  - fix UAF of f2fs_inode_info in f2fs_free_dic
  - fix to avoid invalid wait context issue
  - fix bio memleak when committing super block
  - handle nat.blkaddr corruption in f2fs_get_node_info()
 
 In addition, there are also clean-ups and minor bug fixes.

Merge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "Three main updates: folio conversion by Matthew, switch to a new mount
  API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
  with finer granularity by Daeho.

  There are also patches to address bugs and issues in the existing
  features such as GCs, file pinning, write-while-dio-read, contiguous
  block allocation, and memory access violations.

  Enhancements:
   - switch to new mount API and folio conversion
   - add sysfs nodes to control F2FS GCs for ZUFS
   - improve performance on the nat entry cache
   - drop inode from the donation list when the last file is closed
   - avoid splitting bio when reading multiple pages

  Bug fixes:
   - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
   - make sure zoned device GC to use FG_GC in shortage of free section
   - fix to calculate dirty data during has_not_enough_free_secs()
   - fix to update upper_p in __get_secs_required() correctly
   - wait for inflight dio completion, excluding pinned files read using dio
   - don't break allocation when crossing contiguous sections
   - vm_unmap_ram() may be called from an invalid context
   - fix to avoid out-of-boundary access in dnode page
   - fix to avoid panic in f2fs_evict_inode
   - fix to avoid UAF in f2fs_sync_inode_meta()
   - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
   - fix UAF of f2fs_inode_info in f2fs_free_dic
   - fix to avoid invalid wait context issue
   - fix bio memleak when committing super block
   - handle nat.blkaddr corruption in f2fs_get_node_info()

  In addition, there are also clean-ups and minor bug fixes"

* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
  f2fs: drop inode from the donation list when the last file is closed
  f2fs: add gc_boost_gc_greedy sysfs node
  f2fs: add gc_boost_gc_multiple sysfs node
  f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
  f2fs: fix to calculate dirty data during has_not_enough_free_secs()
  f2fs: fix to update upper_p in __get_secs_required() correctly
  f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
  f2fs: avoid redundant clean nat entry move in lru list
  f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
  f2fs: ignore valid ratio when free section count is low
  f2fs: don't break allocation when crossing contiguous sections
  f2fs: remove unnecessary tracepoint enabled check
  f2fs: merge the two conditions to avoid code duplication
  f2fs: vm_unmap_ram() may be called from an invalid context
  f2fs: fix to avoid out-of-boundary access in dnode page
  f2fs: switch to the new mount api
  f2fs: introduce fs_context_operation structure
  f2fs: separate the options parsing and options checking
  f2fs: Add f2fs_fs_context to record the mount options
  f2fs: Allow sbi to be NULL in f2fs_printk
  ...
2025-08-04 16:27:21 -07:00
Chao Yu
1005a3ca28 f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
w/ "mode=lfs" mount option, generic/299 will cause system panic as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2835!
Call Trace:
 <TASK>
 f2fs_allocate_data_block+0x6f4/0xc50
 f2fs_map_blocks+0x970/0x1550
 f2fs_iomap_begin+0xb2/0x1e0
 iomap_iter+0x1d6/0x430
 __iomap_dio_rw+0x208/0x9a0
 f2fs_file_write_iter+0x6b3/0xfa0
 aio_write+0x15d/0x2e0
 io_submit_one+0x55e/0xab0
 __x64_sys_io_submit+0xa5/0x230
 do_syscall_64+0x84/0x2f0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0010:new_curseg+0x70f/0x720

The root cause of running out of space is that, in f2fs_map_blocks(), f2fs
may trigger foreground gc only after it has allocated physical blocks. That
is too late when multiple threads write data w/ the aio/dio/bufio methods in
parallel: since we always use OPU in lfs mode, f2fs_map_blocks() allocates
blocks aggressively.

In order to fix this issue, let's give a chance to trigger foreground
gc prior to block allocation in f2fs_map_blocks().
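
The ordering problem can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the kernel logic: ToyVolume,
map_blocks(), survives() and all block counts are hypothetical, and
foreground gc is reduced to "add the reclaimable blocks back to the free
pool".

```python
class ToyVolume:
    """Toy model of free space; all numbers are made up for illustration."""

    def __init__(self, free, reclaimable, gc_threshold):
        self.free = free                  # blocks available for allocation
        self.reclaimable = reclaimable    # blocks foreground GC could free
        self.gc_threshold = gc_threshold  # run GC when free drops below this

    def foreground_gc_if_needed(self):
        if self.free < self.gc_threshold:
            self.free += self.reclaimable
            self.reclaimable = 0

    def allocate(self, nr_blocks):
        if nr_blocks > self.free:
            raise RuntimeError("no free space left for allocation")
        self.free -= nr_blocks


def map_blocks(vol, nr_blocks, gc_first):
    if gc_first:
        vol.foreground_gc_if_needed()   # new ordering: GC check before allocating
    vol.allocate(nr_blocks)
    if not gc_first:
        vol.foreground_gc_if_needed()   # old ordering: GC check after allocating


def survives(gc_first):
    """Return True if a 16-block request succeeds on a nearly full volume."""
    vol = ToyVolume(free=8, reclaimable=32, gc_threshold=16)
    try:
        map_blocks(vol, 16, gc_first)
        return True
    except RuntimeError:
        return False


print("GC after allocation survives:", survives(gc_first=False))
print("GC before allocation survives:", survives(gc_first=True))
```

In this model the request only succeeds when the gc check runs first, which
mirrors the intent of the fix described above.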

Fixes: 36abef4e79 ("f2fs: introduce mode=lfs mount option")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-28 16:36:54 +00:00
Chao Yu
f0a7adfedc f2fs: don't break allocation when crossing contiguous sections
Commit 0638a3197c ("f2fs: avoid unused block when dio write in LFS
mode") has fixed unused block issue for dio write in lfs mode.

However, f2fs_map_blocks() may break and return a smaller extent when the
last allocated block is located at the end of a section, even if the
allocator can allocate contiguous blocks across sections.

Actually, when the allocator returns a block address that is not
contiguous w/ the current extent, we can record the block address in
iomap->private and, in the next round, skip reallocating the last
allocated block. This fixes the unused block issue and also lets us
allocate contiguous physical blocks as much as possible for dio
write in lfs mode.

Testcase:
- mkfs.f2fs -f /dev/vdb
- mount -o mode=lfs /dev/vdb /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=3; sync;
- dd if=/dev/zero of=/mnt/f2fs/dio bs=2M count=1 oflag=direct;
- umount /mnt/f2fs

Before:
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 256, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 512, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 0, start blkaddr = 0x4700, len = 0x100, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 256, start blkaddr = 0x4800, len = 0x100, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0

After:
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 256, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 4, file offset = 512, start blkaddr = 0x0, len = 0x100, flags = 1, seg_type = 8, may_create = 1, multidevice = 0, flag = 5, err = 0
f2fs_map_blocks: dev = (253,16), ino = 5, file offset = 0, start blkaddr = 0x4700, len = 0x200, flags = 3, seg_type = 1, may_create = 1, multidevice = 0, flag = 3, err = 0
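
The difference between the two traces for ino = 5 comes down to whether the
second allocation continues the first one. A minimal sketch of that
contiguity check, using the block addresses from the traces above;
merge_extents() is a hypothetical helper for illustration, not kernel code:

```python
def merge_extents(extents):
    """Merge (start_blkaddr, length) pairs that are physically contiguous."""
    merged = []
    for start, length in extents:
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The new extent begins right where the previous one ends.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged


# "Before" trace: two mappings of 0x100 blocks each for ino = 5.
before = [(0x4700, 0x100), (0x4800, 0x100)]
after = merge_extents(before)
print([(hex(start), hex(length)) for start, length in after])
```

Since 0x4700 + 0x100 == 0x4800, the two mappings collapse into a single
extent of 0x200 blocks, matching the "After" trace.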

Cc: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24 20:18:59 +00:00
Jan Prusakowski
08a7efc5b0 f2fs: vm_unmap_ram() may be called from an invalid context
When testing F2FS with xfstests using UFS-backed virtual disks, the
kernel sometimes complains that f2fs_release_decomp_mem() calls
vm_unmap_ram() from an invalid context. Example trace from the
f2fs/007 test:

f2fs/007 5s ...  [12:59:38][    8.902525] run fstests f2fs/007
[   11.468026] BUG: sleeping function called from invalid context at mm/vmalloc.c:2978
[   11.471849] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 68, name: irq/22-ufshcd
[   11.475357] preempt_count: 1, expected: 0
[   11.476970] RCU nest depth: 0, expected: 0
[   11.478531] CPU: 0 UID: 0 PID: 68 Comm: irq/22-ufshcd Tainted: G        W           6.16.0-rc5-xfstests-ufs-g40f92e79b0aa #9 PREEMPT(none)
[   11.478535] Tainted: [W]=WARN
[   11.478536] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   11.478537] Call Trace:
[   11.478543]  <TASK>
[   11.478545]  dump_stack_lvl+0x4e/0x70
[   11.478554]  __might_resched.cold+0xaf/0xbe
[   11.478557]  vm_unmap_ram+0x21/0xb0
[   11.478560]  f2fs_release_decomp_mem+0x59/0x80
[   11.478563]  f2fs_free_dic+0x18/0x1a0
[   11.478565]  f2fs_finish_read_bio+0xd7/0x290
[   11.478570]  blk_update_request+0xec/0x3b0
[   11.478574]  ? sbitmap_queue_clear+0x3b/0x60
[   11.478576]  scsi_end_request+0x27/0x1a0
[   11.478582]  scsi_io_completion+0x40/0x300
[   11.478583]  ufshcd_mcq_poll_cqe_lock+0xa3/0xe0
[   11.478588]  ufshcd_sl_intr+0x194/0x1f0
[   11.478592]  ufshcd_threaded_intr+0x68/0xb0
[   11.478594]  ? __pfx_irq_thread_fn+0x10/0x10
[   11.478599]  irq_thread_fn+0x20/0x60
[   11.478602]  ? __pfx_irq_thread_fn+0x10/0x10
[   11.478603]  irq_thread+0xb9/0x180
[   11.478605]  ? __pfx_irq_thread_dtor+0x10/0x10
[   11.478607]  ? __pfx_irq_thread+0x10/0x10
[   11.478609]  kthread+0x10a/0x230
[   11.478614]  ? __pfx_kthread+0x10/0x10
[   11.478615]  ret_from_fork+0x7e/0xd0
[   11.478619]  ? __pfx_kthread+0x10/0x10
[   11.478621]  ret_from_fork_asm+0x1a/0x30
[   11.478623]  </TASK>

This patch modifies in_task() check inside f2fs_read_end_io() to also
check if interrupts are disabled. This ensures that pages are unmapped
asynchronously in an interrupt handler.
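
The decision described above can be sketched as a small predicate. This is
a model of the condition only, not the kernel code: the function name is
hypothetical, and it assumes that a threaded interrupt handler such as
irq/22-ufshcd counts as task context while still running with interrupts
disabled, as the trace reports (irqs_disabled(): 1).

```python
def can_unmap_synchronously(in_task: bool, irqs_disabled: bool) -> bool:
    """Return True only when sleeping is allowed, so unmapping may run inline."""
    return in_task and not irqs_disabled


# Context from the trace above: task context, but interrupts are disabled.
trace_context = {"in_task": True, "irqs_disabled": True}

old_check = trace_context["in_task"]                  # in_task() alone
new_check = can_unmap_synchronously(**trace_context)  # also checks irqs

print("old check unmaps inline:", old_check)
print("new check unmaps inline:", new_check)
```

With the extra condition, the context from the trace no longer qualifies for
inline unmapping, so the work is deferred instead.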

Fixes: bff139b49d ("f2fs: handle decompress only post processing in softirq")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-24 20:09:50 +00:00
Matthew Wilcox (Oracle)
5fb60c0365 f2fs: Pass a folio to __has_merged_page()
All three callers have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:08 +00:00
Matthew Wilcox (Oracle)
06e42bf432 f2fs: Pass a folio to f2fs_submit_merged_write_cond()
Most callers pass NULL, and the one that passes a page already has a
folio.  Also convert __submit_merged_write_cond() to take a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:05 +00:00
Matthew Wilcox (Oracle)
7695f8ccf6 f2fs: Remove use of page from f2fs_write_single_data_page()
Both remaining uses of page now have a folio equivalent.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:03 +00:00
Matthew Wilcox (Oracle)
6974b21f70 f2fs: Remove clear_page_private_all()
All callers can simply call folio_detach_private().  This was the
only way that clear_page_private_data() could be called, so remove
that too.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:00 +00:00
Matthew Wilcox (Oracle)
0f54eec0cb f2fs: Use F2FS_F_SB() in f2fs_read_end_io()
Get the folio from the bio instead of the page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:57 +00:00
Matthew Wilcox (Oracle)
9e3d138737 f2fs: Pass a folio to f2fs_is_compressed_page()
All callers now have a folio so pass it in.  Also remove the test for
the private flag; it is redundant with checking folio->private for being
NULL.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:38 +00:00
Matthew Wilcox (Oracle)
cabda16223 f2fs: Use a folio iterator in f2fs_verify_bio()
Change from bio_for_each_segment_all() to bio_for_each_folio_all()
to iterate over each folio instead of each page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:35 +00:00
Matthew Wilcox (Oracle)
587b2df524 f2fs: Pass a folio to f2fs_end_read_compressed_page()
Both callers now have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:32 +00:00
Matthew Wilcox (Oracle)
a9249a2671 f2fs: Use a folio iterator in f2fs_handle_step_decompress()
Change from bio_for_each_segment_all() to bio_for_each_folio_all()
to iterate over each folio instead of each page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:30 +00:00
Matthew Wilcox (Oracle)
d6966e7ed2 f2fs: Pass a folio to WB_DATA_TYPE() and f2fs_is_cp_guaranteed()
All callers now have a folio so pass it in.  Removes a call to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:27 +00:00
Matthew Wilcox (Oracle)
fec9035417 f2fs: Use a bio in f2fs_submit_page_write()
Convert bio_page to bio_folio and use it throughout.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:24 +00:00
Matthew Wilcox (Oracle)
5e2a00e6e0 f2fs: Use a folio in f2fs_merge_page_bio()
We have two folios to deal with here; one carries the metadata and the
other points to the data.  They may be the same, but if it's compressed,
the data_folio will differ from the metadata folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:21 +00:00
Matthew Wilcox (Oracle)
ca8049c99f f2fs: Pass a folio to f2fs_compress_write_end_io()
The only caller has a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:18 +00:00
Matthew Wilcox (Oracle)
a824388d91 f2fs: Use a folio in f2fs_is_cp_guaranteed()
Convert the passed page to a folio and use it throughout.  Removes
a use of fscrypt_is_bounce_page(), which we're trying to remove.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:09 +00:00
Matthew Wilcox (Oracle)
4ecaf580ee f2fs: Add folio counterparts to page_private_flags functions
Name these new functions folio_test_f2fs_*(), folio_set_f2fs_*() and
folio_clear_f2fs_*().  Convert all callers which currently have a folio
and cast back to a page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:05 +00:00
Matthew Wilcox (Oracle)
ad38574a8e f2fs: Pass a folio to ADDRS_PER_PAGE()
All callers now have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:59 +00:00
Matthew Wilcox (Oracle)
d342b7adad f2fs: Add fio->folio
Put fio->page into a union with fio->folio.  This lets us remove a
lot of folio->page and page->folio conversions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:39 +00:00
Matthew Wilcox (Oracle)
a63f2de2dd f2fs: Pass a folio to nid_of_node()
All callers have a folio so pass it in.  Also make the argument const
as the function does not modify it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:50 +00:00
Matthew Wilcox (Oracle)
28fde0d7ff f2fs: Pass a folio to ino_of_node()
All callers have a folio so pass it in.  Also make the argument const
as the function does not modify it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:47 +00:00
Taotao Chen
e9d8e2bf23 fs: change write_begin/write_end interface to take struct kiocb *
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.

Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.

Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-16 14:48:18 +02:00
Jianan Huang
185f203a69 f2fs: avoid splitting bio when reading multiple pages
When fewer pages are read, nr_pages may be smaller than nr_cpages. Due
to the nr_vecs limit, the compressed pages will be split into multiple
bios and then merged at the block level. In this case, nr_cpages should
be used to pre-allocate bvecs.
To handle this case, align max_nr_pages to cluster_size, which should be
enough for all compressed pages.
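
A minimal sketch of the alignment idea, assuming a plain round-up to a
multiple of the cluster size. The exact expression used by the patch is not
shown in this message, and the values of cluster_size, nr_pages and
nr_cpages below are hypothetical:

```python
def round_up(value, multiple):
    """Round value up to the next multiple (multiple must be positive)."""
    return -(-value // multiple) * multiple


cluster_size = 4   # hypothetical pages per compressed cluster
nr_pages = 1       # pages requested by this read
nr_cpages = 3      # hypothetical compressed pages backing the cluster

# Sizing by nr_pages alone would leave room for 1 page although 3 are needed.
max_nr_pages = round_up(nr_pages, cluster_size)
print(max_nr_pages, max_nr_pages >= nr_cpages)
```

Sizing the bio by the aligned value leaves room for every compressed page of
the cluster, so the read does not have to be split.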

Signed-off-by: Jianan Huang <huangjianan@xiaomi.com>
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-01 16:22:07 +00:00
Chao Yu
68e7f31eec f2fs: clean up to check bi_status w/ BLK_STS_OK
Check bi_status w/ BLK_STS_OK instead of 0 for cleanup.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:03:39 +00:00
Chao Yu
019a891242 f2fs: introduce is_{meta,node}_folio
Just cleanup, no changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:03:26 +00:00
Christoph Hellwig
84c5d16711 f2fs: always unlock the page in f2fs_write_single_data_page
Consolidate the code to unlock the page in f2fs_write_single_data_page
instead of leaving it to the callers for the AOP_WRITEPAGE_ACTIVATE case.
Replace AOP_WRITEPAGE_ACTIVATE with a positive return of 1 as this case
now doesn't match the historic ->writepage special return code that is
on its way out now that ->writepage has been removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:23:18 +00:00
Christoph Hellwig
402dd9f02c f2fs: remove wbc->for_reclaim handling
Since commits 7ff0104a80 ("f2fs: Remove f2fs_write_node_page()") and
3b47398d98 ("f2fs: Remove f2fs_write_meta_page()"), f2fs can't be
called from reclaim context any more.  Remove all code keyed off the
wbc->for_reclaim flag, which is now only set for writing out swap or
shmem pages inside the swap code, but never passed to file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:22:45 +00:00
Kairui Song
0427e811c9 f2fs: drop usage of folio_index
folio_index is only needed for mixed usage of page cache and swap
cache, for pure page cache usage, the caller can just use
folio->index instead.

It can't be a swap cache folio here.  Swap mapping may only call into fs
through `swap_rw` but f2fs does not use that method for swap.

Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: Chao Yu <chao@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: linux-f2fs-devel@lists.sourceforge.net (open list:F2FS FILE SYSTEM)
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Chao Yu
dc6d9ef57f f2fs: zone: fix to calculate first_zoned_segno correctly
A zoned device can have both conventional zones and sequential zones,
so we should not treat the first segment of the zoned device as
first_zoned_segno. Instead, we need to check the zone type of each zone
while traversing the zoned device to find first_zoned_segno.

Otherwise, for the case below, first_zoned_segno will be 0, which could be
wrong.

create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0
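
The scan described above can be sketched as follows. This is an
illustration under assumptions, not the kernel implementation: it assumes
the create_null_blk arguments mean 1024 conventional zones followed by 1024
sequential zones, and that each zone holds exactly one segment
(segs_per_zone = 1). The helper and constant names are hypothetical.

```python
CONVENTIONAL = "conventional"
SEQUENTIAL = "sequential"


def first_zoned_segno(zone_types, segs_per_zone):
    """Return the segment number of the first sequential zone, or None."""
    for zone_idx, zone_type in enumerate(zone_types):
        if zone_type == SEQUENTIAL:
            return zone_idx * segs_per_zone
    return None


# Assumed layout: conventional zones first, then sequential zones.
zones = [CONVENTIONAL] * 1024 + [SEQUENTIAL] * 1024
print(first_zoned_segno(zones, segs_per_zone=1))
```

Under these assumptions the first sequential zone starts at segment 1024
rather than 0, which is the case that treating the first segment of the
device as first_zoned_segno gets wrong.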

Testcase:

export SCRIPTS_PATH=/share/git/scripts

test multiple devices w/ zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0
	mount /dev/vdb /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/vdb/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done

test single zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/nullb0
	mount /dev/nullb0 /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/nullb0/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done
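
For reference, the parameter sweep performed by both loops above can be
reproduced with plain arithmetic. The sketch below only mirrors the shell
expressions; sweep() and PIN_SIZE are hypothetical names, and units are
whatever nullblk_create.sh expects.

```python
def sweep(steps=8, total=4096):
    """Mirror zonesize=$((2<<$i)) and conzone/seqzone=$((4096/$zonesize))."""
    rows = []
    for i in range(steps):
        zonesize = 2 << i
        conzone = total // zonesize
        seqzone = total // zonesize
        rows.append((zonesize, conzone, seqzone))
    return rows


for zonesize, conzone, seqzone in sweep():
    print(f"zonesize={zonesize:>3} conzone={conzone:>4} seqzone={seqzone:>4}")

# Size passed to "f2fs_io pinfile set": $((8589934592*2)) bytes.
PIN_SIZE = 8589934592 * 2
print(PIN_SIZE, PIN_SIZE == 16 * 2**30)
```

The zone size doubles from 2 to 256 while the zone counts shrink from 2048
to 16, and the pinned file size works out to 16 GiB.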

Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Chao Yu
aa1be8dd64 f2fs: fix to detect gcing page in f2fs_is_cp_guaranteed()
Jan Prusakowski reported an f2fs bug as below:

f2fs/007 will hang kernel during testing w/ below configs:

kernel 6.12.18 (from pixel-kernel/android16-6.12)
export MKFS_OPTIONS="-O encrypt -O extra_attr -O project_quota -O quota"
export F2FS_MOUNT_OPTIONS="test_dummy_encryption,discard,fsync_mode=nobarrier,reserve_root=32768,checkpoint_merge,atgc"

cat /proc/<umount_proc_id>/stack
f2fs_wait_on_all_pages+0xa3/0x130
do_checkpoint+0x40c/0x5d0
f2fs_write_checkpoint+0x258/0x550
kill_f2fs_super+0x14f/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x59/0xa0
syscall_exit_to_user_mode+0x12d/0x130
do_syscall_64+0x57/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e

cat /sys/kernel/debug/f2fs/status

  - IO_W (CP: -256, Data:  256, Flush: (   0    0    1), Discard: (   0    0)) cmd:    0 undiscard:   0

CP IOs reference count becomes negative.

The root cause is:

After 4961acdd65 ("f2fs: fix to tag gcing flag on page during block
migration"), we will tag page w/ gcing flag for raw page of cluster
during its migration.

However, if the inode is both encrypted and compressed, during
ioc_decompress(), it will tag page w/ gcing flag, and it increases
F2FS_WB_DATA reference count:
- f2fs_write_multi_page
 - f2fs_write_raw_page
  - f2fs_write_single_page
   - do_write_page
    - f2fs_submit_page_write
     - WB_DATA_TYPE(bio_page, fio->compressed_page)
     : bio_page is encrypted, so mapping is NULL, and fio->compressed_page
       is NULL, it returns F2FS_WB_DATA
     - inc_page_count(.., F2FS_WB_DATA)

Then, during end_io(), it decreases F2FS_WB_CP_DATA reference count:
- f2fs_write_end_io
 - f2fs_compress_write_end_io
  - fscrypt_pagecache_folio
  : get raw page from encrypted page
  - WB_DATA_TYPE(&folio->page, false)
  : raw page has gcing flag, it returns F2FS_WB_CP_DATA
  - dec_page_count(.., F2FS_WB_CP_DATA)

In order to fix this issue, we need to detect gcing flag in raw page
in f2fs_is_cp_guaranteed().
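
The counter mismatch can be reproduced with a small model. This is a sketch
of the accounting only, not kernel code: simulate() and its parameter are
hypothetical, and the page count of 256 is taken from the IO_W line in the
status output above.

```python
def simulate(nr_pages, submit_sees_gcing):
    """Count writeback references for encrypted+compressed pages under GC."""
    counts = {"F2FS_WB_CP_DATA": 0, "F2FS_WB_DATA": 0}
    for _ in range(nr_pages):
        # Submit side: the bio page is the encrypted page. Without the fix,
        # the gcing flag on the raw page is not consulted here.
        submit_type = "F2FS_WB_CP_DATA" if submit_sees_gcing else "F2FS_WB_DATA"
        counts[submit_type] += 1

        # end_io side: the raw page is looked up and it carries the gcing
        # flag, so the decrement always hits the CP counter.
        counts["F2FS_WB_CP_DATA"] -= 1
    return counts


print("without fix:", simulate(256, submit_sees_gcing=False))
print("with fix:   ", simulate(256, submit_sees_gcing=True))
```

Without the fix the model ends at CP: -256 and Data: 256, the same imbalance
as in the status output; once both sides classify the page the same way, the
counters return to zero.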

Fixes: 4961acdd65 ("f2fs: fix to tag gcing flag on page during block migration")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Chao Yu
0c708e35cf f2fs: clean up w/ fscrypt_is_bounce_page()
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Matthew Wilcox (Oracle)
963da02bc1 f2fs: Convert fsync_node_entry->page to folio
Convert all callers to set/get a folio instead of a page.  Removes
five calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle)
7d28f13c58 f2fs: Pass a folio to get_dnode_addr()
All callers except __get_inode_rdev() and __set_inode_rdev() now have a
folio, but the only callers of those two functions do have a folio, so
pass the folio to them and then into get_dnode_addr().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle)
6f7ec66180 f2fs: Convert dnode_of_data->node_page to node_folio
All assignments to this struct member are conversions from a folio
so convert it to be a folio and convert all users.  At the same time,
convert data_blkaddr() to take a folio as all callers now have a folio.
Remove eight calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle)
b02a903218 f2fs: Use a folio in f2fs_encrypt_one_page()
Fetch a folio from the page cache instead of a page.  Removes two calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle)
842974808a f2fs: Convert f2fs_load_compressed_page() to f2fs_load_compressed_folio()
The only caller already has a folio, so pass it in.  Copy the entire
size of the folio to support large block sizes.  Remove two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle)
1d6bf61778 f2fs: Convert f2fs_put_page_dic() to f2fs_put_folio_dic()
The only caller has a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:45 +00:00
Matthew Wilcox (Oracle)
848839ce05 f2fs: Pass a folio to f2fs_do_read_inline_data()
All callers now have a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:43 +00:00
Matthew Wilcox (Oracle)
f1d54e07a9 f2fs: Convert dnode_of_data->inode_page to inode_folio
Also rename inode_page_locked to inode_folio_locked.  Removes five
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:43 +00:00
Matthew Wilcox (Oracle)
6023048cf6 f2fs: Convert f2fs_convert_inline_page() to f2fs_convert_inline_folio()
Both callers have a folio, so pass it in.  Removes seven calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
214235c224 f2fs: Pass folios to set_new_dnode()
Removes a lot of conversions of folios into pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:39 +00:00
Matthew Wilcox (Oracle)
bdbf142204 f2fs: Pass a folio to make_empty_dir()
Pass the folio into make_empty_dir() and then into
f2fs_get_new_data_folio().  Removes a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
0e1717dd92 f2fs: Use a folio in __find_data_block()
Remove a call to f2fs_get_inode_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
c68b0bcb29 f2fs: Use a folio in prepare_write_begin
Remove a call to f2fs_get_inode_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:38 +00:00
Matthew Wilcox (Oracle)
514163f699 f2fs: Use a folio in f2fs_xattr_fiemap()
Remove four hidden calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
d2eb6d86e0 f2fs: Remove f2fs_get_new_data_page()
All callers have been converted to call f2fs_get_new_data_folio()
so delete this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:34 +00:00
Matthew Wilcox (Oracle)
48b6894305 f2fs: Add f2fs_get_new_data_folio()
Convert f2fs_get_new_data_page() into f2fs_get_new_data_folio() and
add a f2fs_get_new_data_page() wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:33 +00:00
Matthew Wilcox (Oracle)
38f273c504 f2fs: Use a folio in f2fs_migrate_blocks()
Get a folio from the pagecache and use it throughout.  Removes two
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:32 +00:00