linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-08-05 16:54:27 +00:00

Author	SHA1	Message	Date
Qu Wenruo	cc38d178ff	btrfs: enable large data folio support under CONFIG_BTRFS_EXPERIMENTAL With all the preparation patches already merged, it's pretty easy to enable large data folios: - Remove the ASSERT() on folio size in btrfs_end_repair_bio() - Add a helper to properly set the max folio order Currently due to several call sites that are fetching the bitmap content directly into an unsigned long, we can only support BITS_PER_LONG blocks for each bitmap. - Call the helper when reading/creating an inode The support has the following limitations: - No large folios for data reloc inode The relocation code still requires page sized folio. But it's not that hot nor common compared to regular buffered ios. Will be improved in the future. - Requires CONFIG_BTRFS_EXPERIMENTAL - Will require all folio related operations to check if it needs the extra btrfs_subpage structure Now any folio larger than block size will need btrfs_subpage structure handling. Unfortunately I do not have a physical machine for performance test, but if everything goes like XFS/EXT4, it should mostly bring single digits percentage performance improvement in the real world. Although I believe there are still quite some optimizations to be done, let's focus on testing the current large data folio support first. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:53:30 +02:00
Qu Wenruo	582cd4bad4	btrfs: rename btrfs_subpage structure With the incoming large data folios support, the structure name btrfs_subpage is no longer correct, as for we can have multiple blocks inside a large folio, and the block size is still page size. So to follow the schema of iomap, rename btrfs_subpage to btrfs_folio_state, along with involved enums. There are still exported functions with "btrfs_subpage_" prefix, and I believe for metadata the name "subpage" will stay forever as we will never allocate a folio larger than nodesize anyway. The full cleanup of the word "subpage" will happen in much smaller steps in the future. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:53:27 +02:00
Qu Wenruo	1e17738d6b	btrfs: add comments on the extra btrfs specific subpage bitmaps Unlike the iomap_folio_state structure, the btrfs_subpage structure has a lot of extra sub-bitmaps, namely: - writeback sub-bitmap - locked sub-bitmap iomap_folio_state uses an atomic for writeback tracking, while it has no per-block locked tracking. This is because iomap always locks a single folio, and submits dirty blocks with that folio locked. But btrfs has async delalloc ranges (for compression), which are queued with their range locked, until the compression is done, then marks the involved range writeback and unlocked. This means a range can be unlocked and marked writeback at seemingly random timing, thus it needs the extra tracking. This needs a huge rework on the lifespan of async delalloc range before we can remove/simplify these two sub-bitmaps. - ordered sub-bitmap - checked sub-bitmap These are for COW-fixup, but as I mentioned in the past, the COW-fixup is not really needed anymore and these two flags are already marked deprecated, and will be removed in the near future after comprehensive tests. Add related comments to indicate we're actively trying to align the sub-bitmaps to the iomap ones. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:53:27 +02:00
Qu Wenruo	a416637f90	btrfs: replace PAGE_SIZE with folio_size for subpage.[ch] Since we can no longer assume all data filemap folios are page sized, use proper folio_size() calls to determine the folio size, as a preparation for future large data filemap folios. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:52 +01:00
Qu Wenruo	cb3c11d2f5	btrfs: add a size parameter to btrfs_alloc_subpage() Since we can no longer assume page sized folio for data filemap folios, allow btrfs_alloc_subpage() to accept a new parameter, @fsize, indicating the folio size. This doesn't follow the regular behavior of passing a folio directly, because this function is shared by both data and metadata folios, and for metadata folios we have extra allocation policy to ensure no large folios whose sizes are larger than nodesize (unless it's page sized). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:52 +01:00
Qu Wenruo	4c14d5c855	btrfs: subpage: make btrfs_is_subpage() check against a folio To support large data folios, we can no longer assume every filemap folio is page sized. So btrfs_is_subpage() check must be done against a folio. Thankfully for metadata folios, we have the full control and ensure a large folio will not be large than nodesize, so btrfs_meta_is_subpage() doesn't need this change. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:52 +01:00
Qu Wenruo	306a75e647	btrfs: allow debug builds to accept 2K block size Currently we only support two block sizes, 4K and PAGE_SIZE. This means on the most common architecture x86_64, we have no way to test subpage block size. And that's exactly I have an aarch64 machine dedicated for subpage tests. But this is still a hurdle for a lot of btrfs developers, and to improve the test coverage mostly on x86_64, here we enable debug builds to accept 2K block size. This involves: - Introduce a dedicated minimal block size macro BTRFS_MIN_BLOCKSIZE, which depends on if CONFIG_BTRFS_DEBUG is set. If so it's 2K, otherwise it's 4K as usual. - Allow 4K, PAGE_SIZE and BTRFS_MIN_BLOCKSIZE as block size - Update subpage block size checks to be based on BTRFS_MIN_BLOCKSIZE - Export the new supported blocksize through sysfs interfaces As most of the subpage support is already pretty mature, there is no extra work needed to support the extra 2K block size. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:49 +01:00
David Sterba	b7226ce6c4	btrfs: simplify parameters of metadata folio helpers Unlike folio helpers for date the ones for metadata always take the extent buffer start and length, so they can be simplified to take the eb only. The fs_info can be obtained from eb too so it can be dropped as parameter. Added in patch "btrfs: use metadata specific helpers to simplify extent buffer helpers". Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:47 +01:00
Qu Wenruo	fcc384be06	btrfs: require strict data/metadata split for subpage checks Since we have btrfs_meta_is_subpage(), we should make btrfs_is_subpage() to be data inode specific. This change involves: - Simplify btrfs_is_subpage() Now we only need to do a very simple sectorsize check against PAGE_SIZE. And since the function is pretty simple now, just make it an inline function. - Add an extra ASSERT() to make sure btrfs_is_subpage() is only called on data inode mapping - Migrate btree_csum_one_bio() to use btrfs_meta_folio_() helpers - Migrate alloc_extent_buffer() to use btrfs_meta_folio_() helpers - Migrate end_bbio_meta_write() to use btrfs_meta_folio_() helpers Or we will trigger the ASSERT() due to calling btrfs_folio_() on metadata folios. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:42 +01:00
Qu Wenruo	7895817b31	btrfs: simplify subpage handling of btrfs_clear_buffer_dirty() The function btrfs_clear_buffer_dirty() is called on dirty extent buffer that will not be written back. The function will call btree_clear_folio_dirty() to clear the folio dirty flag and also clear PAGECACHE_TAG_DIRTY flag. And we split the subpage and regular handling, as for subpage cases we should only clear PAGECACHE_TAG_DIRTY if the last dirty extent buffer in the page is cleared. So here we can simplify the function by: - Use the newly introduced btrfs_meta_folio_clear_and_test_dirty() helper The helper will return true if we cleared the folio dirty flag. With that we can use the same helper for both subpage and regular cases. - Rename btree_clear_folio_dirty() to btree_clear_folio_dirty_tag() As we move the folio dirty clearing in the btrfs_clear_buffer_dirty(). - Call btrfs_meta_folio_clear_and_test_dirty() to clear the dirty flags for both regular and subpage metadata cases - Only call btree_clear_folio_dirty_tag() when the folio is no longer dirty - Update the comment inside set_extent_buffer_dirty() As there is no separate clear_subpage_extent_buffer_dirty() anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:42 +01:00
Qu Wenruo	ee76e5a742	btrfs: use metadata specific helpers to simplify extent buffer helpers The following functions are doing metadata specific checks: - set_extent_buffer_uptodate() - clear_extent_buffer_uptodate() The reason why we do not use btrfs_folio_() helpers for those helpers is, btrfs_is_subpage() cannot handle dummy extent buffer if nodesize >= PAGE_SIZE but block size < PAGE_SIZE. In that case, we do not need to attach extra bitmaps to the extent buffer folio. But since dummy extent buffer folios are not attached to btree inode, btrfs_is_subpage() will return true, causing problems. And the following are using btrfs_folio_() helpers for metadata, but in theory we should use metadata specific checks: - set_extent_buffer_dirty() This is not causing problems because a dummy extent buffer should never be marked dirty. To make code simpler, introduce btrfs_meta_folio_*() helpers, to do the metadata specific handling, so that we do not to open-code such checks in above involved functions. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:41 +01:00
Qu Wenruo	57a3212674	btrfs: make subpage attach and detach handle metadata properly Currently subpage attach/detach is not doing proper dummy extent buffer subpage check, as btrfs_is_subpage() is not reliable for dummy extent buffer folios. Since we have a metadata specific check now, use that for btrfs_attach_subpage() first. Then enhance btrfs_detach_subpage() to accept a type parameter, so that we can do extra checks for dummy extent buffers properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:41 +01:00
Qu Wenruo	f64e818153	btrfs: factor out metadata subpage detection into a dedicated helper Currently we have only one btrfs_is_subpage() to cover both data and metadata. But there is a special case for metadata: - dummy extent buffer, sector size < PAGE_SIZE and node size >= PAGE_SIZE In such case, btrfs_is_subpage() will return true for extent buffer folio. But that is not correct, and that's exactly why we have some open-coded checks for functions like set_extent_buffer_uptodate() and clear_extent_buffer_uptodate(). Just extract the metadata specific checks into a helper, and replace those call sites. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:41 +01:00
Qu Wenruo	c2b47df81c	btrfs: do proper folio cleanup when run_delalloc_nocow() failed [BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate\|lru\|private\|node=0\|zone=2\|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K \|//////////////////////////////////////\| \| Pre-allocated \| And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to clear folio dirty, start and end writeback - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-01-13 15:52:17 +01:00
Qu Wenruo	0f71202665	btrfs: rename btrfs_folio_(set\|start\|end)_writer_lock() Since there is no user of reader locks, rename the writer locks into a more generic name, by removing the "_writer" part from the name. And also rename btrfs_subpage::writer into btrfs_subpage::locked. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:18 +01:00
Qu Wenruo	336e69f302	btrfs: unify to use writer locks for subpage locking Since commit `d7172f52e9` ("btrfs: use per-buffer locking for extent_buffer reading"), metadata read no longer relies on the subpage reader locking. This means we do not need to maintain a different metadata/data split for locking, so we can convert the existing reader lock users by: - add_ra_bio_pages() Convert to btrfs_folio_set_writer_lock() - end_folio_read() Convert to btrfs_folio_end_writer_lock() - begin_folio_read() Convert to btrfs_folio_set_writer_lock() - folio_range_has_eb() Remove the subpage->readers checks, since it is always 0. - Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader() Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:18 +01:00
Qu Wenruo	8511074c42	btrfs: remove unused btrfs_folio_start_writer_lock() This function is not really suitable to lock a folio, as it lacks the proper mapping checks, thus the locked folio may not even belong to btrfs. And due to the above reason, the last user inside lock_delalloc_folios() is already removed, and we can remove this function. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:18 +01:00
Qu Wenruo	2bca8eb077	btrfs: move the delalloc range bitmap search into extent_io.c Currently for subpage (sector size < page size) cases, we reuse subpage locked bitmap to find out all delalloc ranges we have locked, and run all those found ranges. However such reuse is not perfect, e.g.: 0 32K 64K 96K 128K \| \|////////\|\|///////\| \|////\| 120K For above range, writepage_delalloc() for page 0 will handle the range [32K, 96k), note delalloc range can be beyond the page boundary. But writepage_delalloc() for page 64K will only handle range [120K, 128K), as the previous run on page 0 has already handled range [64K, 96K). Meanwhile for the writeback we should expect range [64K, 96K) to also be locked, this leads to the mismatch from locked bitmap and delalloc range. This is not causing problems yet, but it's still an inconsistent behavior. So instead of relying on the subpage locked bitmap, move the delalloc range search using local @delalloc_bitmap, so that we can remove the existing btrfs_folio_find_writer_locked(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:12 +01:00
Qu Wenruo	bd610c0937	btrfs: only unlock the to-be-submitted ranges inside a folio [SUBPAGE COMPRESSION LIMITS] Currently inside writepage_delalloc(), if a delalloc range is going to be submitted asynchronously (inline or compression, the page dirty/writeback/unlock are all handled in at different time, not at the submission time), then we return 1 and extent_writepage() will skip the submission. This is fine if every sector matches page size, but if a sector is smaller than page size (aka, subpage case), then it can be very problematic, for example for the following 64K page: 0 16K 32K 48K 64K \|/\| \|///////\| \|/\| \| \| 4K 52K Where \|/\| is the dirty range we need to submit. In the above case, we need the following different handling for the 3 ranges: - [0, 4K) needs to be submitted for regular write A single sector cannot be compressed. - [16K, 32K) needs to be submitted for compressed write - [48K, 52K) needs to be submitted for regular write. Above, if we try to submit [16K, 32K) for compressed write, we will return 1 and immediately, and without submitting the remaining [48K, 52K) range. Furthermore, since extent_writepage() will exit without unlocking any sectors, the submitted range [0, 4K) will not have sector unlocked. That's the reason why for now subpage is only allowed for full page range. [ENHANCEMENT] - Introduce a submission bitmap at btrfs_bio_ctrl::submit_bitmap This records which sectors will be submitted by extent_writepage_io(). This allows us to track which sectors needs to be submitted thus later to be properly unlocked. For asynchronously submitted range (inline/compression), the corresponding bits will be cleared from that bitmap. - Only return 1 if no sector needs to be submitted in writepage_delalloc() - Only submit sectors marked by submission bitmap inside extent_writepage_io() So we won't touch the asynchronously submitted part. - Introduce btrfs_folio_end_writer_lock_bitmap() helper This will only unlock the involved sectors specified by @bitmap parameter, to avoid touching the range asynchronously submitted. Please note that, since subpage compression is still limited to page aligned range, this change is only a preparation for future sector perfect compression support for subpage. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:22 +02:00
Qu Wenruo	49a9907368	btrfs: merge btrfs_folio_unlock_writer() into btrfs_folio_end_writer_lock() The function btrfs_folio_unlock_writer() is already calling btrfs_folio_end_writer_lock() to do the heavy lifting work, the only missing 0 writer check. Thus there is no need to keep two different functions, move the 0 writer check into btrfs_folio_end_writer_lock(), and remove btrfs_folio_unlock_writer(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:22 +02:00
Qu Wenruo	ab6eac7c91	btrfs: remove btrfs_folio_end_all_writers() The function btrfs_folio_end_all_writers() is only utilized in extent_writepage() as a way to unlock all subpage range (for both successful submission and error handling). Meanwhile we have a similar function, btrfs_folio_end_writer_lock(). The difference is, btrfs_folio_end_writer_lock() expects a range that is a subset of the already locked range. This limit on btrfs_folio_end_writer_lock() is a little overkilled, preventing it from being utilized for error paths. So here we enhance btrfs_folio_end_writer_lock() to accept a superset of the locked range, and only end the locked subset. This means we can replace btrfs_folio_end_all_writers() with btrfs_folio_end_writer_lock() instead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:22 +02:00
Qu Wenruo	ce4a71ee15	btrfs: subpage: remove btrfs_fs_info::subpage_info member The member btrfs_fs_info::subpage_info stores the cached bitmap start position inside the merged bitmap. However in reality there is only one thing depending on the sectorsize, bitmap_nr_bits, which records the number of sectors that fit inside a page. The sequence of sub-bitmaps have fixed order, thus it's just a quick multiplication to calculate the start position of each sub-bitmaps. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:18 +02:00
Qu Wenruo	8189197425	btrfs: refactor __extent_writepage_io() to do sector-by-sector submission Unlike the bitmap usage inside raid56, for __extent_writepage_io() we handle the subpage submission not sector-by-sector, but for each dirty range we found. This is not a big deal normally, as the subpage complex code is already mostly optimized out by the compiler for x86_64. However for the sake of consistency and for the future of subpage sector-perfect compression support, this patch does: - Extract the sector submission code into submit_one_sector() - Add the needed code to extract the dirty bitmap for subpage case There is a small pitfall for non-subpage case, as we cleared page dirty before starting writeback, so we have to manually set the default dirty_bitmap to 1 for such case. - Use bitmap_and() to calculate the target sectors we need to submit This is done for both subpage and non-subpage cases, and will later be expanded to skip inline/compression ranges. For x86_64, the dirty bitmap will be fixed to 1, with the length of 1, so we're still doing the same workload per sector. For larger page sizes, the overhead will be a little larger, as previous we only need to do one extent_map lookup per-dirty-range, but now it will be one extent_map lookup per-sector. But that is the same frequency as x86_64, so we're just aligning the behavior to x86_64. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:18 +02:00
Qu Wenruo	efffb803bf	btrfs: make btrfs_is_subpage() to return false directly for 4K page size Btrfs only supports sectorsize 4K, 8K, 16K, 32K, 64K for now, thus for systems with 4K page size, there is no way the fs is subpage (sectorsize < PAGE_SIZE). So here we define btrfs_is_subpage() different according to the PAGE_SIZE: - PAGE_SIZE > 4K We may hit real subpage cases, define btrfs_is_subpage() as a regular function and do the usual checks. - PAGE_SIZE == 4K (no smaller PAGE_SIZE support AFAIK) There is no way the fs is subpage, so just define btrfs_is_subpage() as an inline function which always return false. This saves about 7K bytes for x86_64 debug builds: text data bss dec hex filename Before: 1484452 168693 25776 1678921 199e49 fs/btrfs/btrfs.ko After: 1476605 168445 25776 1670826 197eaa fs/btrfs/btrfs.ko Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:18 +02:00
Qu Wenruo	bca707e542	btrfs: subpage: introduce helpers to handle subpage delalloc locking Three new helpers are introduced for the incoming subpage delalloc locking change. - btrfs_folio_set_writer_lock() This is to mark specified range with subpage specific writer lock. After calling this, the subpage range can be proper unlocked by btrfs_folio_end_writer_lock() - btrfs_subpage_find_writer_locked() This is to find the writer locked subpage range in a page. With the help of btrfs_folio_set_writer_lock(), it can allow us to record and find previously locked subpage range without extra memory allocation. - btrfs_folio_end_all_writers() This is for the locked_page of __extent_writepage(), as there may be multiple subpage delalloc ranges locked. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 15:33:22 +02:00
Qu Wenruo	21b5bef20e	btrfs: make __extent_writepage_io() to write specified range only Function __extent_writepage_io() is designed to find all dirty ranges of a page, and add the dirty ranges to the bio_ctrl for submission. It requires all the dirtied ranges to be covered by an ordered extent. It gets called in two locations, but one call site is not subpage aware: - __extent_writepage() It gets called when writepage_delalloc() returned 0, which means writepage_delalloc() has handled delalloc for all subpage sectors inside the page. So this call site is OK. - extent_write_locked_range() This call site is utilized by zoned support, and in this case, we may only run delalloc range for a subset of the page, like this: (64K page size) 0 16K 32K 48K 64K \|/////\| \|///////\| \| In the above case, if extent_write_locked_range() is only triggered for range [0, 16K), __extent_writepage_io() would still try to submit the dirty range of [32K, 48K), then it would not find any ordered extent for it and triggers various ASSERT()s. Fix this problem by: - Introducing @start and @len parameters to specify the range For the first call site, we just pass the whole page, and the behavior is not touched, since run_delalloc_range() for the page should have created all ordered extents for the page. For the second call site, we avoid touching anything beyond the range, thus avoiding the dirty range which is not yet covered by any delalloc range. - Making btrfs_folio_assert_not_dirty() subpage aware The only caller is inside __extent_writepage_io(), and since that caller now accepts a subpage range, we should also check the subpage range other than the whole page. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 15:33:22 +02:00
Qu Wenruo	8e7e9c672f	btrfs: subpage: make reader lock utilize bitmap Currently btrfs_subpage utilizes its atomic member @reader to manage the reader counter. However it is only utilized to prevent the page to be released/unlocked when we still have reads underway. In that use case, we don't really allow multiple readers on the same subpage sector. So here we can introduce a new locked bitmap to represent exactly which subpage range is locked for read. In theory we can remove btrfs_subpage::reader as it's just the set bits of the new locked bitmap. But unfortunately bitmap doesn't provide such handy API yet, so we still keep the reader counter. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-03-05 17:13:23 +01:00
Qu Wenruo	621b9ff18c	btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer() Both functions were introduced in commit `1e1de38792` ("btrfs: make process_one_page() to handle subpage locking"), but they have never been utilized out of subpage code. So just unexport them. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-03-05 17:13:23 +01:00
David Sterba	602035d7fe	btrfs: add forward declarations and headers, part 2 Do a cleanup in more headers: - add forward declarations for types referenced by pointers - add includes when types need them This fixes potential compilation problems if the headers are reordered or the missing includes are not provided indirectly. Signed-off-by: David Sterba <dsterba@suse.com>	2024-03-04 16:24:49 +01:00
Qu Wenruo	55151ea9ec	btrfs: migrate subpage code to folio interfaces Although subpage itself is conflicting with higher folio, since subpage (sectorsize < PAGE_SIZE and nodesize < PAGE_SIZE) means we will never need higher order folio, there is a hidden pitfall: - btrfs_page_*() helpers Those helpers are an abstraction to handle both subpage and non-subpage cases, which means we're going to pass pages pointers to those helpers. And since those helpers are shared between data and metadata paths, it's unavoidable to let them to handle folios, including higher order folios). Meanwhile for true subpage case, we should only have a single page backed folios anyway, thus add a new ASSERT() for btrfs_subpage_assert() to ensure that. Also since those helpers are shared between both data and metadata, add some extra ASSERT()s for data path to make sure we only get single page backed folio for now. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-12-15 23:03:58 +01:00
Qu Wenruo	13df3775ef	btrfs: cleanup metadata page pointer usage Although we have migrated extent_buffer::pages[] to folios[], we're still mostly using the folio_page() help to grab the page. This patch would do the following cleanups for metadata: - Introduce num_extent_folios() helper This is to replace most num_extent_pages() callers. - Use num_extent_folios() to iterate future large folios This allows us to use things like bio_add_folio()/bio_add_folio_nofail(), and only set the needed flags for the folio (aka the leading/tailing page), which reduces the loop iteration to 1 for large folios. - Change metadata related functions to use folio pointers Including their function name, involving: * attach_extent_buffer_page() * detach_extent_buffer_page() * page_range_has_eb() * btrfs_release_extent_buffer_pages() * btree_clear_page_dirty() * btrfs_page_inc_eb_refs() * btrfs_page_dec_eb_refs() - Change btrfs_is_subpage() to accept an address_space pointer This is to allow both page->mapping and folio->mapping to be utilized. As data is still using the old per-page code, and may keep so for a while. - Special corner case place holder for future order mismatches between extent buffer and inode filemap For now it's just a block of comments and a dead ASSERT(), no real handling yet. The subpage code would still go page, just because subpage and large folio are conflicting conditions, thus we don't need to bother subpage with higher order folios at all. Just folio_page(folio, 0) would be enough. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor styling tweaks ] Signed-off-by: David Sterba <dsterba@suse.com>	2023-12-15 23:01:04 +01:00
Christoph Hellwig	2b2553f123	btrfs: stop setting PageError in the data I/O path PageError is not used by the VFS/MM and deprecated because it uses up a page bit and has no coherent rules. Instead read errors are usually propagated by not setting or clearing the uptodate bit, and write errors are propagated through the address_space. Btrfs now only sets the flag and never clears it for data pages, so just remove all places setting it, and the subpage error bit. Note that the error propagation for superblock writes that work on the block device mapping still uses PageError for now, but that will be addressed in a separate series. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-06-19 13:59:35 +02:00
Qu Wenruo	75258f20fb	btrfs: subpage: dump extra subpage bitmaps for debug There is a bug report that assert_eb_page_uptodate() gets triggered for free space tree metadata. Without proper dump for the subpage bitmaps it's much harder to debug. Thus this patch would dump all the subpage bitmaps (split them into their own bitmaps) for a easier debugging. The output would look like this: (Dumped after a tree block got read from disk) page:000000006e34bf49 refcount:4 mapcount:0 mapping:0000000067661ac4 index:0x1d1 pfn:0x110e9 memcg:ffff0000d7d62000 aops:btree_aops [btrfs] ino:1 flags: 0x8000000000002002(referenced\|private\|zone=2) page_type: 0xffffffff() raw: 8000000000002002 0000000000000000 dead000000000122 ffff00000188bed0 raw: 00000000000001d1 ffff0000c7992700 00000004ffffffff ffff0000d7d62000 page dumped because: btrfs subpage dump BTRFS warning (device dm-1): start=30490624 len=16384 page=30474240 bitmaps: uptodate=4-7 error= dirty= writeback= ordered= checked= Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-06-19 13:59:30 +02:00
Qu Wenruo	fbca46eb46	btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine The reason why we only support 64K page size for subpage is, for 64K page size we can ensure no matter what the nodesize is, we can fit it into one page. When other page size come, especially like 16K, the limitation is a bit limiting. To remove such limitation, we allow nodesize >= PAGE_SIZE case to go the non-subpage routine. By this, we can allow 4K sectorsize on 16K page size. Although this introduces another smaller limitation, the metadata can not cross page boundary, which is already met by most recent mkfs. Another small improvement is, we can avoid the overhead for metadata if nodesize >= PAGE_SIZE. For 4K sector size and 64K page size/node size, or 4K sector size and 16K page size/node size, we don't need to allocate extra memory for the metadata pages. Please note that, this patch will not yet enable other page size support yet. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-05-16 17:03:11 +02:00
Qu Wenruo	e55a0de185	btrfs: rework page locking in __extent_writepage() Pages passed to __extent_writepage() are always locked, but they may be locked by different functions. There are two types of locked page for __extent_writepage(): - Page locked by plain lock_page() It should not have any subpage::writers count. Can be unlocked by unlock_page(). This is the most common locked page for __extent_writepage() called inside extent_write_cache_pages() or extent_write_full_page(). Rarer cases include the @locked_page from extent_write_locked_range(). - Page locked by lock_delalloc_pages() There is only one caller, all pages except @locked_page for extent_write_locked_range(). In this case, we have to call subpage helper to handle the case. So here we introduce a helper, btrfs_page_unlock_writer(), to allow __extent_writepage() to unlock different locked pages. And since for all other callers of __extent_writepage() their pages are ensured to be locked by lock_page(), also add an extra check for epd::extent_locked to unlock such pages directly. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-26 19:08:05 +02:00
Qu Wenruo	e4f9434749	btrfs: subpage: add bitmap for PageChecked flag Although in btrfs we have very limited usage of PageChecked flag, it's still some page flag not yet subpage compatible. Fix it by introducing btrfs_subpage::checked_offset to do the convert. For most call sites, especially for free-space cache, COW fixup and btrfs_invalidatepage(), they all work in full page mode anyway. For other call sites, they work as subpage compatible mode. Some call sites need extra modification: - btrfs_drop_pages() Needs extra parameter to get the real range we need to clear checked flag. Also since btrfs_drop_pages() will accept pages beyond the dirtied range, update btrfs_subpage_clamp_range() to handle such case by setting @len to 0 if the page is beyond target range. - btrfs_invalidatepage() We need to call subpage helper before calling __btrfs_releasepage(), or it will trigger ASSERT() as page->private will be cleared. - btrfs_verify_data_csum() In theory we don't need the io_bio->csum check anymore, but it's won't hurt. Just change the comment. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-26 19:08:03 +02:00
Qu Wenruo	72a69cd030	btrfs: subpage: pack all subpage bitmaps into a larger bitmap Currently we use u16 bitmap to make 4k sectorsize work for 64K page size. But this u16 bitmap is not large enough to contain larger page size like 128K, nor is space efficient for 16K page size. To handle both cases, here we pack all subpage bitmaps into a larger bitmap, now btrfs_subpage::bitmaps[] will be the ultimate bitmap for subpage usage. Each sub-bitmap will has its start bit number recorded in btrfs_subpage_info::_start, and its bitmap length will be recorded in btrfs_subpage_info::bitmap_nr_bits. All subpage bitmap operations will be converted from using direct u16 operations to bitmap operations, with above _start calculated. For 64K page size with 4K sectorsize, this should not cause much difference. While for 16K page size, we will only need 1 unsigned long (u32) to store all the bitmaps, which saves quite some space. Furthermore, this allows us to support larger page size like 128K and 258K. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-26 19:03:55 +02:00
Qu Wenruo	8481dd80ab	btrfs: subpage: introduce btrfs_subpage_bitmap_info Currently we use fixed size u16 bitmap for subpage bitmap. This is fine for 4K sectorsize with 64K page size. But for 4K sectorsize and larger page size, the bitmap is too small, while for smaller page size like 16K, u16 bitmaps waste too much space. Here we introduce a new helper structure, btrfs_subpage_bitmap_info, to record the proper bitmap size, and where each bitmap should start at. By this, we can later compact all subpage bitmaps into one u32 bitmap. This patch is the first step. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-25 21:17:16 +02:00
Qu Wenruo	651fb41927	btrfs: subpage: make btrfs_alloc_subpage() return btrfs_subpage directly The existing calling convention of btrfs_alloc_subpage() is pretty awful. Change it to a more common pattern by returning struct btrfs_subpage directly and let the caller to determine if the call succeeded. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-25 21:17:16 +02:00
Qu Wenruo	cc1d0d93d5	btrfs: subpage: fix writeback which does not have ordered extent [BUG] When running fsstress with subpage RW support, there are random BUG_ON()s triggered with the following trace: kernel BUG at fs/btrfs/file-item.c:667! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3486 Comm: kworker/u13:2 5.11.0-rc4-custom+ #43 Hardware name: Radxa ROCK Pi 4B (DT) Workqueue: btrfs-worker-high btrfs_work_helper [btrfs] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) pc : btrfs_csum_one_bio+0x420/0x4e0 [btrfs] lr : btrfs_csum_one_bio+0x400/0x4e0 [btrfs] Call trace: btrfs_csum_one_bio+0x420/0x4e0 [btrfs] btrfs_submit_bio_start+0x20/0x30 [btrfs] run_one_async_start+0x28/0x44 [btrfs] btrfs_work_helper+0x128/0x1b4 [btrfs] process_one_work+0x22c/0x430 worker_thread+0x70/0x3a0 kthread+0x13c/0x140 ret_from_fork+0x10/0x30 [CAUSE] Above BUG_ON() means there is some bio range which doesn't have ordered extent, which indeed is worth a BUG_ON(). Unlike regular sectorsize == PAGE_SIZE case, in subpage we have extra subpage dirty bitmap to record which range is dirty and should be written back. This means, if we submit bio for a subpage range, we do not only need to clear page dirty, but also need to clear subpage dirty bits. In __extent_writepage_io(), we will call btrfs_page_clear_dirty() for any range we submit a bio. But there is loophole, if we hit a range which is beyond i_size, we just call btrfs_writepage_endio_finish_ordered() to finish the ordered io, then break out, without clearing the subpage dirty. This means, if we hit above branch, the subpage dirty bits are still there, if other range of the page get dirtied and we need to writeback that page again, we will submit bio for the old range, leaving a wild bio range which doesn't have ordered extent. [FIX] Fix it by always calling btrfs_page_clear_dirty() in __extent_writepage_io(). Also to avoid such problem from happening again, add a new assert, btrfs_page_assert_not_dirty(), to make sure both page dirty and subpage dirty bits are cleared before exiting __extent_writepage_io(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-23 13:19:04 +02:00
Qu Wenruo	3d078efae6	btrfs: subpage: fix a rare race between metadata endio and eb freeing [BUG] There is a very rare ASSERT() triggering during full fstests run for subpage rw support. No other reproducer so far. The ASSERT() gets triggered for metadata read in btrfs_page_set_uptodate() inside end_page_read(). [CAUSE] There is still a small race window for metadata only, the race could happen like this: T1 \| T2 ------------------------------------+----------------------------- end_bio_extent_readpage() \| \|- btrfs_validate_metadata_buffer() \| \| \|- free_extent_buffer() \| \| Still have 2 refs \| \|- end_page_read() \| \|- if (unlikely(PagePrivate()) \| \| The page still has Private \| \| \| free_extent_buffer() \| \| \| Only one ref 1, will be \| \| \| released \| \| \|- detach_extent_buffer_page() \| \| \|- btrfs_detach_subpage() \|- btrfs_set_page_uptodate() \| The page no longer has Private\| >>> ASSERT() triggered <<< \| This race window is super small, thus pretty hard to hit, even with so many runs of fstests. But the race window is still there, we have to go another way to solve it other than relying on random PagePrivate() check. Data path is not affected, as it will lock the page before reading, while unlocking the page after the last read has finished, thus no race window. [FIX] This patch will fix the bug by repurposing btrfs_subpage::readers. Now btrfs_subpage::readers will be a member shared by both metadata and data. For metadata path, we don't do the page unlock as metadata only relies on extent locking. At the same time, teach page_range_has_eb() to take btrfs_subpage::readers into consideration. So that even if the last eb of a page gets freed, page::private won't be detached as long as there still are pending end_page_read() calls. By this we eliminate the race window, this will slight increase the metadata memory usage, as the page may not be released as frequently as usual. But it should not be a big deal. The code got introduced in ("btrfs: submit read time repair only for each corrupted sector"), but the fix is in a separate patch to keep the problem description and the crash is rare so it should not hurt bisectability. Signed-off-by: Qu Wegruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-21 15:19:10 +02:00
Qu Wenruo	6f17400bd9	btrfs: introduce helpers for subpage ordered status This patch introduces the following functions to handle btrfs subpage ordered (Private2) status: - btrfs_subpage_set_ordered() - btrfs_subpage_clear_ordered() - btrfs_subpage_test_ordered() These helpers can only be called when the range is ensured to be inside the page. - btrfs_page_set_ordered() - btrfs_page_clear_ordered() - btrfs_page_test_ordered() These helpers can handle both regular sector size and subpage without problem. These functions are here to coordinate btrfs_invalidatepage() with btrfs_writepage_endio_finish_ordered(), to make sure only one of those functions can finish the ordered extent. Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64] Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64] Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-21 15:19:09 +02:00
Qu Wenruo	1e1de38792	btrfs: make process_one_page() to handle subpage locking Introduce a new data inodes specific subpage member, writers, to record how many sectors are under page lock for delalloc writing. This member acts pretty much the same as readers, except it's only for delalloc writes. This is important for delalloc code to trace which page can really be freed, as we have cases like run_delalloc_nocow() where we may exit processing nocow range inside a page, but need to exit to do cow half way. In that case, we need a way to determine if we can really unlock a full page. With the new btrfs_subpage::writers, there is a new requirement: - Page locked by process_one_page() must be unlocked by process_one_page() There are still tons of call sites manually lock and unlock a page, without updating btrfs_subpage::writers. So if we lock a page through process_one_page() then it must be unlocked by process_one_page() to keep btrfs_subpage::writers consistent. This will be handled in next patch. Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64] Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64] Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-21 15:19:09 +02:00
Qu Wenruo	60e2d25500	btrfs: provide btrfs_page_clamp_() helpers In the coming subpage RW supports, there are a lot of page status update calls which need to be converted to subpage compatible version, which needs @start and @len. Some call sites already have such @start/@len and are already in page range, like various endio functions. But there are also call sites which need to clamp the range for subpage case, like btrfs_dirty_pagse() and __process_contig_pages(). Here we introduce new helpers, btrfs_page_clamp_(), to do and only do the clamp for subpage version. Although in theory all existing btrfs_page_() calls can be converted to use btrfs_page_clamp_() directly, but that would make us to do unnecessary clamp operations. Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64] Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64] Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-21 15:19:09 +02:00
Qu Wenruo	3470da3b7d	btrfs: subpage: introduce helpers for writeback status Introduces the following functions to handle subpage writeback status: - btrfs_subpage_set_writeback() - btrfs_subpage_clear_writeback() - btrfs_subpage_test_writeback() These helpers can only be called when the range is ensured to be inside the page. - btrfs_page_set_writeback() - btrfs_page_clear_writeback() - btrfs_page_test_writeback() These helpers can handle both regular sector size and subpage without problem. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-04-19 17:25:18 +02:00
Qu Wenruo	d8a5713e89	btrfs: subpage: introduce helpers for dirty status Introduce the following functions to handle subpage dirty status: - btrfs_subpage_set_dirty() - btrfs_subpage_clear_dirty() - btrfs_subpage_test_dirty() These helpers can only be called when the range is ensured to be inside the page. - btrfs_page_set_dirty() - btrfs_page_clear_dirty() - btrfs_page_test_dirty() These helpers can handle both regular sector size and subpage without problem. Thus they would be used to replace PageDirty() related calls in later patches. There is one special point to note here, just like set_page_dirty() and clear_page_dirty_for_io(), btrfs_page_set_dirty() and btrfs_page_clear_dirty() must be called with page locked. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-04-19 17:25:18 +02:00
Qu Wenruo	92082d4097	btrfs: integrate page status update for data read path into begin/end_page_read In btrfs data page read path, the page status update are handled in two different locations: btrfs_do_read_page() { while (cur <= end) { /* No need to read from disk / if (HOLE/PREALLOC/INLINE){ memset(); set_extent_uptodate(); continue; } / Read from disk */ ret = submit_extent_page(end_bio_extent_readpage); } end_bio_extent_readpage() { endio_readpage_uptodate_page_status(); } This is fine for sectorsize == PAGE_SIZE case, as for above loop we should only hit one branch and then exit. But for subpage, there is more work to be done in page status update: - Page Unlock condition Unlike regular page size == sectorsize case, we can no longer just unlock a page. Only the last reader of the page can unlock the page. This means, we can unlock the page either in the while() loop, or in the endio function. - Page uptodate condition Since we have multiple sectors to read for a page, we can only mark the full page uptodate if all sectors are uptodate. To handle both subpage and regular cases, introduce a pair of functions to help handling page status update: - begin_page_read() For regular case, it does nothing. For subpage case, it updates the reader counters so that later end_page_read() can know who is the last one to unlock the page. - end_page_read() This is just endio_readpage_uptodate_page_status() renamed. The original name is a little too long and too specific for endio. The new thing added is the condition for page unlock. Now for subpage data, we unlock the page if we're the last reader. This does not only provide the basis for subpage data read, but also hide the special handling of page read from the main read loop. Also, since we're changing how the page lock is handled, there are two existing error paths where we need to manually unlock the page before calling begin_page_read(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-02-08 22:59:03 +01:00
Qu Wenruo	03a816b32b	btrfs: introduce helpers for subpage error status Introduce the following functions to handle subpage error status: - btrfs_subpage_set_error() - btrfs_subpage_clear_error() - btrfs_subpage_test_error() These helpers can only be called when the page has subpage attached and the range is ensured to be inside the page. - btrfs_page_set_error() - btrfs_page_clear_error() - btrfs_page_test_error() These helpers can handle both regular sector size and subpage without problem. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-02-08 22:59:02 +01:00
Qu Wenruo	a1d767c11c	btrfs: introduce helpers for subpage uptodate status Introduce the following functions to handle subpage uptodate status: - btrfs_subpage_set_uptodate() - btrfs_subpage_clear_uptodate() - btrfs_subpage_test_uptodate() These helpers can only be called when the page has subpage attached and the range is ensured to be inside the page. - btrfs_page_set_uptodate() - btrfs_page_clear_uptodate() - btrfs_page_test_uptodate() These helpers can handle both regular sector size and subpage. Although caller should still ensure that the range is inside the page. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-02-08 22:59:02 +01:00
Qu Wenruo	8ff8466d29	btrfs: support subpage for extent buffer page release In btrfs_release_extent_buffer_pages(), we need to add extra handling for subpage. Introduce a helper, detach_extent_buffer_page(), to do different handling for regular and subpage cases. For subpage case, handle detaching page private. For unmapped (dummy or cloned) ebs, we can detach the page private immediately as the page can only be attached to one unmapped eb. For mapped ebs, we have to ensure there are no eb in the page range before we delete it, as page->private is shared between all ebs in the same page. But there is a subpage specific race, where we can race with extent buffer allocation, and clear the page private while new eb is still being utilized, like this: Extent buffer A is the new extent buffer which will be allocated, while extent buffer B is the last existing extent buffer of the page. T1 (eb A) \| T2 (eb B) -------------------------------+------------------------------ alloc_extent_buffer() \| btrfs_release_extent_buffer_pages() \|- p = find_or_create_page() \| \| \|- attach_extent_buffer_page() \| \| \| \| \|- detach_extent_buffer_page() \| \| \|- if (!page_range_has_eb()) \| \| \| No new eb in the page range yet \| \| \| As new eb A hasn't yet been \| \| \| inserted into radix tree. \| \| \|- btrfs_detach_subpage() \| \| \|- detach_page_private(); \|- radix_tree_insert() \| Then we have a metadata eb whose page has no private bit. To avoid such race, we introduce a subpage metadata-specific member, btrfs_subpage::eb_refs. In alloc_extent_buffer() we increase eb_refs in the critical section of private_lock. Then page_range_has_eb() will return true for detach_extent_buffer_page(), and will not detach page private. The section is marked by: - btrfs_page_inc_eb_refs() - btrfs_page_dec_eb_refs() Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-02-08 22:59:02 +01:00

1 2

52 commits