/* SPDX-License-Identifier: GPL-2.0 */

#ifndef BTRFS_SUBPAGE_H
#define BTRFS_SUBPAGE_H

#include <linux/spinlock.h>
#include <linux/atomic.h>
#include <linux/sizes.h>
#include "btrfs_inode.h"
#include "fs.h"

struct address_space;
struct folio;

/*
 * Extra info for the subpage bitmap.
 *
 * For subpage we pack all the per-sector status bitmaps
 * (uptodate/dirty/writeback/ordered/checked/locked) into one larger bitmap.
 *
 * This structure records how they are organized in the bitmap:
 *
 *  /- uptodate          /- dirty        /- ordered
 *  |                    |               |
 *  v                    v               v
 * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
 * |< sectors_per_page >|
 *
 * Unlike regular macro-like enums, we do not use upper-case names here, as
 * these names are used in various macros to generate function names.
 */
enum {
	btrfs_bitmap_nr_uptodate = 0,
	btrfs_bitmap_nr_dirty,

	/*
	 * This can eventually be changed to an atomic, but that change relies
	 * on the async delalloc range rework for the locked bitmap, as async
	 * delalloc can unlock its range and mark blocks writeback at any time.
	 */
	btrfs_bitmap_nr_writeback,

	/*
	 * The ordered and checked flags are for COW fixup. They are already
	 * marked deprecated and will eventually be removed.
	 */
	btrfs_bitmap_nr_ordered,
	btrfs_bitmap_nr_checked,

	/*
	 * The locked bit is for async delalloc ranges (compression). Currently
	 * an async extent is queued with its range locked until the compression
	 * is done, so an async extent can unlock the range at any time.
	 *
	 * This flag can only be deprecated after a rework of the async extent
	 * lifespan (mark writeback and do compression).
	 */
	btrfs_bitmap_nr_locked,
	btrfs_bitmap_nr_max
};
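
/*
 * Worked example (illustrative only), assuming a 4K sectorsize and a 64K
 * folio, i.e. 16 sectors per folio, and assuming each sub-bitmap starts at
 * btrfs_bitmap_nr_<name> * 16, following the enum order above:
 *
 *   uptodate   bits [ 0, 16)
 *   dirty      bits [16, 32)
 *   writeback  bits [32, 48)
 *   ordered    bits [48, 64)
 *   checked    bits [64, 80)
 *   locked     bits [80, 96)
 *
 * So bitmaps[] needs room for btrfs_bitmap_nr_max * 16 = 96 bits, and the
 * dirty bit of the sector at folio offset 12K is bit 16 + 12K / 4K = 19.
 */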

/*
 * Structure to track the status of each sector inside a folio, attached to
 * folio::private for both data and metadata inodes.
 */
struct btrfs_folio_state {
	/* Common members for both data and metadata pages. */
	spinlock_t lock;
	union {
		/*
		 * Members only used by metadata.
		 *
		 * @eb_refs should only be modified under private_lock, as it
		 * determines whether the btrfs_folio_state can be detached.
		 */
		atomic_t eb_refs;

		/*
		 * Members only used by data.
		 *
		 * @nr_locked: how many sectors inside the folio are locked.
		 */
		atomic_t nr_locked;
	};
	unsigned long bitmaps[];
};

enum btrfs_folio_type {
	BTRFS_SUBPAGE_METADATA,
	BTRFS_SUBPAGE_DATA,
};

/*
 * Subpage support for metadata is more complex, as we can have dummy extent
 * buffers, where folios have no mapping to determine the owning inode.
 *
 * Thankfully we only need to check if the node size is smaller than the page
 * size. Even with larger folio support, we will only allocate a folio as large
 * as the node size.
 * Thus if nodesize < PAGE_SIZE, we know metadata needs to go through the
 * subpage routines.
 */
static inline bool btrfs_meta_is_subpage(const struct btrfs_fs_info *fs_info)
{
	return fs_info->nodesize < PAGE_SIZE;
}

static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
				    struct folio *folio)
{
	if (folio->mapping && folio->mapping->host)
		ASSERT(is_data_inode(BTRFS_I(folio->mapping->host)));
	return fs_info->sectorsize < folio_size(folio);
}
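
/*
 * Worked examples (illustrative only) of the two checks above:
 *
 * - 64K PAGE_SIZE, 16K nodesize: btrfs_meta_is_subpage() returns true, as
 *   one page can hold up to 64K / 16K = 4 tree blocks.
 * - 16K PAGE_SIZE, 16K nodesize: btrfs_meta_is_subpage() returns false, so
 *   metadata goes through the regular folio routines.
 * - 4K sectorsize, 16K data folio: btrfs_is_subpage() returns true, as the
 *   folio holds 16K / 4K = 4 sectors.
 * - 4K sectorsize, 4K data folio: btrfs_is_subpage() returns false.
 */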

int btrfs_attach_folio_state(const struct btrfs_fs_info *fs_info,
			     struct folio *folio, enum btrfs_folio_type type);
void btrfs_detach_folio_state(const struct btrfs_fs_info *fs_info, struct folio *folio,
			      enum btrfs_folio_type type);

/* Allocate additional data where a folio represents more than one sector. */
struct btrfs_folio_state *btrfs_alloc_folio_state(const struct btrfs_fs_info *fs_info,
						  size_t fsize, enum btrfs_folio_type type);
static inline void btrfs_free_folio_state(struct btrfs_folio_state *bfs)
{
	kfree(bfs);
}

void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio);
void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio);

void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
			  struct folio *folio, u64 start, u32 len);
void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info,
			  struct folio *folio, u64 start, u32 len);
void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
				 struct folio *folio, unsigned long bitmap);

/*
 * Template for subpage related operations.
 *
 * btrfs_subpage_*() are for call sites where the folio has subpage attached and
 * the range is ensured to be inside the folio's single page.
 *
 * btrfs_folio_*() are for call sites where the folio can be either
 * subpage-specific or a regular folio. The function will handle both cases,
 * but the range still needs to be inside one single page.
 *
 * btrfs_folio_clamp_*() are similar to btrfs_folio_*(), except the range doesn't
 * need to be inside the page. Those functions will truncate the range
 * automatically.
 *
 * Both btrfs_folio_*() and btrfs_folio_clamp_*() are for data folios.
 *
 * For metadata, one should use the btrfs_meta_folio_*() helpers instead. There
 * is no clamp version of the metadata helpers, as we either go subpage
 * (nodesize < PAGE_SIZE) or use the regular folio helpers (nodesize >= PAGE_SIZE,
 * and our folio is never larger than nodesize).
 */
#define DECLARE_BTRFS_SUBPAGE_OPS(name) \
void btrfs_subpage_set_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_subpage_clear_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_folio_set_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_folio_clear_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
bool btrfs_folio_test_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_folio_clamp_set_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_folio_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
bool btrfs_folio_clamp_test_##name(const struct btrfs_fs_info *fs_info, \
		struct folio *folio, u64 start, u32 len); \
void btrfs_meta_folio_set_##name(struct folio *folio, const struct extent_buffer *eb); \
void btrfs_meta_folio_clear_##name(struct folio *folio, const struct extent_buffer *eb); \
bool btrfs_meta_folio_test_##name(struct folio *folio, const struct extent_buffer *eb);

DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
DECLARE_BTRFS_SUBPAGE_OPS(dirty);
DECLARE_BTRFS_SUBPAGE_OPS(writeback);
DECLARE_BTRFS_SUBPAGE_OPS(ordered);
DECLARE_BTRFS_SUBPAGE_OPS(checked);
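
/*
 * Worked example (illustrative only): DECLARE_BTRFS_SUBPAGE_OPS(dirty) above
 * declares btrfs_subpage_set_dirty(), btrfs_folio_clear_dirty(),
 * btrfs_folio_clamp_test_dirty(), btrfs_meta_folio_set_dirty() and the other
 * set/clear/test variants for the dirty bitmap.
 *
 * For a single-page 64K folio at file offset 64K, btrfs_folio_clamp_set_dirty()
 * with start = 60K and len = 8K only marks the part of [60K, 68K) that lies
 * inside the folio, i.e. [64K, 68K). btrfs_folio_set_dirty() instead requires
 * the range to already be inside that page.
 */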

/*
 * Helper for error cleanup, where a folio will have its dirty flag cleared,
 * with writeback started and finished.
 */
static inline void btrfs_folio_clamp_finish_io(struct btrfs_fs_info *fs_info,
					       struct folio *locked_folio,
					       u64 start, u32 len)
{
	btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
	btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
	btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
}

bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
					struct folio *folio, u64 start, u32 len);

void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
				  struct folio *folio, u64 start, u32 len);
bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb);
void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
				    struct folio *folio,
				    unsigned long *ret_bitmap);
void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
				      struct folio *folio, u64 start, u32 len);

#endif