Commit graph

411 commits

Author SHA1 Message Date
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard(), scoped_guard() allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
d21262d4e3 bcachefs: bch2_dev_journal_bucket_delete()
Recover from "journal and btree in same bucket".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
66b7c51ceb bcachefs: bch2_check_fix_ptrs() can now repair btree roots
This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.

For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
56e5c7f65f bcachefs: kill replicas_sectors arg to __trigger_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
ff875d4b47 bcachefs: Ensure we print output of run_recovery_pass if it errors
Also, don't error out in bucket_ref_update_err(): we don't want to
return -BCH_ERR_cannot_rewind_recovery if it's not an insert, if it's an
overwrite we continue.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:02:44 -04:00
Kent Overstreet
9b133c0d74 bcachefs: Small check_fix_ptr fixes
We don't want to change the bucket gen, on gen mismatch: it's possible
to have multiple btree nodes with different gens in the same bucket that
we want to keep, if we have to recover from btree node scan.

It's also not necessary to set g->gen_valid; add a comment to that
effect.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
Kent Overstreet
15f969326e bcachefs: Improve bucket_bitmap code
Add some more helpers, and mismatches is now a superset of the empty
bitmap - simplifies most checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:04 -04:00
Kent Overstreet
d4b30ed90c bcachefs: bch2_run_explicit_recovery_pass() cleanup
Consolidate the run_explicit_recovery_pass() interfaces by adding a
flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit
flag.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:04 -04:00
Kent Overstreet
13ffcbae86 bcachefs: "buckets with backpointer mismatches" now allocated on demand
More self healing work: we're going to be calling
check_bucket_backpointer_mismatch() at runtime, outside of fsck.

Then when we need to we'll kick off the full
check_extents_to_backpointers recovery pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:52 -04:00
Kent Overstreet
7677859a47 bcachefs: Run most explicit recovery passes persistent
If we detect an error that requires running a recovery pass, and we're
not in recovery, we won't be able to fix it until the next mount - make
sure we're noting in the superblock that it needs to run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:37 -04:00
Kent Overstreet
cca2c0d224 bcachefs: bch_dev.io_ref -> enumerated_ref
Convert device IO refs to enumerated_refs, for easier debugging of
refcount issues.

Simple conversion: enumerate all users and convert to the new helpers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:28 -04:00
Kent Overstreet
ebf561b208 bcachefs: print_str_as_lines() -> print_str()
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.

But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:18 -04:00
Kent Overstreet
2085325171 bcachefs: Simplify bch2_count_fsck_err()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:17 -04:00
Kent Overstreet
bb36a12921 bcachefs: bch2_run_explicit_recovery_pass_printbuf()
We prefer helpers that emit log messages to printbufs rather than
printing them directly; that way, we can ensure that different log
messages from the same event are grouped together and formatted
appropriately in the dmesg log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:16 -04:00
Kent Overstreet
eca5b56ccf bcachefs: Don't generate alloc updates to invalid buckets
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28 16:46:13 -04:00
Kent Overstreet
002466446a bcachefs: fix bch2_dev_buckets_resize()
The resize memcpy path was totally busted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28 16:46:13 -04:00
Kent Overstreet
345731a389 bcachefs: fix bch2_dev_usage_full_read_fast()
One reference to bch_dev_usage wasn't updated, which meant we weren't
reading the full bch_dev_usage_full - oops.

Fixes: 955ba7b5ea ("bcachefs: bch_dev_usage_full")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-13 14:54:34 -04:00
Kent Overstreet
955ba7b5ea bcachefs: bch_dev_usage_full
All the fastpaths that need device usage don't need the sector totals or
fragmentation, just bucket counts.

Split bch_dev_usage up into two different versions, the normal one with
just bucket counts.

This is also a stack usage improvement, since we have a bch_dev_usage on
the stack in the allocation path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
9180ad2e16 bcachefs: Kill btree_iter.trans
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
dcffc3b1ae bcachefs: Split up bch_dev.io_ref
We now have separate per device io_refs for read and write access.

This fixes a device removal bug where the discard workers were still
running while we're removing alloc info for that device.

It's also a bit of hardening; we no longer allow writes to devices that
are read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
7f10fde38f bcachefs: Fix field spanning write warning
Struct with embedded VLA...

memcpy: detected field-spanning write (size 8) of single field "&gc->r.e" at fs/bcachefs/ec.c:465 (size 3)
WARNING: CPU: 1 PID: 936 at fs/bcachefs/ec.c:465 bch2_trigger_stripe+0x706/0x730
Modules linked in:
CPU: 1 UID: 0 PID: 936 Comm: mount.bcachefs Not tainted 6.14.0-rc6-ktest-00236-gefb0b5c62dbc #55
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:bch2_trigger_stripe+0x706/0x730
Code: b4 00 01 b9 03 00 00 00 48 89 fb 48 c7 c7 33 54 da 81 48 89 d6 49 89 d6 48 c7 c2 c3 36 db 81 e8 60 54 c5 ff 48 89 df 4c 89 f2 <0f> 0b e9 5c fd ff ff e8 fe 5e 4e 00 bf 10 00 00 00 48 c7 c6 ff ff
RSP: 0018:ffff88817081f680 EFLAGS: 00010246
RAX: f8fe7dd1c56b5600 RBX: ffff888101265368 RCX: 0000000000000027
RDX: 0000000000000008 RSI: 00000000fffbffff RDI: ffff888101265368
RBP: 0000000000000000 R08: 000000000003ffff R09: ffff88817f1fe000
R10: 00000000000bfffd R11: 0000000000000004 R12: ffff8881012652c0
R13: 0000000000000000 R14: 0000000000000008 R15: ffff88817081f6c9
FS:  00007fc428bc7c80(0000) GS:ffff888179280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd3ee4a038 CR3: 000000010a9bc000 CR4: 0000000000750eb0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0xce/0x1b0
 ? bch2_trigger_stripe+0x706/0x730
 ? report_bug+0x11b/0x1a0
 ? bch2_trigger_stripe+0x706/0x730
 ? handle_bug+0x5e/0x90
 ? exc_invalid_op+0x1a/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? bch2_trigger_stripe+0x706/0x730
 bch2_gc_mark_key+0x2cf/0x430
 bch2_check_allocations+0x1a64/0x1ed0
 ? vsnprintf+0x1ad/0x420
 ? bch2_check_allocations+0x191f/0x1ed0
 bch2_run_recovery_passes+0x13b/0x2b0
 bch2_fs_recovery+0x9b7/0x1290
 ? __bch2_print+0xb2/0xf0
 ? bch2_printbuf_exit+0x1e/0x30
 ? print_mount_opts+0x153/0x180
 bch2_fs_start+0x274/0x3b0
 bch2_fs_get_tree+0x516/0x6e0
 vfs_get_tree+0x21/0xa0
 do_new_mount+0x153/0x350
 __x64_sys_mount+0x16c/0x1f0
 do_syscall_64+0x6c/0x140
 ? arch_exit_to_user_mode_prepare+0x9/0x40
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-31 17:39:10 -04:00
Kent Overstreet
393a05a741 bcachefs: Don't use designated initializers for disk_accounting_pos
Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.

Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 16:35:13 -04:00
Kent Overstreet
6d77ce4a27 bcachefs: Better printing of inconsistency errors
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 13:26:13 -04:00
Kent Overstreet
f4a584f4bf bcachefs: bch2_disk_accounting_mod2()
We're hitting some issues with uninitialized struct padding, flagged by
kmsan.

They appear to be falso positives, otherwise bch2_accounting_validate()
would have flagged them as "junk at end". But for now, we'll need to
initialize disk_accounting_pos with memset().

This adds a new helper, bch2_disk_accounting_mod2(), that initializes a
disk_accounting_pos and does the accounting mod all at once - so overall
things actually get slightly more ergonomic.

BCH_DISK_ACCOUNTING_replicas keys are left for now; KMSAN isn't warning
about them and they're a bit special.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
15800f3d4b bcachefs: bcachefs_metadata_version_cached_backpointers
Cached pointers now have backpointers.

This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbago collection - which requires a full metadata scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00
Kent Overstreet
cc297dfb41 bcachefs: bch2_trigger_stripe_ptr() no longer uses ec_stripes_heap_lock
Introduce per-entry locks, like with struct bucket - the stripes heap is
going away.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00
Kent Overstreet
4541408391 bcachefs: bch2_kvmalloc()
Add a version of kvmalloc() that doesn't have the INT_MAX limit; large
filesystems do hit this.

We'll want to get rid of the in-memory bucket gens array, but we're not
there quite yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
7171b1fd27 bcachefs: kill __bch2_extent_ptr_to_bp()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
aca7a26f7f bcachefs: bch2_extent_ptr_to_bp() no longer depends on device
bch_backpointer no longer contains the bucket_offset field, it's just a
direct LBA mapping (with low bits to account for compressed extent
splitting), so we don't need to refer to the device to construct it
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
f859bc945e bcachefs: alloc_data_type_set() happens in alloc trigger
Originally, we ran insert triggers before overwrite so that if an extent
was being moved (by fallocate insert/collapse range), the bucket sector
count wouldn't hit 0 partway through, and so we don't trigger state
changes caused by that too soon.

But this is better solved by just moving the data type change to the
alloc trigger itself, where it's already called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
49f2d18263 bcachefs: Kill unnecessary mark_lock usage
We can't hold mark_lock while calling fsck_err() - that's a deadlock,
mark_lock is meant to be a leaf node lock.

It's also unnecessary for gc_bucket() and bucket_gen(); rcu suffices
since the bucket_gens array describes its size, and we can't race with
device removal or resize during gc/fsck since that takes state lock.

Reported-by: syzbot+38641fcbda1aaffefdd4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:22 -05:00
Kent Overstreet
828552ca74 bcachefs: Kill bch2_bucket_alloc_new_fs()
The early-early allocation path, bch2_bucket_alloc_new_fs(), is no
longer needed - and inconsistencies around new_fs_bucket_idx have been a
frequent source of bugs.

Reported-by: syzbot+592425844580a6598410@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:19 -05:00
Kent Overstreet
eb25733aba bcachefs: bch_backpointer -> bkey_i_backpointer
Since we no longer store backpointers in alloc keys, there's no reason
not to pass around bkey_i_backpointers; this means we don't have to pass
the bucket pos separately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:19 -05:00
Kent Overstreet
eb73e7773f bcachefs: Kill FSCK_NEED_FSCK
If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().

These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:17 -05:00
Kent Overstreet
161d13835e bcachefs: Move bch_extent_rebalance code to rebalance.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:16 -05:00
Thorsten Blum
901ff6555b bcachefs: Annotate struct bucket_gens with __counted_by()
Add the __counted_by compiler attribute to the flexible array member b
to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Use struct_size() to calculate the number of bytes to be allocated.

Update bucket_gens->nbuckets and bucket_gens->nbuckets_minus_first when
resizing.

Compile-tested only.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:15 -05:00
Kent Overstreet
335d318ef5 bcachefs: bch2_folio_reservation_get_partial() is now better behaved
bch2_folio_reservation_get_partial(), on partial success, will now
return a reservation that's aligned to the filesystem blocksize.

This is a partial fix for fstests generic/299 - fio verify is badly
behaved in the presence of short writes that aren't aligned to its
blocksize.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18 00:49:48 -04:00
Kent Overstreet
934137b0c0 bcachefs: bch2_trigger_ptr() calculates sectors even when no device
This is necessary for erasure coded pointers to devices that have been
removed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:49 -04:00
Kent Overstreet
54a12984a9 bcachefs: EIO errcode cleanup
We want to be using private errcodes whenever possible, for better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
805ddc2042 bcachefs: bch2_dev_rcu_noerror()
bch2_dev_rcu() now properly errors if the device is invalid

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
8ed4ba3663 bcachefs: Move tabstop setup to bch2_dev_usage_to_text()
No reason for it not to be where it's needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Thorsten Blum
fa1ab1b466 bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by()
Add the __counted_by compiler attribute to the flexible array members
devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Increment nr_devs before adding a new device to the devs array and
adjust the array indexes accordingly. Add a helper macro for adding a
new device.

In bch2_journal_read(), explicitly set nr_devs to 0.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:49 -04:00
Kent Overstreet
52df04f039 bcachefs: More BCH_SB_MEMBER_INVALID support
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:46 -04:00
Kent Overstreet
7f12a963b6 bcachefs: fix rebalance accounting
Fixes: 49aa783039 ("bcachefs: Fix rebalance_work accounting")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-01 15:54:40 -04:00
Kent Overstreet
49aa783039 bcachefs: Fix rebalance_work accounting
rebalance_work was keying off of the presence of rebelance_opts in the
extent - but that was incorrect, we keep those around after rebalance
for indirect extents since the inode's options are not directly
available

Fixes: 20ac515a9c ("bcachefs: bch_acct_rebalance_work")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-24 10:16:21 -04:00
Kent Overstreet
0e49d3ff12 bcachefs: Fix locking in __bch2_trans_mark_dev_sb()
We run this in full RW mode now, so we have to guard against the
superblock buffer being reallocated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16 20:45:15 -04:00
Kent Overstreet
075cabf324 bcachefs: Fix forgetting to pass trans to fsck_err()
Reported-by: syzbot+e3938cd6d761b78750e6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16 12:46:40 -04:00
Kent Overstreet
58474f76a7 bcachefs: bcachefs_metadata_version_disk_accounting_inum
This adds another disk accounting counter to track usage per inode
number (any snapshot ID).

This will be used for a couple things:

- It'll give us a way to tell the user how much space a given file ista
  consuming in all snapshots; i.e. how much extra space it's consuming
  due to snapshot versioning.

- It counts number of extents and total size of extents (both in btree
  keyspace sectors and actual disk usage), meaning it gives us average
  extent size: that is, it'll let us cheaply find fragmented files that
  should be defragmented.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
1a9e219db1 bcachefs: improve bch2_dev_usage_to_text()
Add a line for capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-09 14:40:19 -04:00