Commit graph

180 commits

Kent Overstreet
c2b2c7d1da bcachefs: Tweak btree cache helpers for use by btree node scan
btree node scan needs to not use the btree node cache: that causes
interference from prior failed reads and parallel workers.

Instead we need to allocate btree nodes that don't live in the btree
cache, so that we can call bch2_btree_node_read_done() directly.

This patch tweaks the low level helpers so they don't touch the btree
cache lists.

Cc: Nikita Ofitserov <himikof@gmail.com>
Reviewed-by: Nikita Ofitserov <himikof@gmail.com>
Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-04 23:17:07 -04:00
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.
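
(Illustration, not from the commit: a hypothetical sketch of what such an error-throw helper could look like - the trace_error_throw() tracepoint name and the macro body are assumptions; only the BCH_ERR_* private error codes come from the existing code.)

    /* hypothetical sketch - not the actual implementation */
    #define bch_err_throw(_c, _err) ({				\
    	struct bch_fs *_fs = (_c);				\
    	trace_error_throw(_fs, -BCH_ERR_##_err);	/* assumed tracepoint */ \
    	-BCH_ERR_##_err;					\
    })

    /* usage: return bch_err_throw(c, btree_node_read_error); */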

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
7ed4c14e20 bcachefs: Reduce usage of recovery.curr_pass
We want recovery.curr_pass to be private to the recovery passes code,
to better show recovery pass status; also, it may rewind and is
generally not the correct member to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:03 -04:00
Kent Overstreet
68708efcac bcachefs: struct bch_fs_recovery
bch_fs has gotten obnoxiously big, let's start organizing things a bit
better.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:03 -04:00
Kent Overstreet
2842515575 bcachefs: Debug params are now static_keys
We'd like users to be able to debug without building custom kernels, so
this will help us get rid of CONFIG_BCACHEFS_DEBUG, at least for most
things.
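
(Illustration only, not the patch: the general static-key mechanism this refers to, with bch2_debug_check_foo and its check as made-up names.)

    #include <linux/jump_label.h>

    /* defaults to off, so the check compiles down to a patched NOP */
    static DEFINE_STATIC_KEY_FALSE(bch2_debug_check_foo);

    static inline void maybe_check_foo(struct btree *b)
    {
    	if (static_branch_unlikely(&bch2_debug_check_foo))
    		bch2_btree_node_check_foo(b);	/* hypothetical debug-only check */
    }

    /* flipped at runtime, e.g. from a sysfs knob:
     *	static_branch_enable(&bch2_debug_check_foo);
     */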

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:54 -04:00
Alan Huang
f013b4ca35 bcachefs: Kill bch2_trans_unlock_noassert
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:14 -04:00
Kent Overstreet
93ac4d5f92 bcachefs: Improve bch2_btree_cache_to_text()
Make the output slightly clearer, and include a counter for "nodes we
couldn't free because we would have gone under our reserve".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:13 -04:00
Kent Overstreet
e50fe14c54 bcachefs: __btree_node_reclaim_checks()
Factor out a helper so we're not duplicating checks after locking the
btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:13 -04:00
Kent Overstreet
68aaeb7c8b bcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()
Small cleanup, just always increment the counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:12 -04:00
Kent Overstreet
19b22d04cd bcachefs: Don't set btree nodes as accessed on fill
Prevent jobs that do lots of scanning (e.g. evacuate, scrub) from
causing OOMs.

The shrinker code seems to be having issues when it doesn't do any
freeing because it's just flipping off the accessed bit - and the
accessed bit shouldn't be set on first use anyway.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14 17:05:19 -04:00
Kent Overstreet
1ece53237e bcachefs: Consistent indentation of multiline fsck errors
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 22:31:47 -04:00
Kent Overstreet
7c1e2a254f bcachefs: Add a cond_resched() to btree cache teardown
[12308.606480] watchdog: BUG: soft lockup - CPU#18 stuck for 26s! [umount:48479]
[12308.606485] Modules linked in: bcachefs lz4hc_compress lz4_compress lz4_decompress sunrpc overlay nf_conntrack_netlink xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE bridge stp llc xfrm_user ip6table_nat ip6table_filter ip6_tables iptable_nat xt_addrtype iptable_filter ip_tables x_tables nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample ext4 mbcache jbd2 nls_iso8859_1 nls_cp850 vfat fat binfmt_misc skx_edac_common nfit edac_core libnvdimm cbc encrypted_keys intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common ipmi_ssif x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm drivetemp rapl intel_cstate coretemp mgag200 i2c_algo_bit ixgbe drm_shmem_helper drm_kms_helper mdio_devres xfrm_algo mdio drm ptp intel_uncore mei_me efi_pstore evdev uas pl2303 pps_core libphy usb_storage usbserial lpc_ich mei drm_panel_orientation_quirks acpi_power_meter tiny_power_button ipmi_si mfd_core intel_pch_thermal acpi_tad acpi_ipmi ioatdma
[12308.606541]  ipmi_devintf ipmi_msghandler dca wmi button efivarfs polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sha1_generic xhci_pci xhci_hcd aesni_intel ehci_pci ehci_hcd gf128mul crypto_simd cryptd usbcore hpwdt usb_common
[12308.606557] CPU: 18 UID: 0 PID: 48479 Comm: umount Tainted: G             L     6.14.0-rc6-x86_64-00159-ga09496a03e63 #1
[12308.606560] Tainted: [L]=SOFTLOCKUP
[12308.606561] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 07/20/2023
[12308.606563] RIP: 0010:clear_page_erms+0x7/0x10
[12308.606570] Code: 48 89 47 38 48 8d 7f 40 75 d9 90 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[12308.606572] RSP: 0018:ffff9ed5b622fba0 EFLAGS: 00010246
[12308.606574] RAX: 0000000000000000 RBX: ffff90347fffe6c0 RCX: 00000000000004c0
[12308.606575] RDX: ffffe34ea9bec1c0 RSI: 00000000000405f0 RDI: ffff902eafb07b40
[12308.606576] RBP: ffff9ed5b622fbf0 R08: 0000000000000001 R09: 0000000000000006
[12308.606577] R10: 0000000000040001 R11: 0000000000000000 R12: ffffe34ea9bec000
[12308.606578] R13: 0000000000000000 R14: 0000000000000006 R15: ffffe34ea9bed000
[12308.606580] FS:  00007fe704ecfb68(0000) GS:ffff9053fea00000(0000) knlGS:0000000000000000
[12308.606581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12308.606582] CR2: 00007f18159068ae CR3: 00000001314d0005 CR4: 00000000007726f0
[12308.606583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12308.606584] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[12308.606584] PKRU: 55555554
[12308.606585] Call Trace:
[12308.606587]  <IRQ>
[12308.606590]  ? show_regs.cold+0x19/0x28
[12308.606595]  ? watchdog_timer_fn.cold+0x3d/0x9d
[12308.606598]  ? __pfx_watchdog_timer_fn+0x10/0x10
[12308.606602]  ? __hrtimer_run_queues+0x12e/0x250
[12308.606607]  ? hrtimer_interrupt+0xfd/0x220
[12308.606609]  ? __sysvec_apic_timer_interrupt+0x53/0xe0
[12308.606614]  ? sysvec_apic_timer_interrupt+0x76/0xa0
[12308.606619]  </IRQ>
[12308.606620]  <TASK>
[12308.606620]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[12308.606626]  ? clear_page_erms+0x7/0x10
[12308.606628]  ? __free_pages_ok+0x374/0x640
[12308.606633]  free_frozen_pages+0x34/0x570
[12308.606636]  __folio_put+0x87/0xe0
[12308.606641]  free_large_kmalloc+0x70/0x80
[12308.606645]  kfree+0x2f6/0x390
[12308.606648]  kvfree+0x2d/0x40
[12308.606653]  __btree_node_data_free+0xaf/0xf0 [bcachefs]
[12308.606726]  btree_node_data_free+0x6a/0x80 [bcachefs]
[12308.606778]  bch2_fs_btree_cache_exit+0x262/0x440 [bcachefs]
[12308.606829]  bch2_fs_release+0xe8/0x340 [bcachefs]
[12308.606905]  kobject_put+0x60/0xc0
[12308.606908]  bch2_fs_free+0xdd/0x120 [bcachefs]
[12308.606981]  bch2_kill_sb+0x1e/0x30 [bcachefs]
[12308.607051]  deactivate_locked_super+0x32/0xb0
[12308.607055]  deactivate_super+0x40/0x50
[12308.607057]  cleanup_mnt+0xc3/0x160
[12308.607060]  __cleanup_mnt+0x12/0x20
[12308.607062]  task_work_run+0x5f/0xa0
[12308.607064]  syscall_exit_to_user_mode+0x194/0x1a0
[12308.607066]  do_syscall_64+0x67/0x170
[12308.607068]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12308.607070] RIP: 0033:0x7fe704e66eed
[12308.607073] Code: 08 49 89 ca b8 a5 00 00 00 0f 05 48 89 c7 e8 8a e6 ff ff 48 83 c4
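
(For context, not the actual patch: the fix is to yield the CPU periodically while freeing a large btree cache - roughly this pattern, where the list and helper names are assumptions.)

    /* sketch only - structure and helper names are assumptions */
    static void btree_cache_free_all(struct btree_cache *bc)
    {
    	struct btree *b;

    	while ((b = list_first_entry_or_null(&bc->live, struct btree, list))) {
    		list_del(&b->list);
    		btree_node_data_free(b);	/* large kvfree()s add up */

    		cond_resched();			/* avoid soft lockups */
    	}
    }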

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Alan Huang
677bdb7346 bcachefs: Fix deadlock
This fixes two deadlocks:

1. One involving pcpu_alloc_mutex, as pointed out by syzbot [1]
2. A recursion deadlock.

The root cause is that we hold the bc lock during alloc_percpu, fix it
by following the pattern used by __btree_node_mem_alloc().

[1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/
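
(Illustration of the fix's shape, not the commit's code: do the blocking percpu allocation before taking the cache lock, so the allocator can never recurse into reclaim while the lock is held. Type and field names below are assumptions.)

    /* sketch of the pattern - names are assumptions */
    static struct btree *btree_node_alloc(struct btree_cache *bc)
    {
    	struct btree *b = kzalloc(sizeof(*b), GFP_KERNEL);
    	if (!b)
    		return NULL;

    	/* no locks held: alloc_percpu() may block and enter reclaim */
    	b->lock_stats = alloc_percpu(u64);
    	if (!b->lock_stats) {
    		kfree(b);
    		return NULL;
    	}

    	mutex_lock(&bc->lock);
    	list_add(&b->list, &bc->freeable);
    	mutex_unlock(&bc->lock);
    	return b;
    }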

Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-26 19:31:05 -05:00
Kent Overstreet
0c74c85bbe bcachefs: fix bch2_btree_node_flags
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-25 19:33:19 -05:00
Kent Overstreet
cf67f46641 bcachefs: __bch2_btree_pos_to_text()
Factor out a version of bch2_btree_pos_to_text() that doesn't take a
pointer to an in-memory btree node, to be used for btree node scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
c738866e47 bcachefs: check_extents_to_backpointers() now only checks buckets with mismatches
Instead of walking every extent and every backpointer it points to,
first sum up the backpointers in each bucket and check for mismatches;
only look for missing backpointers if mismatches were detected, and
only check extents in those buckets.

This is a major fsck scalability improvement, since the two backpointers
passes (backpointers -> extents and extents -> backpointers) are the
most expensive fsck passes by far.

Additionally, to speed up the upgrade for backpointer bucket gens, or in
situations when we have to rebuild alloc info, add a special case for
when no backpointers are found in a bucket - don't check each individual
backpointer (in particular, avoiding the write buffer flushes), just
recreate them.
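
(In rough pseudocode - every helper here is hypothetical - the reworked pass looks something like this.)

    /* pseudocode sketch, not the real implementation */
    static int check_extents_to_backpointers(struct bch_fs *c)
    {
    	/* pass 1: cheap per-bucket accounting */
    	for_each_bucket(c, bucket) {
    		u64 expected = backpointer_sectors_in_bucket(c, bucket);
    		u64 actual   = bucket_sectors_referenced(c, bucket);

    		if (expected != actual)
    			mark_bucket_mismatched(c, bucket);
    	}

    	/* pass 2: only walk extents landing in mismatched buckets */
    	for_each_extent(c, extent)
    		if (extent_touches_mismatched_bucket(c, extent))
    			check_backpointers_for_extent(c, extent);

    	return 0;
    }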

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
d941597636 bcachefs: Fix bch2_btree_node_update_key_early()
Fix an assertion pop from the recent btree cache freelist fixes.

Fixes: baefd3f849 ("bcachefs: btree_cache.freeable list fixes")
Reported-by: Tyler <th020394@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:20 -05:00
Kent Overstreet
abf23afa36 bcachefs: Fix btree node scan when unknown btree IDs are present
btree_root entries for unknown btree IDs are created during recovery,
before reading those btree roots.

But btree_node_scan may find btree nodes with unknown btree IDs when we
haven't seen roots for those btrees.

Reported-by: syzbot+1f202d4da221ec6ebf8e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:19 -05:00
Kent Overstreet
375d21b76d bcachefs: BCH_ERR_btree_node_read_error_cached
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:19 -05:00
Kent Overstreet
db514cf677 bcachefs: Avoid bch2_btree_id_str()
Prefer bch2_btree_id_to_text() - it prints out the integer ID when
unknown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:15 -05:00
Kent Overstreet
baefd3f849 bcachefs: btree_cache.freeable list fixes
When allocating new btree nodes, we were leaving them on the freeable
list - unlocked - allowing them to be reclaimed: ouch.

Additionally, bch2_btree_node_free_never_used() ->
bch2_btree_node_hash_remove() was putting it on the freelist, while
bch2_btree_node_free_never_used() was putting it back on the btree
update reserve list - ouch.

Originally, the code was written to always keep btree nodes on a list -
live or freeable - and this worked when new nodes were kept locked.

But now with the cycle detector, we can't keep nodes locked that aren't
tracked by the cycle detector; and this is fine as long as they're not
reachable.

We also have better and more robust leak detection now, with memory
allocation profiling, so the original justification no longer applies.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07 16:48:21 -05:00
Kent Overstreet
72acab3a7c bcachefs: Fix error handling in bch2_btree_node_prefetch()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07 16:48:20 -05:00
Kent Overstreet
7a51608d01 bcachefs: Rework btree node pinning
In backpointers fsck, we do a sequential scan of one btree, and check
references to another: extents <-> backpointers

Checking references generates random lookups, so we want to pin that
btree in memory (or only a range, if it doesn't fit in RAM).

Previously, this was done with a simple check in the shrinker - "if
btree node is in range being pinned, don't free it" - but this generated
OOMs, as our shrinker wasn't well behaved if there was less memory
available than expected.

Instead, we now have two different shrinkers and lru lists; the second
shrinker being for pinned nodes, with seeks set much higher than normal
- so they can still be freed if necessary, but we'll prefer not to.
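
(Illustration of the mechanism, not the patch itself: with the dynamically-allocated shrinker API, a second shrinker whose seeks value is much higher makes reclaim treat its objects as far more expensive. The callback and field names here are assumptions; shrinker_alloc()/shrinker_register(), seeks, and DEFAULT_SEEKS are the stock kernel API.)

    #include <linux/shrinker.h>

    /* sketch: a separate shrinker for pinned nodes, reluctant to free them */
    static int btree_cache_pinned_shrinker_init(struct btree_cache *bc)
    {
    	struct shrinker *s = shrinker_alloc(0, "btree-cache-pinned");
    	if (!s)
    		return -ENOMEM;

    	s->count_objects = btree_cache_pinned_count;	/* hypothetical callbacks */
    	s->scan_objects  = btree_cache_pinned_scan;
    	s->seeks         = DEFAULT_SEEKS * 8;		/* much costlier than normal */
    	s->private_data  = bc;

    	shrinker_register(s);
    	bc->pinned_shrink = s;
    	return 0;
    }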

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
91ddd71510 bcachefs: split up btree cache counters for live, freeable
This is prep for introducing a second live list and shrinker for
pinned nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
691f2cba22 bcachefs: btree cache counters should be size_t
32 bits won't overflow any time soon, but size_t is the correct type for
counting objects in memory.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
ad5dbe3ce5 bcachefs: Don't count "skipped access bit" as touched in btree cache scan
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
ff7f756f2b bcachefs: Use mm_account_reclaimed_pages() when freeing btree nodes
When freeing in a shrinker callback, we need to notify memory reclaim,
so it knows forward progress has been made.

Normally this is done in e.g. slab code, but we're not freeing through
slab - or rather we are, but these allocations are big, and use the
kmalloc_large() path.

This is really a bug in the slub code, but we're working around it here
for now.
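
(Rough sketch of the idea; apart from mm_account_reclaimed_pages() itself, the surrounding names are assumptions about the caller.)

    /* sketch: report freed pages to reclaim, since kmalloc_large() bypasses
     * the slab accounting that would normally do this */
    static void btree_node_data_free_accounted(struct btree *b)
    {
    	size_t bytes = btree_buf_bytes(b);

    	kvfree(b->data);
    	b->data = NULL;

    	mm_account_reclaimed_pages(bytes >> PAGE_SHIFT);
    }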

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:48 -04:00
Kent Overstreet
895fbf1cf0 bcachefs: Use __GFP_ACCOUNT for reclaimable memory
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:46 -04:00
Kent Overstreet
3340dee235 bcachefs: Add pinned to btree cache not freed counters
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:35:19 -04:00
Kent Overstreet
b36f679c99 bcachefs: Drop memalloc_nofs_save() in bch2_btree_node_mem_alloc()
It's really not needed: the only locks used here are the btree cache
lock, which we drop for GFP_WAIT allocations, and btree node locks - but
we also drop those for GFP_WAIT allocations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:49 -04:00
Kent Overstreet
5dbfc4ef72 bcachefs: fix failure to relock in btree_node_fill()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 02:07:23 -04:00
Kent Overstreet
3c5d0b72a8 bcachefs: fix failure to relock in bch2_btree_node_mem_alloc()
We weren't always so strict about trans->locked state - but now we are,
and new assertions are shaking some bugs out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 02:07:23 -04:00
Kent Overstreet
49203a6b9d bcachefs: Fix failure to relock in btree_node_get()
Discovered by the new trans->locked asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 02:07:22 -04:00
Kent Overstreet
11169d9983 bcachefs: bch2_btree_id_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:15 -04:00
Uros Bizjak
68573b936d bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg()
Use the try_cmpxchg() family of functions instead of
cmpxchg(*ptr, old, new) == old. The x86 CMPXCHG instruction returns
success in the ZF flag, so this change saves a compare after cmpxchg
(and the related move instruction in front of cmpxchg).

Also, try_cmpxchg() implicitly assigns the old *ptr value to "old" when
cmpxchg fails, so there is no need to re-read the value in the loop.

No functional change intended.
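
(For reference, the generic before/after shape of this conversion, with illustrative names.)

    #include <linux/atomic.h>

    static void set_flag(atomic64_t *v, s64 flag)
    {
    	s64 old = atomic64_read(v), new;

    	/* before: compare the return value, re-read on every iteration:
    	 *	do {
    	 *		old = atomic64_read(v);
    	 *		new = old | flag;
    	 *	} while (atomic64_cmpxchg(v, old, new) != old);
    	 */

    	/* after: try_cmpxchg() reports success and refreshes 'old' on failure */
    	do {
    		new = old | flag;
    	} while (!atomic64_try_cmpxchg(v, &old, new));
    }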

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:12 -04:00
Kent Overstreet
5ae67abcdf bcachefs: Enable automatic shrinking for rhashtables
Since the key cache shrinker walks the rhashtable, a mostly empty
rhashtable leads to really nasty reclaim performance issues.
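
(Illustration - the struct and parameter object below are made up; .automatic_shrinking is the stock rhashtable knob being enabled.)

    #include <linux/rhashtable.h>

    struct cached_key {			/* illustrative object */
    	struct rhash_head	hash;
    	u64			id;
    };

    static const struct rhashtable_params example_params = {
    	.head_offset		= offsetof(struct cached_key, hash),
    	.key_offset		= offsetof(struct cached_key, id),
    	.key_len		= sizeof(u64),
    	.automatic_shrinking	= true,	/* shrink the table as entries are removed */
    };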

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Daniel Hill
bceacfa97e bcachefs: add counters for failed shrinker reclaim
This adds distinct counters for every reason the btree node shrinker can
fail to free an object - if our shrinker isn't making progress, this
will tell us why.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:24:29 -04:00
Kent Overstreet
b6fb4269e7 bcachefs: for_each_bset() declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:34 -04:00
Kent Overstreet
e11ecc6133 bcachefs: Improve sysfs internal/btree_cache
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:24 -04:00
Kent Overstreet
f40d13f94d bcachefs: Run bch2_check_fix_ptrs() via triggers
Currently, the reflink_p gc trigger does repair as well - turning a
reflink_p key into an error key if the reflink_v it points to doesn't
exist.

This won't work with online check/repair, because the repair path once
online will be subject to transaction restarts, but BTREE_TRIGGER_gc is
not idempotent - we can't run it multiple times if we get a transaction
restart.

So we need to split these paths; to do so, this patch calls
check_fix_ptrs() via a new general path - a new trigger type,
BTREE_TRIGGER_check_repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
e879389f57 bcachefs: Fix bch2_btree_node_fill() for !path
We shouldn't be doing the unlock/relock dance when we're not using a
path - this fixes an assertion pop when called from btree node scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet
8cf2036e7b bcachefs: add safety checks in bch2_btree_node_fill()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 18:01:12 -04:00
Kent Overstreet
7b4c4ccf84 bcachefs: fix race in bch2_btree_node_evict()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
79032b0781 bcachefs: Improved topology repair checks
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree, that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remain BUG_ON()s, because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be pre-existing on-disk corruption; they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.
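
(The invariant being checked can be sketched as follows - pseudocode, with hypothetical accessors and iterator: each child must start exactly where the previous one ended, and together the children must cover the parent's range.)

    /* pseudocode sketch of the invariant, not the real implementation */
    static int check_node_topology(struct btree *parent)
    {
    	struct bpos expected_start = node_min_key(parent);	/* hypothetical accessors */

    	for_each_child(parent, child) {
    		if (!bpos_eq(child_min_key(child), expected_start))
    			return topology_error(parent);		/* gap or overlap */

    		expected_start = bpos_successor(child_max_key(child));
    	}

    	/* children must end exactly at the parent's max key */
    	if (!bpos_eq(expected_start, bpos_successor(node_max_key(parent))))
    		return topology_error(parent);

    	return 0;
    }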

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
aa6e130e3c bcachefs: Add an assertion for trying to evict btree root
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
91dcad18d3 bcachefs: Pin btree cache in ram for random access in fsck
Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.

This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion, and this is particularly painful on spinning
rust, where we'd like to keep the pipeline fairly full of btree node
reads so that the elevator can reduce seeking.

This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.

This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system RAM that fsck is allowed to pin.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Kent Overstreet
5f43b0134e bcachefs: btree node prefetching in check_topology
btree_and_journal_iter is old code that we want to get rid of, but we're
not ready to yet.

Lack of btree node prefetching is, it turns out, a real performance
issue for fsck on spinning rust, so add it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
ec4edd7b9d bcachefs: Prep work for variable size btree node buffers
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analogous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00