Commit graph

17 commits

Author SHA1 Message Date
David Sterba
72b2b199d5 btrfs: accessors: rename variable for folio offset
There used to be 'oip' short for offset in page, which got changed
during conversion to folios, the name is a bit confusing so rename it.

Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
ae80748225 btrfs: accessors: factor out split memcpy with two sources
The case of a reading the bytes from 2 folios needs two memcpy()s, the
compiler does not emit calls but two inline loops.

Factoring out the code makes some improvement (stack, code) and in the
future will provide an optimized implementation as well. (The analogical
version with two destinations is not done as it increases stack usage
but can be done if needed.)

The address of the second folio is reordered before the first memcpy,
which leads to an optimization reusing the vmemmap_base and
page_offset_base (implementing folio_address()).

Stack usage reduction:

  btrfs_get_32                                           -8 (32 -> 24)
  btrfs_get_64                                           -8 (32 -> 24)

Code size reduction:

     text    data     bss     dec     hex filename
  1454279  115665   16088 1586032  183370 pre/btrfs.ko
  1454229  115665   16088 1585982  18333e post/btrfs.ko

  DELTA: -50

As this is the last patch in this series, here's the overall diff
starting and including commit "btrfs: accessors: simplify folio bounds
checks":

Stack:

  btrfs_set_16                                          -72 (88 -> 16)
  btrfs_get_32                                          -56 (80 -> 24)
  btrfs_set_8                                           -72 (88 -> 16)
  btrfs_set_64                                          -64 (88 -> 24)
  btrfs_get_8                                           -72 (80 -> 8)
  btrfs_get_16                                          -64 (80 -> 16)
  btrfs_set_32                                          -64 (88 -> 24)
  btrfs_get_64                                          -56 (80 -> 24)

  NEW (48):
	  report_setget_bounds                           48
  LOST/NEW DELTA:      +48
  PRE/POST DELTA:     -472

Code:

     text    data     bss     dec     hex filename
  1456601  115665   16088 1588354  183c82 pre/btrfs.ko
  1454229  115665   16088 1585982  18333e post/btrfs.ko

  DELTA: -2372

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
c8b33a57fb btrfs: accessors: set target address at initialization
The target address for the read/write can be simplified as it's the same
expression for the first folio. This improves the generated code as the
folio address does not have to be cached on stack.

Stack usage reduction:

  btrfs_set_32                                           -8 (32 -> 24)
  btrfs_set_64                                           -8 (32 -> 24)
  btrfs_get_16                                           -8 (24 -> 16)

Code size reduction:

     text    data     bss     dec     hex filename
  1454459  115665   16088 1586212  183424 pre/btrfs.ko
  1454279  115665   16088 1586032  183370 post/btrfs.ko

  DELTA: -180

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
1ed0f75d57 btrfs: accessors: compile-time fast path for u16
Reading/writing 2 bytes (u16) may need 2 folios to be written to, each
time it's just one byte so using memcpy for that is an overkill.
Add a branch for the split case so that memcpy is now used for u32 and
u64. Another side effect is that the u16 types now don't need additional
stack space, everything fits to registers.

Stack usage is reduced:

  btrfs_get_16                                           -8 (32 -> 24)
  btrfs_set_16                                          -16 (32 -> 16)

Code size reduction:

     text    data     bss     dec     hex filename
  1454691  115665   16088 1586444  18350c pre/btrfs.ko
  1454459  115665   16088 1586212  183424 post/btrfs.ko

  DELTA: -232

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
58383c6866 btrfs: accessors: compile-time fast path for u8
Reading/writing 1 byte (u8) is a special case compared to the others as
it's always contained in the folio we find, so the split memcpy will
never be needed. Turn it to a compile-time check that the memcpy part
can be optimized out.

The stack usage is reduced:

  btrfs_set_8                                         -16 (32 -> 16)
  btrfs_get_8                                         -16 (24 -> 8)

Code size reduction:

     text    data     bss     dec     hex filename
  1454951  115665   16088 1586704  183610 pre/btrfs.ko
  1454691  115665   16088 1586444  18350c post/btrfs.ko

  DELTA: -260

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
d5a87dbd95 btrfs: accessors: inline eb bounds check and factor out the error report
There's a check in each set/get helper if the requested range is within
extent buffer bounds, and if it's not then report it. This was in an
ASSERT statement so with CONFIG_BTRFS_ASSERT this crashes right away, on
other configs this is only reported but reading out of the bounds is
done anyway. There are currently no known reports of this particular
condition failing.

There are some drawbacks though. The behaviour dependence on the
assertions being compiled in or not and a less visible effect of
inlining report_setget_bounds() into each helper.

As the bounds check is expected to succeed almost always it's ok to
inline it but make the report a function and move it out of the helper
completely (__cold puts it to a different section). This also skips
reading/writing the requested range in case it fails.

This improves stack usage significantly:

  btrfs_get_16                                         -48 (80 -> 32)
  btrfs_get_32                                         -48 (80 -> 32)
  btrfs_get_64                                         -48 (80 -> 32)
  btrfs_get_8                                          -48 (72 -> 24)
  btrfs_set_16                                         -56 (88 -> 32)
  btrfs_set_32                                         -56 (88 -> 32)
  btrfs_set_64                                         -56 (88 -> 32)
  btrfs_set_8                                          -48 (80 -> 32)

  NEW (48):
	  report_setget_bounds                                     48
  LOST/NEW DELTA:      +48
  PRE/POST DELTA:     -360

Same as .ko size:

     text    data     bss     dec     hex filename
  1456079  115665   16088 1587832  183a78 pre/btrfs.ko
  1454951  115665   16088 1586704  183610 post/btrfs.ko

  DELTA: -1128

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
378c95c477 btrfs: accessors: use type sizeof constants directly
Now unit_size is used only once, so use it directly in 'part'
calculation. Don't cache sizeof(type) in a variable. While this is a
compile-time constant, forcing the type 'int' generates worse code as it
leads to additional conversion from 32 to 64 bit type on x86_64.

The sizeof() is used only a few times and it does not make the code that
harder to read, so use it directly and let the compiler utilize the
immediate constants in the context it needs. The .ko code size slightly
increases (+50) but further patches will reduce that again.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
00c0cf8444 btrfs: accessors: simplify folio bounds checks
As we can a have non-contiguous range in the eb->folios, any item can be
straddling two folios and we need to check if it can be read in one go
or in two parts. For that there's a check which is not implemented in
the simplest way:

  offset in folio + size <= folio size

With a simple expression transformation:

  oil + size <= unit_size
        size <= unit_size - oil
    sizeof() <= part

this can be simplified and reusing existing run-time or compile-time
constants.

Add likely() annotation for this expression as this is the fast path and
compiler sometimes reorders that after the followup block with the
memcpy (observed in practice with other simplifications).

Overall effect on stack consumption:

  btrfs_get_8                                        -8 (80 -> 72)
  btrfs_set_8                                        -8 (88 -> 80)

And .ko size (due to optimizations making use of the direct constants):

     text    data     bss     dec     hex filename
  1456601  115665   16088 1588354  183c82 pre/btrfs.ko
  1456093  115665   16088 1587846  183a86 post/btrfs.ko

  DELTA: -508

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:09:21 +02:00
David Sterba
b37eb352c4 btrfs: accessors: delete token versions of set/get helpers
Once upon a time there was a need to cache address of extent buffer
pages, as it was a costly operation (map_private_extent_buffer(),
cfed81a04e ("Btrfs: add the ability to cache a pointer into the
eb")).  This was not even due to use of HIGHMEM, this had been removed
before that due to possible locking issues (a65917156e ("Btrfs:
stop using highmem for extent_buffers")).

Over time the amount of work in the set/get helpers got reduced and
became quite straightforward bounds checking with an unaligned
read/write, commit db3756c879 ("btrfs: remove unused
map_private_extent_buffer").

The actual caching of the page_address()/folio_address() in the token
was more work for very little gain. This depended on subsequent access
into the same page/folio, otherwise the cached pointer had to be
updated.

For metadata-heavy operations this showed up in the 'perf top' profile
where the btrfs_get_token_32() calls were at the top, on my testing
machine consuming about 2-3%. The other generic 32/64 bit helpers also
appeared in the profile with similar fraction.

After removing use of the token helpers we can remove them completely,
this leads to reduction of btrfs.ko by 6.7KiB on release config.

   text    data     bss     dec     hex filename
1463289  115665   16088 1595042  1856a2 pre/btrfs.ko
1456601  115665   16088 1588354  183c82 post/btrfs.ko

DELTA: -6688

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22 00:05:00 +02:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
David Sterba
2b712e3bb2 btrfs: remove unused included headers
With help of neovim, LSP and clangd we can identify header files that
are not actually needed to be included in the .c files. This is focused
only on removal (with minor fixups), further cleanups are possible but
will require doing the header files properly with forward declarations,
minimized includes and include-what-you-use care.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:46 +01:00
Qu Wenruo
84cda1a608 btrfs: cache folio size and shift in extent_buffer
After the conversion to folio interfaces (but without the patch to
enable larger folio allocation), there is an LTP report about observable
performance drop on metadata heavy operations.

https://lore.kernel.org/linux-btrfs/202312221750.571925bd-oliver.sang@intel.com/

This drop is caused by the extra code of calculating the
folio_size()/folio_shift(), instead of the old hard coded
PAGE_SIZE/PAGE_SHIFT.

To slightly reduce the overhead, just cache both folio_size and
folio_shift in extent_buffer.

The two new members (u32 folio_size and u8 folio_shift) are stored
inside the holes of extent_buffer. folio_size is shared with len, which
is reduced to u32. The size of eb does not change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:45 +01:00
Qu Wenruo
8d99361835 btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios
These two functions are still using the old page based code, which is
not going to handle larger folios at all.

The migration itself is going to involve the following changes:

- PAGE_SIZE -> folio_size()
- PAGE_SHIFT -> folio_shift()
- get_eb_page_index() -> get_eb_folio_index()
- get_eb_offset_in_page() -> get_eb_offset_in_folio()

And since we're going to support larger folios, although above straight
conversion is good enough, this patch would add extra comments in the
involved functions to explain why the same single line code can now
cover 3 cases:

- folio_size == PAGE_SIZE, sectorsize == PAGE_SIZE, nodesize >= PAGE_SIZE
  The common, non-subpage case with per-page folio.

- folio_size > PAGE_SIZE, sectorsize == PAGE_SIZE, nodesize >= PAGE_SIZE
  The incoming larger folio, non-subpage case.

- folio_size == PAGE_SIZE, sectorsize < PAGE_SIZE, nodesize < PAGE_SIZE
  The existing subpage case, we won't larger folio anyway.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 23:03:58 +01:00
Qu Wenruo
082d5bb9b3 btrfs: migrate extent_buffer::pages[] to folio
For now extent_buffer::pages[] are still only accepting single page
pointer, thus we can migrate to folios pretty easily.

As for single page, page and folio are 1:1 mapped, including their page
flags.

This patch would just do the conversion from struct page to struct
folio, providing the first step to higher order folio in the future.

This conversion is pretty simple:

- extent_buffer::pages[] -> extent_buffer::folios[]

- page_address(eb->pages[i]) -> folio_address(eb->pages[i])

- eb->pages[i] -> folio_page(eb->folios[i], 0)

There would be more specific cleanups preparing for the incoming higher
order folio support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 23:01:04 +01:00
Josef Bacik
e23efd8e87 btrfs: add eb to btrfs_node_key_ptr_offset
This is a change needed for extent tree v2, as we will be growing the
header size.  This exists in btrfs-progs currently, and not having it
makes syncing accessors.[ch] more problematic.  So make this change to
set us up for extent tree v2 and match what btrfs-progs does to make
syncing easier.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:58 +01:00
Josef Bacik
ad1ac5012c btrfs: move btrfs_map_token to accessors
This is specific to the item-accessor code, move it out of ctree.h into
accessor.h/.c and then update the users to include the new header file.
This un-inlines btrfs_init_map_token, however this is only called once
per function so it's not critical to be inlined.  This also saves 904
bytes of code on a release build.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:42 +01:00
Josef Bacik
818fe33aed btrfs: rename struct-funcs.c to accessors.c
Rename struct-funcs.c to accessors.c so we can move the item accessors
out of ctree.h. accessors.c is a better description of the code that is
contained in these files.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:42 +01:00
Renamed from fs/btrfs/struct-funcs.c (Browse further)