Commit graph

7 commits

Author SHA1 Message Date
Eric Biggers
35984c730d x86/crc: drop "glue" from filenames
The use of the term "glue" in filenames is a Crypto API-ism that rarely
shows up elsewhere in lib/ or arch/*/lib/.  I think adopting it there
was a mistake.  The library just uses standard functions, so the amount
of code that could be considered "glue" is quite small.  And while often
the C functions just wrap the assembly functions, there are also cases
like crc32c_arch() in arch/x86/lib/crc32-glue.c that blur the line by
in-lining the actual implementation into the C function.  That's not
"glue code", but rather the actual code.

Therefore, let's drop "glue" from the filenames and instead use e.g.
crc32.c instead of crc32-glue.c.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250424002038.179114-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Eric Biggers
7ef377c4d4 lib/crc: make the CPU feature static keys __ro_after_init
All of the CRC library's CPU feature static_keys are initialized by
initcalls and never change afterwards, so there's no need for them to be
in the regular .data section.  Put them in .data..ro_after_init instead.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250413154350.10819-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Eric Biggers
5aebe00b2f x86/crc32: optimize tail handling for crc32c short inputs
For handling the 0 <= len < sizeof(unsigned long) bytes left at the end,
do a 4-2-1 step-down instead of a byte-at-a-time loop.  This allows
taking advantage of wider CRC instructions.  Note that crc32c-3way.S
already uses this same optimization too.

crc_kunit shows an improvement of about 25% for len=127.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250304213216.108925-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-10 09:29:29 -07:00
Eric Biggers
cf1ea3a7c1 x86/crc32: improve crc32c_arch() code generation with clang
crc32c_arch() is affected by
https://github.com/llvm/llvm-project/issues/20571 where clang
unnecessarily spills the inputs to "rm"-constrained operands to the
stack.  Replace "rm" with ASM_INPUT_RM which partially works around this
by expanding to "r" when the compiler is clang.  This results in better
code generation with clang, though still not optimal.

Link: https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-12 12:08:56 -08:00
Eric Biggers
a03fda967e x86/crc32: implement crc32_le using new template
Instantiate crc-pclmul-template.S for crc32_le, and delete the original
PCLMULQDQ optimized implementation.  This has the following advantages:

- Less CRC-variant-specific code.
- VPCLMULQDQ support, greatly improving performance on sufficiently long
  messages on newer CPUs.
- A faster reduction from 128 bits to the final CRC.
- Support for lengths not a multiple of 16 bytes, improving performance
  for such lengths.
- Support for misaligned buffers, improving performance in such cases.

Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit:

	Length     Before        After
	------     ------        -----
	     1     427 MB/s      605 MB/s
	    16     710 MB/s     3631 MB/s
	    64     704 MB/s     7615 MB/s
	   127    3610 MB/s     9710 MB/s
	   128    8759 MB/s    12702 MB/s
	   200    7083 MB/s    15343 MB/s
	   256   17284 MB/s    22904 MB/s
	   511   10919 MB/s    27309 MB/s
	   512   19849 MB/s    48900 MB/s
	  1024   21216 MB/s    62630 MB/s
	  3173   22150 MB/s    72437 MB/s
	  4096   22496 MB/s    79593 MB/s
	 16384   22018 MB/s    85106 MB/s

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250210174540.161705-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10 09:49:28 -08:00
Eric Biggers
68ea3c2ae0 lib/crc32: remove "_le" from crc32c base and arch functions
Following the standardization on crc32c() as the lib entry point for the
Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(),
and __crc32c_le(), make the same change to the underlying base and arch
functions that implement it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08 20:06:30 -08:00
Eric Biggers
55d1ecceb8 x86/crc32: expose CRC32 functions through lib
Move the x86 CRC32 assembly code into the lib directory and wire it up
to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:01 -08:00