linux/arch/x86/crypto/Kconfig

403 lines
10 KiB
Text
Raw Permalink Normal View History

# SPDX-License-Identifier: GPL-2.0
menu "Accelerated Cryptographic Algorithms for CPU (x86)"
config CRYPTO_CURVE25519_X86
tristate
depends on 64BIT
select CRYPTO_KPP
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
default CRYPTO_LIB_CURVE25519_INTERNAL
help
Curve25519 algorithm
Architecture: x86_64 using:
- ADX (large integer arithmetic)
config CRYPTO_AES_NI_INTEL
tristate "Ciphers: AES, modes: ECB, CBC, CTS, CTR, XCTR, XTS, GCM (AES-NI/VAES)"
select CRYPTO_AEAD
select CRYPTO_LIB_AES
crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM Add implementations of AES-GCM for x86_64 CPUs that support VAES (vector AES), VPCLMULQDQ (vector carryless multiplication), and either AVX512 or AVX10. There are two implementations, sharing most source code: one using 256-bit vectors and one using 512-bit vectors. This patch improves AES-GCM performance by up to 162%; see Tables 1 and 2 below. I wrote the new AES-GCM assembly code from scratch, focusing on correctness, performance, code size (both source and binary), and documenting the source. The new assembly file aes-gcm-avx10-x86_64.S is about 1200 lines including extensive comments, and it generates less than 8 KB of binary code. The main loop does 4 vectors at a time, with the AES and GHASH instructions interleaved. Any remainder is handled using a simple 1 vector at a time loop, with masking. Several VAES + AVX512 implementations of AES-GCM exist from Intel, including one in OpenSSL and one proposed for inclusion in Linux in 2021 (https://lore.kernel.org/linux-crypto/1611386920-28579-6-git-send-email-megha.dey@intel.com/). These aren't really suitable to be used, though, due to the massive amount of binary code generated (696 KB for OpenSSL, 200 KB for Linux) and well as the significantly larger amount of assembly source (4978 lines for OpenSSL, 1788 lines for Linux). Also, Intel's code does not support 256-bit vectors, which makes it not usable on future AVX10/256-only CPUs, and also not ideal for certain Intel CPUs that have downclocking issues. So I ended up starting from scratch. Usually my much shorter code is actually slightly faster than Intel's AVX512 code, though it depends on message length and on which of Intel's implementations is used; for details, see Tables 3 and 4 below. To facilitate potential integration into other projects, I've dual-licensed aes-gcm-avx10-x86_64.S under Apache-2.0 OR BSD-2-Clause, the same as the recently added RISC-V crypto code. The following two tables summarize the performance improvement over the existing AES-GCM code in Linux that uses AES-NI and AVX2: Table 1: AES-256-GCM encryption throughput improvement, CPU microarchitecture vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ----------------------+-------+-------+-------+-------+-------+-------+ Intel Ice Lake | 42% | 48% | 60% | 62% | 70% | 69% | Intel Sapphire Rapids | 157% | 145% | 162% | 119% | 96% | 96% | Intel Emerald Rapids | 156% | 144% | 161% | 115% | 95% | 100% | AMD Zen 4 | 103% | 89% | 78% | 56% | 54% | 54% | | 300 | 200 | 64 | 63 | 16 | ----------------------+-------+-------+-------+-------+-------+ Intel Ice Lake | 66% | 48% | 49% | 70% | 53% | Intel Sapphire Rapids | 80% | 60% | 41% | 62% | 38% | Intel Emerald Rapids | 79% | 60% | 41% | 62% | 38% | AMD Zen 4 | 51% | 35% | 27% | 32% | 25% | Table 2: AES-256-GCM decryption throughput improvement, CPU microarchitecture vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ----------------------+-------+-------+-------+-------+-------+-------+ Intel Ice Lake | 42% | 48% | 59% | 63% | 67% | 71% | Intel Sapphire Rapids | 159% | 145% | 161% | 125% | 102% | 100% | Intel Emerald Rapids | 158% | 144% | 161% | 124% | 100% | 103% | AMD Zen 4 | 110% | 95% | 80% | 59% | 56% | 54% | | 300 | 200 | 64 | 63 | 16 | ----------------------+-------+-------+-------+-------+-------+ Intel Ice Lake | 67% | 56% | 46% | 70% | 56% | Intel Sapphire Rapids | 79% | 62% | 39% | 61% | 39% | Intel Emerald Rapids | 80% | 62% | 40% | 58% | 40% | AMD Zen 4 | 49% | 36% | 30% | 35% | 28% | The above numbers are percentage improvements in single-thread throughput, so e.g. an increase from 4000 MB/s to 6000 MB/s would be listed as 50%. They were collected by directly measuring the Linux crypto API performance using a custom kernel module. Note that indirect benchmarks (e.g. 'cryptsetup benchmark' or benchmarking dm-crypt I/O) include more overhead and won't see quite as much of a difference. All these benchmarks used an associated data length of 16 bytes. Note that AES-GCM is almost always used with short associated data lengths. The following two tables summarize how the performance of my code compares with Intel's AVX512 AES-GCM code, both the version that is in OpenSSL and the version that was proposed for inclusion in Linux. Neither version exists in Linux currently, but these are alternative AES-GCM implementations that could be chosen instead of mine. I collected the following numbers on Emerald Rapids using a userspace benchmark program that calls the assembly functions directly. I've also included a comparison with Cloudflare's AES-GCM implementation from https://boringssl-review.googlesource.com/c/boringssl/+/65987/3. Table 3: VAES-based AES-256-GCM encryption throughput in MB/s, implementation name vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ---------------------+-------+-------+-------+-------+-------+-------+ This implementation | 14171 | 12956 | 12318 | 9588 | 7293 | 6449 | AVX512_Intel_OpenSSL | 14022 | 12467 | 11863 | 9107 | 5891 | 6472 | AVX512_Intel_Linux | 13954 | 12277 | 11530 | 8712 | 6627 | 5898 | AVX512_Cloudflare | 12564 | 11050 | 10905 | 8152 | 5345 | 5202 | | 300 | 200 | 64 | 63 | 16 | ---------------------+-------+-------+-------+-------+-------+ This implementation | 4939 | 3688 | 1846 | 1821 | 738 | AVX512_Intel_OpenSSL | 4629 | 4532 | 2734 | 2332 | 1131 | AVX512_Intel_Linux | 4035 | 2966 | 1567 | 1330 | 639 | AVX512_Cloudflare | 3344 | 2485 | 1141 | 1127 | 456 | Table 4: VAES-based AES-256-GCM decryption throughput in MB/s, implementation name vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ---------------------+-------+-------+-------+-------+-------+-------+ This implementation | 14276 | 13311 | 13007 | 11086 | 8268 | 8086 | AVX512_Intel_OpenSSL | 14067 | 12620 | 12421 | 9587 | 5954 | 7060 | AVX512_Intel_Linux | 14116 | 12795 | 11778 | 9269 | 7735 | 6455 | AVX512_Cloudflare | 13301 | 12018 | 11919 | 9182 | 7189 | 6726 | | 300 | 200 | 64 | 63 | 16 | ---------------------+-------+-------+-------+-------+-------+ This implementation | 6454 | 5020 | 2635 | 2602 | 1079 | AVX512_Intel_OpenSSL | 5184 | 5799 | 2957 | 2545 | 1228 | AVX512_Intel_Linux | 4394 | 4247 | 2235 | 1635 | 922 | AVX512_Cloudflare | 4289 | 3851 | 1435 | 1417 | 574 | So, usually my code is actually slightly faster than Intel's code, though the OpenSSL implementation has a slight edge on messages shorter than 256 bytes in this microbenchmark. (This also holds true when doing the same tests on AMD Zen 4.) It can be seen that the large code size (up to 94x larger!) of the Intel implementations doesn't seem to bring much benefit, so starting from scratch with much smaller code, as I've done, seems appropriate. The performance of my code on messages shorter than 256 bytes could be improved through a limited amount of unrolling, but it's unclear it would be worth it, given code size considerations (e.g. caches) that don't get measured in microbenchmarks. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-06-02 15:22:19 -07:00
select CRYPTO_LIB_GF128MUL
select CRYPTO_ALGAPI
select CRYPTO_SKCIPHER
help
Block cipher: AES cipher algorithms
AEAD cipher: AES with GCM
Length-preserving ciphers: AES with ECB, CBC, CTS, CTR, XCTR, XTS
Architecture: x86 (32-bit and 64-bit) using:
- AES-NI (AES new instructions)
- VAES (Vector AES)
Some algorithm implementations are supported only in 64-bit builds,
and some have additional prerequisites such as AVX2 or AVX512.
config CRYPTO_BLOWFISH_X86_64
tristate "Ciphers: Blowfish, modes: ECB, CBC"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_BLOWFISH_COMMON
imply CRYPTO_CTR
help
Block cipher: Blowfish cipher algorithm
Length-preserving ciphers: Blowfish with ECB and CBC modes
Architecture: x86_64
config CRYPTO_CAMELLIA_X86_64
tristate "Ciphers: Camellia with modes: ECB, CBC"
depends on 64BIT
select CRYPTO_SKCIPHER
imply CRYPTO_CTR
help
Block cipher: Camellia cipher algorithms
Length-preserving ciphers: Camellia with ECB and CBC modes
Architecture: x86_64
config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
tristate "Ciphers: Camellia with modes: ECB, CBC (AES-NI/AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_CAMELLIA_X86_64
imply CRYPTO_XTS
help
Length-preserving ciphers: Camellia with ECB and CBC modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX (Advanced Vector Extensions)
config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
tristate "Ciphers: Camellia with modes: ECB, CBC (AES-NI/AVX2)"
depends on 64BIT
select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
help
Length-preserving ciphers: Camellia with ECB and CBC modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX2 (Advanced Vector Extensions 2)
config CRYPTO_CAST5_AVX_X86_64
tristate "Ciphers: CAST5 with modes: ECB, CBC (AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_CAST5
select CRYPTO_CAST_COMMON
imply CRYPTO_CTR
help
Length-preserving ciphers: CAST5 (CAST-128) cipher algorithm
(RFC2144) with ECB and CBC modes
Architecture: x86_64 using:
- AVX (Advanced Vector Extensions)
Processes 16 blocks in parallel.
config CRYPTO_CAST6_AVX_X86_64
tristate "Ciphers: CAST6 with modes: ECB, CBC (AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_CAST6
select CRYPTO_CAST_COMMON
imply CRYPTO_XTS
imply CRYPTO_CTR
help
Length-preserving ciphers: CAST6 (CAST-256) cipher algorithm
(RFC2612) with ECB and CBC modes
Architecture: x86_64 using:
- AVX (Advanced Vector Extensions)
Processes eight blocks in parallel.
config CRYPTO_DES3_EDE_X86_64
tristate "Ciphers: Triple DES EDE with modes: ECB, CBC"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_LIB_DES
imply CRYPTO_CTR
help
Block cipher: Triple DES EDE (FIPS 46-3) cipher algorithm
Length-preserving ciphers: Triple DES EDE with ECB and CBC modes
Architecture: x86_64
Processes one or three blocks in parallel.
config CRYPTO_SERPENT_SSE2_X86_64
tristate "Ciphers: Serpent with modes: ECB, CBC (SSE2)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
imply CRYPTO_CTR
help
Length-preserving ciphers: Serpent cipher algorithm
with ECB and CBC modes
Architecture: x86_64 using:
- SSE2 (Streaming SIMD Extensions 2)
Processes eight blocks in parallel.
config CRYPTO_SERPENT_SSE2_586
tristate "Ciphers: Serpent with modes: ECB, CBC (32-bit with SSE2)"
depends on !64BIT
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
imply CRYPTO_CTR
help
Length-preserving ciphers: Serpent cipher algorithm
with ECB and CBC modes
Architecture: x86 (32-bit) using:
- SSE2 (Streaming SIMD Extensions 2)
Processes four blocks in parallel.
config CRYPTO_SERPENT_AVX_X86_64
tristate "Ciphers: Serpent with modes: ECB, CBC (AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
imply CRYPTO_XTS
imply CRYPTO_CTR
help
Length-preserving ciphers: Serpent cipher algorithm
with ECB and CBC modes
Architecture: x86_64 using:
- AVX (Advanced Vector Extensions)
Processes eight blocks in parallel.
config CRYPTO_SERPENT_AVX2_X86_64
tristate "Ciphers: Serpent with modes: ECB, CBC (AVX2)"
depends on 64BIT
select CRYPTO_SERPENT_AVX_X86_64
help
Length-preserving ciphers: Serpent cipher algorithm
with ECB and CBC modes
Architecture: x86_64 using:
- AVX2 (Advanced Vector Extensions 2)
Processes 16 blocks in parallel.
config CRYPTO_SM4_AESNI_AVX_X86_64
tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_ALGAPI
select CRYPTO_SM4
help
Length-preserving ciphers: SM4 cipher algorithms
(OSCCA GB/T 32907-2016) with ECB, CBC, and CTR modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX (Advanced Vector Extensions)
Through two affine transforms,
we can use the AES S-Box to simulate the SM4 S-Box to achieve the
effect of instruction acceleration.
If unsure, say N.
config CRYPTO_SM4_AESNI_AVX2_X86_64
tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX2)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_ALGAPI
select CRYPTO_SM4
select CRYPTO_SM4_AESNI_AVX_X86_64
help
Length-preserving ciphers: SM4 cipher algorithms
(OSCCA GB/T 32907-2016) with ECB, CBC, and CTR modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX2 (Advanced Vector Extensions 2)
Through two affine transforms,
we can use the AES S-Box to simulate the SM4 S-Box to achieve the
effect of instruction acceleration.
If unsure, say N.
config CRYPTO_TWOFISH_586
tristate "Ciphers: Twofish (32-bit)"
depends on !64BIT
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
imply CRYPTO_CTR
help
Block cipher: Twofish cipher algorithm
Architecture: x86 (32-bit)
config CRYPTO_TWOFISH_X86_64
tristate "Ciphers: Twofish"
depends on 64BIT
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
imply CRYPTO_CTR
help
Block cipher: Twofish cipher algorithm
Architecture: x86_64
config CRYPTO_TWOFISH_X86_64_3WAY
tristate "Ciphers: Twofish with modes: ECB, CBC (3-way parallel)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_TWOFISH_COMMON
select CRYPTO_TWOFISH_X86_64
help
Length-preserving cipher: Twofish cipher algorithm
with ECB and CBC modes
Architecture: x86_64
Processes three blocks in parallel, better utilizing resources of
out-of-order CPUs.
config CRYPTO_TWOFISH_AVX_X86_64
tristate "Ciphers: Twofish with modes: ECB, CBC (AVX)"
depends on 64BIT
select CRYPTO_SKCIPHER
select CRYPTO_TWOFISH_COMMON
select CRYPTO_TWOFISH_X86_64
select CRYPTO_TWOFISH_X86_64_3WAY
imply CRYPTO_XTS
help
Length-preserving cipher: Twofish cipher algorithm
with ECB and CBC modes
Architecture: x86_64 using:
- AVX (Advanced Vector Extensions)
Processes eight blocks in parallel.
crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher The implementation is based on the 32-bit implementation of the aria. Also, aria-avx process steps are the similar to the camellia-avx. 1. Byteslice(16way) 2. Add-round-key. 3. Sbox 4. Diffusion layer. Except for s-box, all steps are the same as the aria-generic implementation. s-box step is very similar to camellia and sm4 implementation. There are 2 implementations for s-box step. One is to use AES-NI and affine transformation, which is the same as Camellia, sm4, and others. Another is to use GFNI. GFNI implementation is faster than AES-NI implementation. So, it uses GFNI implementation if the running CPU supports GFNI. There are 4 s-boxes in the ARIA and the 2 s-boxes are the same as AES's s-boxes. To calculate the first sbox, it just uses the aesenclast and then inverts shift_row. No more process is needed for this job because the first s-box is the same as the AES encryption s-box. To calculate the second sbox(invert of s1), it just uses the aesdeclast and then inverts shift_row. No more process is needed for this job because the second s-box is the same as the AES decryption s-box. To calculate the third s-box, it uses the aesenclast, then affine transformation, which is combined AES inverse affine and ARIA S2. To calculate the last s-box, it uses the aesdeclast, then affine transformation, which is combined X2 and AES forward affine. The optimized third and last s-box logic and GFNI s-box logic are implemented by Jussi Kivilinna. The aria-generic implementation is based on a 32-bit implementation, not an 8-bit implementation. the aria-avx Diffusion Layer implementation is based on aria-generic implementation because 8-bit implementation is not fit for parallel implementation but 32-bit is enough to fit for this. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-09-16 12:57:35 +00:00
config CRYPTO_ARIA_AESNI_AVX_X86_64
tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX/GFNI)"
depends on 64BIT
crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher The implementation is based on the 32-bit implementation of the aria. Also, aria-avx process steps are the similar to the camellia-avx. 1. Byteslice(16way) 2. Add-round-key. 3. Sbox 4. Diffusion layer. Except for s-box, all steps are the same as the aria-generic implementation. s-box step is very similar to camellia and sm4 implementation. There are 2 implementations for s-box step. One is to use AES-NI and affine transformation, which is the same as Camellia, sm4, and others. Another is to use GFNI. GFNI implementation is faster than AES-NI implementation. So, it uses GFNI implementation if the running CPU supports GFNI. There are 4 s-boxes in the ARIA and the 2 s-boxes are the same as AES's s-boxes. To calculate the first sbox, it just uses the aesenclast and then inverts shift_row. No more process is needed for this job because the first s-box is the same as the AES encryption s-box. To calculate the second sbox(invert of s1), it just uses the aesdeclast and then inverts shift_row. No more process is needed for this job because the second s-box is the same as the AES decryption s-box. To calculate the third s-box, it uses the aesenclast, then affine transformation, which is combined AES inverse affine and ARIA S2. To calculate the last s-box, it uses the aesdeclast, then affine transformation, which is combined X2 and AES forward affine. The optimized third and last s-box logic and GFNI s-box logic are implemented by Jussi Kivilinna. The aria-generic implementation is based on a 32-bit implementation, not an 8-bit implementation. the aria-avx Diffusion Layer implementation is based on aria-generic implementation because 8-bit implementation is not fit for parallel implementation but 32-bit is enough to fit for this. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-09-16 12:57:35 +00:00
select CRYPTO_SKCIPHER
select CRYPTO_ALGAPI
select CRYPTO_ARIA
help
Length-preserving cipher: ARIA cipher algorithms
(RFC 5794) with ECB and CTR modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX (Advanced Vector Extensions)
- GFNI (Galois Field New Instructions)
Processes 16 blocks in parallel.
crypto: x86/aria - implement aria-avx2 aria-avx2 implementation uses AVX2, AES-NI, and GFNI. It supports 32way parallel processing. So, byteslicing code is changed to support 32way parallel. And it exports some aria-avx functions such as encrypt() and decrypt(). There are two main logics, s-box layer and diffusion layer. These codes are the same as aria-avx implementation. But some instruction are exchanged because they don't support 256bit registers. Also, AES-NI doesn't support 256bit register. So, aesenclast and aesdeclast are used twice like below: vextracti128 $1, ymm0, xmm6; vaesenclast xmm7, xmm0, xmm0; vaesenclast xmm7, xmm6, xmm6; vinserti128 $1, xmm6, ymm0, ymm0; Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100: ARIA-AVX2 with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption tcrypt: 1 operation in 2003 cycles (1024 bytes) tcrypt: 1 operation in 5867 cycles (4096 bytes) tcrypt: 1 operation in 2358 cycles (1024 bytes) tcrypt: 1 operation in 7295 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption tcrypt: 1 operation in 2004 cycles (1024 bytes) tcrypt: 1 operation in 5956 cycles (4096 bytes) tcrypt: 1 operation in 2409 cycles (1024 bytes) tcrypt: 1 operation in 7564 cycles (4096 bytes) ARIA-AVX with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption tcrypt: 1 operation in 2761 cycles (1024 bytes) tcrypt: 1 operation in 9390 cycles (4096 bytes) tcrypt: 1 operation in 3401 cycles (1024 bytes) tcrypt: 1 operation in 11876 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption tcrypt: 1 operation in 2735 cycles (1024 bytes) tcrypt: 1 operation in 9424 cycles (4096 bytes) tcrypt: 1 operation in 3369 cycles (1024 bytes) tcrypt: 1 operation in 11954 cycles (4096 bytes) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-01 09:12:51 +00:00
config CRYPTO_ARIA_AESNI_AVX2_X86_64
tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX2/GFNI)"
depends on 64BIT
crypto: x86/aria - implement aria-avx2 aria-avx2 implementation uses AVX2, AES-NI, and GFNI. It supports 32way parallel processing. So, byteslicing code is changed to support 32way parallel. And it exports some aria-avx functions such as encrypt() and decrypt(). There are two main logics, s-box layer and diffusion layer. These codes are the same as aria-avx implementation. But some instruction are exchanged because they don't support 256bit registers. Also, AES-NI doesn't support 256bit register. So, aesenclast and aesdeclast are used twice like below: vextracti128 $1, ymm0, xmm6; vaesenclast xmm7, xmm0, xmm0; vaesenclast xmm7, xmm6, xmm6; vinserti128 $1, xmm6, ymm0, ymm0; Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100: ARIA-AVX2 with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption tcrypt: 1 operation in 2003 cycles (1024 bytes) tcrypt: 1 operation in 5867 cycles (4096 bytes) tcrypt: 1 operation in 2358 cycles (1024 bytes) tcrypt: 1 operation in 7295 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption tcrypt: 1 operation in 2004 cycles (1024 bytes) tcrypt: 1 operation in 5956 cycles (4096 bytes) tcrypt: 1 operation in 2409 cycles (1024 bytes) tcrypt: 1 operation in 7564 cycles (4096 bytes) ARIA-AVX with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption tcrypt: 1 operation in 2761 cycles (1024 bytes) tcrypt: 1 operation in 9390 cycles (4096 bytes) tcrypt: 1 operation in 3401 cycles (1024 bytes) tcrypt: 1 operation in 11876 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption tcrypt: 1 operation in 2735 cycles (1024 bytes) tcrypt: 1 operation in 9424 cycles (4096 bytes) tcrypt: 1 operation in 3369 cycles (1024 bytes) tcrypt: 1 operation in 11954 cycles (4096 bytes) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-01 09:12:51 +00:00
select CRYPTO_SKCIPHER
select CRYPTO_ALGAPI
select CRYPTO_ARIA
select CRYPTO_ARIA_AESNI_AVX_X86_64
help
Length-preserving cipher: ARIA cipher algorithms
(RFC 5794) with ECB and CTR modes
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- AVX2 (Advanced Vector Extensions)
- GFNI (Galois Field New Instructions)
Processes 32 blocks in parallel.
crypto: x86/aria - implement aria-avx512 aria-avx512 implementation uses AVX512 and GFNI. It supports 64way parallel processing. So, byteslicing code is changed to support 64way parallel. And it exports some aria-avx2 functions such as encrypt() and decrypt(). AVX and AVX2 have 16 registers. They should use memory to store/load state because of lack of registers. But AVX512 supports 32 registers. So, it doesn't require store/load in the s-box layer. It means that it can reduce overhead of store/load in the s-box layer. Also code become much simpler. Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100: ARIA-AVX512(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx512) encryption tcrypt: 1 operation in 1504 cycles (1024 bytes) tcrypt: 1 operation in 4595 cycles (4096 bytes) tcrypt: 1 operation in 1763 cycles (1024 bytes) tcrypt: 1 operation in 5540 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx512) decryption tcrypt: 1 operation in 1502 cycles (1024 bytes) tcrypt: 1 operation in 4615 cycles (4096 bytes) tcrypt: 1 operation in 1759 cycles (1024 bytes) tcrypt: 1 operation in 5554 cycles (4096 bytes) ARIA-AVX2 with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption tcrypt: 1 operation in 2003 cycles (1024 bytes) tcrypt: 1 operation in 5867 cycles (4096 bytes) tcrypt: 1 operation in 2358 cycles (1024 bytes) tcrypt: 1 operation in 7295 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption tcrypt: 1 operation in 2004 cycles (1024 bytes) tcrypt: 1 operation in 5956 cycles (4096 bytes) tcrypt: 1 operation in 2409 cycles (1024 bytes) tcrypt: 1 operation in 7564 cycles (4096 bytes) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-01 09:12:52 +00:00
config CRYPTO_ARIA_GFNI_AVX512_X86_64
tristate "Ciphers: ARIA with modes: ECB, CTR (AVX512/GFNI)"
depends on 64BIT && AS_GFNI
crypto: x86/aria - implement aria-avx512 aria-avx512 implementation uses AVX512 and GFNI. It supports 64way parallel processing. So, byteslicing code is changed to support 64way parallel. And it exports some aria-avx2 functions such as encrypt() and decrypt(). AVX and AVX2 have 16 registers. They should use memory to store/load state because of lack of registers. But AVX512 supports 32 registers. So, it doesn't require store/load in the s-box layer. It means that it can reduce overhead of store/load in the s-box layer. Also code become much simpler. Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100: ARIA-AVX512(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx512) encryption tcrypt: 1 operation in 1504 cycles (1024 bytes) tcrypt: 1 operation in 4595 cycles (4096 bytes) tcrypt: 1 operation in 1763 cycles (1024 bytes) tcrypt: 1 operation in 5540 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx512) decryption tcrypt: 1 operation in 1502 cycles (1024 bytes) tcrypt: 1 operation in 4615 cycles (4096 bytes) tcrypt: 1 operation in 1759 cycles (1024 bytes) tcrypt: 1 operation in 5554 cycles (4096 bytes) ARIA-AVX2 with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption tcrypt: 1 operation in 2003 cycles (1024 bytes) tcrypt: 1 operation in 5867 cycles (4096 bytes) tcrypt: 1 operation in 2358 cycles (1024 bytes) tcrypt: 1 operation in 7295 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption tcrypt: 1 operation in 2004 cycles (1024 bytes) tcrypt: 1 operation in 5956 cycles (4096 bytes) tcrypt: 1 operation in 2409 cycles (1024 bytes) tcrypt: 1 operation in 7564 cycles (4096 bytes) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-01 09:12:52 +00:00
select CRYPTO_SKCIPHER
select CRYPTO_ALGAPI
select CRYPTO_ARIA
select CRYPTO_ARIA_AESNI_AVX_X86_64
select CRYPTO_ARIA_AESNI_AVX2_X86_64
help
Length-preserving cipher: ARIA cipher algorithms
(RFC 5794) with ECB and CTR modes
Architecture: x86_64 using:
- AVX512 (Advanced Vector Extensions)
- GFNI (Galois Field New Instructions)
Processes 64 blocks in parallel.
config CRYPTO_AEGIS128_AESNI_SSE2
tristate "AEAD ciphers: AEGIS-128 (AES-NI/SSE4.1)"
depends on 64BIT
select CRYPTO_AEAD
help
AEGIS-128 AEAD algorithm
Architecture: x86_64 using:
- AES-NI (AES New Instructions)
- SSE4.1 (Streaming SIMD Extensions 4.1)
config CRYPTO_NHPOLY1305_SSE2
tristate "Hash functions: NHPoly1305 (SSE2)"
depends on 64BIT
select CRYPTO_NHPOLY1305
help
NHPoly1305 hash function for Adiantum
Architecture: x86_64 using:
- SSE2 (Streaming SIMD Extensions 2)
config CRYPTO_NHPOLY1305_AVX2
tristate "Hash functions: NHPoly1305 (AVX2)"
depends on 64BIT
select CRYPTO_NHPOLY1305
help
NHPoly1305 hash function for Adiantum
Architecture: x86_64 using:
- AVX2 (Advanced Vector Extensions 2)
config CRYPTO_POLYVAL_CLMUL_NI
tristate "Hash functions: POLYVAL (CLMUL-NI)"
depends on 64BIT
select CRYPTO_POLYVAL
help
POLYVAL hash function for HCTR2
Architecture: x86_64 using:
- CLMUL-NI (carry-less multiplication new instructions)
config CRYPTO_SM3_AVX_X86_64
tristate "Hash functions: SM3 (AVX)"
depends on 64BIT
select CRYPTO_HASH
select CRYPTO_LIB_SM3
help
SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3
Architecture: x86_64 using:
- AVX (Advanced Vector Extensions)
If unsure, say N.
config CRYPTO_GHASH_CLMUL_NI_INTEL
tristate "Hash functions: GHASH (CLMUL-NI)"
depends on 64BIT
select CRYPTO_CRYPTD
help
GCM GHASH hash function (NIST SP800-38D)
Architecture: x86_64 using:
- CLMUL-NI (carry-less multiplication new instructions)
endmenu