# SPDX-License-Identifier: GPL-2.0

menu "Accelerated Cryptographic Algorithms for CPU (x86)"

config CRYPTO_CURVE25519_X86
	tristate
	depends on 64BIT
	select CRYPTO_KPP
	select CRYPTO_LIB_CURVE25519_GENERIC
	select CRYPTO_ARCH_HAVE_LIB_CURVE25519
	default CRYPTO_LIB_CURVE25519_INTERNAL
	help
	  Curve25519 algorithm

	  Architecture: x86_64 using:
	  - ADX (large integer arithmetic)

config CRYPTO_AES_NI_INTEL
	tristate "Ciphers: AES, modes: ECB, CBC, CTS, CTR, XCTR, XTS, GCM (AES-NI/VAES)"
	select CRYPTO_AEAD
	select CRYPTO_LIB_AES
	select CRYPTO_LIB_GF128MUL
	select CRYPTO_ALGAPI
	select CRYPTO_SKCIPHER
	help
	  Block cipher: AES cipher algorithms
	  AEAD cipher: AES with GCM
	  Length-preserving ciphers: AES with ECB, CBC, CTS, CTR, XCTR, XTS

	  Architecture: x86 (32-bit and 64-bit) using:
	  - AES-NI (AES new instructions)
	  - VAES (Vector AES)

	  Some algorithm implementations are supported only in 64-bit builds,
	  and some have additional prerequisites such as AVX2 or AVX512.

config CRYPTO_BLOWFISH_X86_64
	tristate "Ciphers: Blowfish, modes: ECB, CBC"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_BLOWFISH_COMMON
	imply CRYPTO_CTR
	help
	  Block cipher: Blowfish cipher algorithm
	  Length-preserving ciphers: Blowfish with ECB and CBC modes

	  Architecture: x86_64

config CRYPTO_CAMELLIA_X86_64
	tristate "Ciphers: Camellia with modes: ECB, CBC"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	imply CRYPTO_CTR
	help
	  Block cipher: Camellia cipher algorithms
	  Length-preserving ciphers: Camellia with ECB and CBC modes

	  Architecture: x86_64

config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
	tristate "Ciphers: Camellia with modes: ECB, CBC (AES-NI/AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_CAMELLIA_X86_64
	imply CRYPTO_XTS
	help
	  Length-preserving ciphers: Camellia with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX (Advanced Vector Extensions)

config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
	tristate "Ciphers: Camellia with modes: ECB, CBC (AES-NI/AVX2)"
	depends on 64BIT
	select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
	help
	  Length-preserving ciphers: Camellia with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX2 (Advanced Vector Extensions 2)

config CRYPTO_CAST5_AVX_X86_64
	tristate "Ciphers: CAST5 with modes: ECB, CBC (AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_CAST5
	select CRYPTO_CAST_COMMON
	imply CRYPTO_CTR
	help
	  Length-preserving ciphers: CAST5 (CAST-128) cipher algorithm
	  (RFC2144) with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AVX (Advanced Vector Extensions)

	  Processes 16 blocks in parallel.

config CRYPTO_CAST6_AVX_X86_64
	tristate "Ciphers: CAST6 with modes: ECB, CBC (AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_CAST6
	select CRYPTO_CAST_COMMON
	imply CRYPTO_XTS
	imply CRYPTO_CTR
	help
	  Length-preserving ciphers: CAST6 (CAST-256) cipher algorithm
	  (RFC2612) with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AVX (Advanced Vector Extensions)

	  Processes eight blocks in parallel.

config CRYPTO_DES3_EDE_X86_64
	tristate "Ciphers: Triple DES EDE with modes: ECB, CBC"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_LIB_DES
	imply CRYPTO_CTR
	help
	  Block cipher: Triple DES EDE (FIPS 46-3) cipher algorithm
	  Length-preserving ciphers: Triple DES EDE with ECB and CBC modes

	  Architecture: x86_64

	  Processes one or three blocks in parallel.

config CRYPTO_SERPENT_SSE2_X86_64
	tristate "Ciphers: Serpent with modes: ECB, CBC (SSE2)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_SERPENT
	imply CRYPTO_CTR
	help
	  Length-preserving ciphers: Serpent cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86_64 using:
	  - SSE2 (Streaming SIMD Extensions 2)

	  Processes eight blocks in parallel.

config CRYPTO_SERPENT_SSE2_586
	tristate "Ciphers: Serpent with modes: ECB, CBC (32-bit with SSE2)"
	depends on !64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_SERPENT
	imply CRYPTO_CTR
	help
	  Length-preserving ciphers: Serpent cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86 (32-bit) using:
	  - SSE2 (Streaming SIMD Extensions 2)

	  Processes four blocks in parallel.

config CRYPTO_SERPENT_AVX_X86_64
	tristate "Ciphers: Serpent with modes: ECB, CBC (AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_SERPENT
	imply CRYPTO_XTS
	imply CRYPTO_CTR
	help
	  Length-preserving ciphers: Serpent cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AVX (Advanced Vector Extensions)

	  Processes eight blocks in parallel.

config CRYPTO_SERPENT_AVX2_X86_64
	tristate "Ciphers: Serpent with modes: ECB, CBC (AVX2)"
	depends on 64BIT
	select CRYPTO_SERPENT_AVX_X86_64
	help
	  Length-preserving ciphers: Serpent cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AVX2 (Advanced Vector Extensions 2)

	  Processes 16 blocks in parallel.

config CRYPTO_SM4_AESNI_AVX_X86_64
	tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_ALGAPI
	select CRYPTO_SM4
	help
	  Length-preserving ciphers: SM4 cipher algorithms
	  (OSCCA GB/T 32907-2016) with ECB, CBC, and CTR modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX (Advanced Vector Extensions)

	  The SM4 S-box is computed by applying two affine transforms
	  around the AES S-box, which allows AES-NI instructions to
	  accelerate it.

	  If unsure, say N.

config CRYPTO_SM4_AESNI_AVX2_X86_64
	tristate "Ciphers: SM4 with modes: ECB, CBC, CTR (AES-NI/AVX2)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_ALGAPI
	select CRYPTO_SM4
	select CRYPTO_SM4_AESNI_AVX_X86_64
	help
	  Length-preserving ciphers: SM4 cipher algorithms
	  (OSCCA GB/T 32907-2016) with ECB, CBC, and CTR modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX2 (Advanced Vector Extensions 2)

	  The SM4 S-box is computed by applying two affine transforms
	  around the AES S-box, which allows AES-NI instructions to
	  accelerate it.

	  If unsure, say N.

config CRYPTO_TWOFISH_586
	tristate "Ciphers: Twofish (32-bit)"
	depends on !64BIT
	select CRYPTO_ALGAPI
	select CRYPTO_TWOFISH_COMMON
	imply CRYPTO_CTR
	help
	  Block cipher: Twofish cipher algorithm

	  Architecture: x86 (32-bit)

config CRYPTO_TWOFISH_X86_64
	tristate "Ciphers: Twofish"
	depends on 64BIT
	select CRYPTO_ALGAPI
	select CRYPTO_TWOFISH_COMMON
	imply CRYPTO_CTR
	help
	  Block cipher: Twofish cipher algorithm

	  Architecture: x86_64

config CRYPTO_TWOFISH_X86_64_3WAY
	tristate "Ciphers: Twofish with modes: ECB, CBC (3-way parallel)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_TWOFISH_COMMON
	select CRYPTO_TWOFISH_X86_64
	help
	  Length-preserving cipher: Twofish cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86_64

	  Processes three blocks in parallel, better utilizing resources of
	  out-of-order CPUs.

config CRYPTO_TWOFISH_AVX_X86_64
	tristate "Ciphers: Twofish with modes: ECB, CBC (AVX)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_TWOFISH_COMMON
	select CRYPTO_TWOFISH_X86_64
	select CRYPTO_TWOFISH_X86_64_3WAY
	imply CRYPTO_XTS
	help
	  Length-preserving cipher: Twofish cipher algorithm
	  with ECB and CBC modes

	  Architecture: x86_64 using:
	  - AVX (Advanced Vector Extensions)

	  Processes eight blocks in parallel.

config CRYPTO_ARIA_AESNI_AVX_X86_64
	tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX/GFNI)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_ALGAPI
	select CRYPTO_ARIA
	help
	  Length-preserving cipher: ARIA cipher algorithms
	  (RFC 5794) with ECB and CTR modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX (Advanced Vector Extensions)
	  - GFNI (Galois Field New Instructions)

	  Processes 16 blocks in parallel.

config CRYPTO_ARIA_AESNI_AVX2_X86_64
	tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX2/GFNI)"
	depends on 64BIT
	select CRYPTO_SKCIPHER
	select CRYPTO_ALGAPI
	select CRYPTO_ARIA
	select CRYPTO_ARIA_AESNI_AVX_X86_64
	help
	  Length-preserving cipher: ARIA cipher algorithms
	  (RFC 5794) with ECB and CTR modes

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - AVX2 (Advanced Vector Extensions 2)
	  - GFNI (Galois Field New Instructions)

	  Processes 32 blocks in parallel.

config CRYPTO_ARIA_GFNI_AVX512_X86_64
	tristate "Ciphers: ARIA with modes: ECB, CTR (AVX512/GFNI)"
	depends on 64BIT && AS_GFNI
	select CRYPTO_SKCIPHER
	select CRYPTO_ALGAPI
	select CRYPTO_ARIA
	select CRYPTO_ARIA_AESNI_AVX_X86_64
	select CRYPTO_ARIA_AESNI_AVX2_X86_64
	help
	  Length-preserving cipher: ARIA cipher algorithms
	  (RFC 5794) with ECB and CTR modes

	  Architecture: x86_64 using:
	  - AVX512 (Advanced Vector Extensions 512)
	  - GFNI (Galois Field New Instructions)

	  Processes 64 blocks in parallel.

config CRYPTO_AEGIS128_AESNI_SSE2
	tristate "AEAD ciphers: AEGIS-128 (AES-NI/SSE4.1)"
	depends on 64BIT
	select CRYPTO_AEAD
	help
	  AEGIS-128 AEAD algorithm

	  Architecture: x86_64 using:
	  - AES-NI (AES New Instructions)
	  - SSE4.1 (Streaming SIMD Extensions 4.1)

config CRYPTO_NHPOLY1305_SSE2
	tristate "Hash functions: NHPoly1305 (SSE2)"
	depends on 64BIT
	select CRYPTO_NHPOLY1305
	help
	  NHPoly1305 hash function for Adiantum

	  Architecture: x86_64 using:
	  - SSE2 (Streaming SIMD Extensions 2)

config CRYPTO_NHPOLY1305_AVX2
	tristate "Hash functions: NHPoly1305 (AVX2)"
	depends on 64BIT
	select CRYPTO_NHPOLY1305
	help
	  NHPoly1305 hash function for Adiantum

	  Architecture: x86_64 using:
	  - AVX2 (Advanced Vector Extensions 2)

config CRYPTO_POLYVAL_CLMUL_NI
	tristate "Hash functions: POLYVAL (CLMUL-NI)"
	depends on 64BIT
	select CRYPTO_POLYVAL
	help
	  POLYVAL hash function for HCTR2

	  Architecture: x86_64 using:
	  - CLMUL-NI (carry-less multiplication new instructions)

config CRYPTO_SM3_AVX_X86_64
	tristate "Hash functions: SM3 (AVX)"
	depends on 64BIT
	select CRYPTO_HASH
	select CRYPTO_LIB_SM3
	help
	  SM3 secure hash function as defined by OSCCA GM/T 0004-2012 SM3

	  Architecture: x86_64 using:
	  - AVX (Advanced Vector Extensions)

	  If unsure, say N.

config CRYPTO_GHASH_CLMUL_NI_INTEL
	tristate "Hash functions: GHASH (CLMUL-NI)"
	depends on 64BIT
	select CRYPTO_CRYPTD
	help
	  GCM GHASH hash function (NIST SP800-38D)

	  Architecture: x86_64 using:
	  - CLMUL-NI (carry-less multiplication new instructions)

endmenu