The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.
There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).
Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.
Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Code sections in s390 specific kernel code which use floating point or
vector registers all come with a 520 byte stack variable to save already in
use registers, if required.
With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable
will always be initialized on function entry in addition to saving register
contents, which contradicts the intention (performance improvement) of such
code sections.
Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides
struct kernel_fpu variables with an __uninitialized attribute, and convert
all existing code to use this.
This way only this specific type of stack variable will not be initialized,
regardless of config options.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx() which is a
short readable wrapper for "test_facility(129)".
Facility bit 129 is set if the vector facility is present. test_facility()
returns also true for all bits which are set in the architecture level set
of the cpu that the kernel is compiled for. This means that
test_facility(129) is a compile time constant which returns true for z13
and later, since the vector facility bit is part of the z13 kernel ALS.
In result the compiled code will have less runtime checks, and less code.
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Rework cpufeature implementation to allow for various cpu feature
indications, which is not only limited to hwcap bits. This is achieved
by adding a sequential list of cpu feature numbers, where each of them
is mapped to an entry which indicates what this number is about.
Each entry contains a type member, which indicates what feature
name space to look into (e.g. hwcap, or cpu facility). If wanted this
allows also to automatically load modules only in e.g. z/VM
configurations.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713125644.16121-2-seiden@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Implement a crypto library interface for the s390-native ChaCha20 cipher
algorithm. This allows us to stop to select CRYPTO_CHACHA20 and instead
select CRYPTO_ARCH_HAVE_LIB_CHACHA. This allows BIG_KEYS=y not to build
a whole ChaCha20 crypto infrastructure as a built-in, but build a smaller
CRYPTO_LIB_CHACHA instead.
Make CRYPTO_CHACHA_S390 config entry to look like similar ones on other
architectures. Remove CRYPTO_ALGAPI select as anyway it is selected by
CRYPTO_SKCIPHER.
Add a new test module and a test script for ChaCha20 cipher and its
interfaces. Here are test results on an idle z15 machine:
Data | Generic crypto TFM | s390 crypto TFM | s390 lib
size | enc dec | enc dec | enc dec
-----+--------------------+------------------+----------------
512b | 1545ns 1295ns | 604ns 446ns | 430ns 407ns
4k | 9536ns 9463ns | 2329ns 2174ns | 2170ns 2154ns
64k | 149.6us 149.3us | 34.4us 34.5us | 33.9us 33.1us
6M | 23.61ms 23.11ms | 4223us 4160us | 3951us 4008us
60M | 143.9ms 143.9ms | 33.5ms 33.2ms | 32.2ms 32.1ms
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add an implementation of the ChaCha20 stream cipher (see e.g. RFC 7539)
that makes use of z13's vector instruction set extension.
The original implementation is by Andy Polyakov which is
adapted for kernel use.
Four to six blocks are processed in parallel resulting in a performance
gain for inputs >= 256 bytes.
chacha20-generic
1 operation in 622 cycles (256 bytes)
1 operation in 2346 cycles (1024 bytes)
chacha20-s390
1 operation in 218 cycles (256 bytes)
1 operation in 647 cycles (1024 bytes)
Cc: Andy Polyakov <appro@openssl.org>
Reviewed-by: Harald Freudenberger <freude@de.ibm.com>
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>