glibc

mirror of git://sourceware.org/git/glibc.git synced 2025-03-06 20:58:33 +01:00

Author	SHA1	Message	Date
Paul Eggert	2642002380	Update copyright dates with scripts/update-copyrights	2025-01-01 11:22:09 -08:00
Feifei Wang	ca90758b2a	x86: Enable non-temporal memset for Hygon processors This patch uses 'Avoid_Non_Temporal_Memset' flag to access the non-temporal memset implementation for hygon processors. Test Results: hygon1 arch x86_memset_non_temporal_threshold = 8MB size new performance time / old performance time 1MB 0.994 4MB 0.996 8MB 0.670 16MB 0.343 32MB 0.355 hygon2 arch x86_memset_non_temporal_threshold = 8MB size new performance time / old performance time 1MB 1 4MB 1 8MB 1.312 16MB 0.822 32MB 0.830 hygon3 arch x86_memset_non_temporal_threshold = 8MB size new performance time / old performance time 1MB 1 4MB 0.990 8MB 0.737 16MB 0.390 32MB 0.401 For hygon arch with this patch, non-temporal stores can improve performance by 20% - 65%. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-08-26 10:01:58 -07:00
Feifei Wang	6b08116b2d	x86: Add new architecture type for Hygon processors Add a new architecture type arch_kind_hygon to split the Hygon branch from AMD. This lets Hygon processors use settings suited to their own characteristics. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-08-26 10:01:58 -07:00
Noah Goldstein	f446d90fe6	x86: Add `Avoid_STOSB` tunable to allow NT memset without ERMS The goal of this flag is to allow targets which don't prefer/have ERMS to still access the non-temporal memset implementation. There are 4 cases for tuning memset: 1) `Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores 2) `Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal/non-temporal stores. Non-temporal path goes through `rep stosb` path. We accomplish this by setting `x86_rep_stosb_threshold` to `x86_memset_non_temporal_threshold`. 3) `!Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb` 4) `!Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb`/non-temporal stores. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-08-15 08:19:15 -07:00
Noah Goldstein	b93dddfaf4	x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path This is just a refactor and there should be no behavioral change from this commit. The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob for controlling whether we use non-temporal memset rather than having extra logic based on vendor. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-08-15 08:19:15 -07:00
Florian Weimer	0df48472ff	x86: Add missing switch/case fall-through markers to init_cpu_features The commits introducing these fall-throughs intended them to happen. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-08-02 18:08:14 +02:00
Noah Goldstein	5bcf6265f2	x86: Disable non-temporal memset on Skylake Server The original commit enabling non-temporal memset on Skylake Server had erroneous benchmarks (actually done on ICX). Further benchmarks indicate non-temporal stores may in fact be a regression on Skylake Server. This commit may be over-cautious in some cases, but should avoid any regressions for 2.40. Tested using qemu on all x86_64 cpu arch supported by both qemu + GLIBC. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-07-16 17:20:18 +08:00
MayShao-oc	9dc645cb56	x86: Set default non_temporal_threshold for Zhaoxin processors Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
MayShao-oc	44d757eb9f	x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable the AVX_Fast_Unaligned_Load. Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use sse2_unaligned version of memset, strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
H.J. Lu	717ebfa85c	x86-64: Allocate state buffer space for RDI, RSI and RBX _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack. After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to STATE_SAVE_OFFSET(%rsp). +==================+<- stack frame start aligned at 8 or 16 bytes \| \|<- RDI saved in the red zone \| \|<- RSI saved in the red zone \| \|<- RBX saved in the red zone \| \|<- paddings for stack realignment of 64 bytes \|------------------\|<- xsave buffer end aligned at 64 bytes \| \|<- \| \|<- \| \|<- \|------------------\|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp) \| \|<- 8-byte padding for 64-byte alignment \| \|<- 8-byte padding for 64-byte alignment \| \|<- R11 \| \|<- R10 \| \|<- R9 \| \|<- R8 \| \|<- RDX \| \|<- RCX +==================+<- RSP aligned at 64 bytes Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI and RBX are saved onto stack without adjusting stack pointer first, using the red-zone. This fixes BZ #31501. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-03-18 19:45:13 -07:00
Sunil K Pandey	b6e3898194	x86-64: Simplify minimum ISA check ifdef conditional with if Replace minimum ISA check ifdef conditional with if. Since MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants, compiler will perform constant folding optimization, getting same results. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-03-03 15:47:53 -08:00
H.J. Lu	9b7091415a	x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers _dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-02-29 04:30:01 -08:00
H.J. Lu	befe2d3c4d	x86-64: Don't use SSE resolvers for ISA level 3 or above When glibc is built with ISA level 3 or above enabled, SSE resolvers aren't available and glibc fails to build: ld: .../elf/librtld.os: in function `init_cpu_features': .../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave' ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object /usr/local/bin/ld: final link failed: bad value For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor _dl_tlsdesc_dynamic_fxsave. This fixes BZ #31429. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-28 11:49:30 -08:00
H.J. Lu	0aac205a81	x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers Compiler generates the following instruction sequence for GNU2 dynamic TLS access: leaq tls_var@TLSDESC(%rip), %rax call tls_var@TLSCALL(%rax) or leal tls_var@TLSDESC(%ebx), %eax call tls_var@TLSCALL(%eax) CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS and RAX/EAX, are unchanged after CALL. When _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow path. __tls_get_addr is a normal function which doesn't preserve any caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer caller-saved registers, but didn't preserve any other caller-saved registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all caller-saved registers. This fixes BZ #31372. Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) to optimize elf_machine_runtime_setup. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-28 09:02:56 -08:00
H.J. Lu	457bd9cf2e	x86-64: Check if mprotect works before rewriting PLT Systemd execution environment configuration may prohibit changing a memory mapping to become executable: MemoryDenyWriteExecute= Takes a boolean argument. If set, attempts to create memory mappings that are writable and executable at the same time, or to change existing memory mappings to become executable, or mapping shared memory segments as executable, are prohibited. When it is set, systemd service stops working if PLT rewrite is enabled. Check if mprotect works before rewriting PLT. This fixes BZ #31230. This also works with SELinux when deny_execmem is on. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-01-15 06:59:23 -08:00
H.J. Lu	874214db62	i386: Remove CET support bits 1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk. 2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64 features-offsets.sym. 3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only for x86-64 and also used for PLT rewrite. 4. Add x86-64 ldsodefs.h to include feature-control.h. 5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only. 6. Move x86 dl-procruntime.c to x86-64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:20:20 -08:00
H.J. Lu	848746e88e	elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after relocation. For x86-64, add #define DT_X86_64_PLT (DT_LOPROC + 0) #define DT_X86_64_PLTSZ (DT_LOPROC + 1) #define DT_X86_64_PLTENT (DT_LOPROC + 3) 1. DT_X86_64_PLT: The address of the procedure linkage table. 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage table. 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table entry. With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the memory offset of the indirect branch instruction. Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section with direct branch after relocation when the lazy binding is disabled. PLT rewrite is disabled by default since SELinux may disallow modifying code pages and ld.so can't detect it in all cases. Use $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 to enable PLT rewrite with 32-bit direct jump at run-time or $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2 to enable PLT rewrite with 32-bit direct jump and on APX processors with 64-bit absolute jump at run-time. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-05 05:49:49 -08:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
H.J. Lu	55d63e7312	x86/cet: Don't set CET active by default Not all CET enabled applications and libraries have been properly tested in CET enabled environments. Some CET enabled applications or libraries will crash or misbehave when CET is enabled. Don't set CET active by default so that all applications and libraries will run normally regardless of whether CET is active or not. Shadow stack can be enabled by $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK at run-time if shadow stack can be enabled by kernel. NB: This commit can be reverted if it is OK to enable CET by default for all applications and libraries.	2024-01-01 05:22:48 -08:00
H.J. Lu	541641a3de	x86/cet: Enable shadow stack during startup Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared libraries are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.	2024-01-01 05:22:48 -08:00
H.J. Lu	edb5e0c8f9	x86/cet: Sync with Linux kernel 6.6 shadow stack interface Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, i386 shadow stack codes are unchanged and CET shouldn't be enabled for i386. 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used. 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked. 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX. 4. Rewrite the CET control functions with the current kernel shadow stack interface. Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.	2024-01-01 05:22:48 -08:00
Noah Goldstein	d90b43a4ed	x86: Add support for AVX10 preset and vec size in cpu-features This commit adds support for the new AVX10 cpu features: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf We add checks for: - `AVX10`: Check if AVX10 is present. - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support. `make check` passes and cpuid output was checked against GNR/DMR on an emulator.	2023-09-29 14:18:42 -05:00
H.J. Lu	1547d6a64f	<sys/platform/x86.h>: Add APX support Add support for Intel Advanced Performance Extensions: https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html to <sys/platform/x86.h>.	2023-07-27 08:42:32 -07:00
Paul Pluzhnikov	4290aed051	Fix misspellings -- BZ 25337	2023-06-19 21:58:33 +00:00
Noah Goldstein	180897c161	x86: Make the divisor in setting `non_temporal_threshold` cpu specific Different systems prefer different divisors. From benchmarks[1] so far the following divisors have been found: ICX : 2 SKX : 2 BWD : 8 For Intel, we are generalizing that BWD and older prefers 8 as a divisor, and SKL and newer prefers 2. This number can be further tuned as benchmarks are run. [1]: https://github.com/goldsteinn/memcpy-nt-benchmarks Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Noah Goldstein	f193ea20ed	x86: Refactor Intel `init_cpu_features` This patch should have no effect on existing functionality. The current code, which has a single switch for model detection and setting preferred features, is difficult to follow/extend. The cases use magic numbers and many microarchitectures are missing. This makes it difficult to reason about what is implemented so far and/or how/where to add support for new features. This patch splits the model detection and preference setting stages so that CPU preferences can be set based on a complete list of available microarchitectures, rather than based on model magic numbers. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
H.J. Lu	81a3cc956e	<sys/platform/x86.h>: Add PREFETCHI support Add PREFETCHI support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b05521c916	<sys/platform/x86.h>: Add AMX-COMPLEX support Add AMX-COMPLEX support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	609b7b2d3c	<sys/platform/x86.h>: Add AVX-NE-CONVERT support Add AVX-NE-CONVERT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	4c120c88a6	<sys/platform/x86.h>: Add AVX-VNNI-INT8 support Add AVX-VNNI-INT8 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	96037c697d	<sys/platform/x86.h>: Add AVX-IFMA support Add AVX-IFMA support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	8b4cc05eab	<sys/platform/x86.h>: Add AMX-FP16 support Add AMX-FP16 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	2f02d0d8e1	<sys/platform/x86.h>: Add CMPCCXADD support Add CMPCCXADD support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	231bf916ce	<sys/platform/x86.h>: Add RAO-INT support Add RAO-INT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	743113d42e	x86: Set FSGSBASE to active if enabled by kernel Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are enabled. If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE instructions can be used in user space. Define dl_check_hwcap2 to set the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is set. Add a test to verify that FSGSBASE is active on current kernels. NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-04-03 11:36:48 -07:00
Adhemerval Zanella Netto	33237fe83d	Remove --enable-tunables configure option And make tunables always supported. The configure option was added in glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elision tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-03-29 14:33:06 -03:00
H.J. Lu	317f1c0a8a	x86-64: Add glibc.cpu.prefer_map_32bit_exec [BZ #28656] Crossing 2GB boundaries with indirect calls and jumps can use more branch prediction resources on Intel Golden Cove CPU (see the "Misprediction for Branches >2GB" section in Intel 64 and IA-32 Architectures Optimization Reference Manual.) There is visible performance improvement on workloads with many PLT calls when executable and shared libraries are mmapped below 2GB. Add the Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable or denywrite pages in shared libraries with MAP_32BIT first. NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space layout randomization (ASLR), which is always disabled for SUID programs and can only be enabled by the tunable, glibc.cpu.prefer_map_32bit_exec, or the environment variable, LD_PREFER_MAP_32BIT_EXEC. This works only between shared libraries or between shared libraries and executables with addresses below 2GB. PIEs are usually loaded at a random address above 4GB by the kernel.	2023-02-22 18:28:37 -08:00
Joseph Myers	6d7e8eda9b	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
H.J. Lu	1e000d3d33	x86: Black list more Intel CPUs for TSX [BZ #27398] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-01-18 14:20:09 -08:00
Paul Eggert	581c785bf3	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: * 912-#endif remote: * 913: remote: * 914- remote: * error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines	2022-01-01 11:40:24 -08:00
H.J. Lu	ceeffe968c	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.	2021-12-06 07:14:12 -08:00
H.J. Lu	14dbbf46a0	x86-64: Remove Prefer_AVX2_STRCMP Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.	2021-11-01 07:53:04 -07:00
H.J. Lu	91cc803d27	x86-64: Add Avoid_Short_Distance_REP_MOVSB commit `3ec5d83d2a` Author: H.J. Lu <hjl.tools@gmail.com> Date: Sat Jan 25 14:19:40 2020 -0800 x86-64: Avoid rep movsb with short distance [BZ #27130] introduced some regressions on Intel processors without Fast Short REP MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with short distance only on Intel processors with FSRM. bench-memmove-large on Skylake server shows that cycles of __memmove_evex_unaligned_erms improves for the following data size: before after Improvement length=4127, align1=3, align2=0: 479.38 349.25 27% length=4223, align1=9, align2=5: 405.62 333.25 18% length=8223, align1=3, align2=0: 786.12 496.38 37% length=8319, align1=9, align2=5: 727.50 501.38 31% length=16415, align1=3, align2=0: 1436.88 840.00 41% length=16511, align1=9, align2=5: 1375.50 836.38 39% length=32799, align1=3, align2=0: 2890.00 1860.12 36% length=32895, align1=9, align2=5: 2891.38 1931.88 33%	2021-07-28 13:23:57 -07:00
H.J. Lu	7c124e3714	x86: Install <bits/platform/x86.h> [BZ #27958] 1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes <bits/platform/x86.h>. 2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the processor has the feature. 3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the feature is active. There may be other preconditions, like sufficient stack space or further setup for AMX, which must be satisfied before the feature can be used. This fixes BZ #27958. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-07-23 05:12:51 -07:00
H.J. Lu	ea8e465a6b	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcode update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033.	2021-07-01 10:47:35 -07:00
H.J. Lu	ea26ff0322	x86: Copy IBT and SHSTK usable only if CET is enabled IBT and SHSTK usable bits are copied from CPUID feature bits and later cleared if kernel doesn't support CET. Copy IBT and SHSTK usable only if CET is enabled so that they aren't set on CET capable processors with non-CET enabled glibc.	2021-06-23 17:35:47 -07:00
H.J. Lu	1da50d4bda	x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP 1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered by VZEROUPPER inside a transactionally executing RTM region. 2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.	2021-03-29 07:40:17 -07:00
H.J. Lu	27f7463675	x86: Properly disable XSAVE related features [BZ #27605] 1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE. 2. Disable all features which depend on XSAVE: a. If OSXSAVE is disabled by glibc tunables. Or b. If both XSAVE and XSAVEC aren't usable.	2021-03-29 06:04:17 -07:00
H.J. Lu	5ab25c8875	x86: Add PTWRITE feature detection [BZ #27346] 1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature in EBX of CPUID leaf 0x14 with ECX == 0. 2. Add PTWRITE detection to CPU feature tests. 3. Add 2 static CPU feature tests.	2021-02-07 08:01:14 -08:00
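
The four memset tuning cases listed in commit f446d90fe6 can be read as a small truth table over the two flags. The sketch below is only an illustration of that table, not glibc code: the enum constants and the function `memset_strategies` are hypothetical names, and the real selection is made through thresholds such as `x86_rep_stosb_threshold` rather than a bitmask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the store strategies named in the commit
   message.  These are not glibc identifiers.  */
enum
{
  STORE_TEMPORAL = 1 << 0,
  STORE_REP_STOSB = 1 << 1,
  STORE_NON_TEMPORAL = 1 << 2
};

/* Return the set of strategies memset may use for a given combination
   of the Avoid_STOSB and Avoid_Non_Temporal_Memset flags, following
   the four cases in the commit message.  */
static unsigned int
memset_strategies (bool avoid_stosb, bool avoid_non_temporal_memset)
{
  unsigned int strategies = STORE_TEMPORAL;

  if (!avoid_stosb)
    strategies |= STORE_REP_STOSB;
  if (!avoid_non_temporal_memset)
    strategies |= STORE_NON_TEMPORAL;

  /* Every one of the four cases keeps plain temporal stores.  */
  assert ((strategies & STORE_TEMPORAL) != 0);
  return strategies;
}

/* Usage example: one check per case, in the order the commit lists them.  */
void
check_memset_cases (void)
{
  assert (memset_strategies (true, true) == STORE_TEMPORAL);
  assert (memset_strategies (true, false)
          == (STORE_TEMPORAL | STORE_NON_TEMPORAL));
  assert (memset_strategies (false, true)
          == (STORE_TEMPORAL | STORE_REP_STOSB));
  assert (memset_strategies (false, false)
          == (STORE_TEMPORAL | STORE_REP_STOSB | STORE_NON_TEMPORAL));
}
```

In case 2 the commit notes that the non-temporal path is reached through the `rep stosb` path by setting `x86_rep_stosb_threshold` to `x86_memset_non_temporal_threshold`; the bitmask above deliberately does not model that detail.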

119 commits
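
Commit ea8e465a6b sets RTM usable only if RTM_ALWAYS_ABORT isn't set. A minimal sketch of that rule follows, assuming the caller has already read EBX and EDX from CPUID leaf 7, subleaf 0. The macro names and the function `rtm_usable` are hypothetical and are not the identifiers glibc uses; the bit positions are the ones quoted in the commit message (EBX[11] for RTM, EDX[11] for RTM_ALWAYS_ABORT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions for CPUID leaf 7, subleaf 0, as quoted in the commit
   message.  The macro names are illustrative only.  */
#define CPUID7_EBX_RTM (UINT32_C (1) << 11)
#define CPUID7_EDX_RTM_ALWAYS_ABORT (UINT32_C (1) << 11)

/* Return true if RTM is enumerated and the microcode is not forcing
   every transaction to abort.  EBX and EDX are the register values
   returned by CPUID.07H.0H.  */
static bool
rtm_usable (uint32_t ebx, uint32_t edx)
{
  bool rtm_present = (ebx & CPUID7_EBX_RTM) != 0;
  bool always_abort = (edx & CPUID7_EDX_RTM_ALWAYS_ABORT) != 0;

  return rtm_present && !always_abort;
}

/* Usage example covering the three interesting register states.  */
void
check_rtm_cases (void)
{
  /* RTM enumerated, no forced abort: usable.  */
  assert (rtm_usable (CPUID7_EBX_RTM, 0));
  /* RTM still enumerated after the microcode update, but forced to
     abort: not usable.  */
  assert (!rtm_usable (CPUID7_EBX_RTM, CPUID7_EDX_RTM_ALWAYS_ABORT));
  /* RTM not enumerated at all: not usable.  */
  assert (!rtm_usable (0, 0));
}
```

Passing the register values in as arguments keeps the sketch independent of any particular CPUID intrinsic.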