glibc/sysdeps
Mahesh Bodapati 82b5340ebd powerpc64: Optimize strcpy and stpcpy for Power9/10
This patch modifies the current Power9 implementation of strcpy and
stpcpy to optimize it for Power9 and Power10.

No new Power10 instructions are used, so the original Power9 strcpy
is modified instead of creating a new implementation for Power10.

The changes also affect stpcpy, which uses the same implementation
with some additional code before returning.

Improvements compared to the old Power9 version:

Use simple comparisons for the first ~512 bytes:
  The main loop suits long strings, but checking 16B at a time is faster
  for shorter ones. After aligning the address to 16 bytes, we unroll the
  loop four times, checking 128 bytes each time. For unaligned strings
  there may be some overlap with the main loop, but the trade-off still
  favors shorter strings.

Loop with 64 bytes for longer strings:
  Use 4 consecutive lxv/stxv instructions.
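
The two phases described above can be sketched in portable C. This is only
an illustration of the control flow, not the Power9 assembly from the patch:
the names sketch_stpcpy, sketch_strcpy and find_nul are hypothetical, the
"4 rounds of 8 x 16B" split is one reading of "unroll four times, checking
128 bytes each time", and the scalar find_nul stands in for the vector
compare so that the sketch never reads past the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index of the first NUL in [p, p+n), or n if there is none.
   Stops at the NUL, so it never reads beyond the terminator. */
static size_t find_nul(const char *p, size_t n)
{
    size_t i = 0;
    while (i < n && p[i] != '\0')
        i++;
    return i;
}

/* Hypothetical sketch: copy src to dst, return pointer to dst's NUL. */
char *sketch_stpcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    /* Align the source address to 16 bytes, one byte at a time. */
    while (((uintptr_t)src & 15) != 0) {
        if ((*dst = *src) == '\0')
            return dst;
        dst++;
        src++;
    }

    /* Phase 1: simple 16B checks for the first ~512 bytes
       (assumed here: 4 rounds, each covering 8 x 16B = 128 bytes). */
    for (int round = 0; round < 4; round++) {
        for (int chunk = 0; chunk < 8; chunk++) {
            size_t k = find_nul(src, 16);
            if (k < 16) {
                memcpy(dst, src, k + 1);   /* include the NUL */
                return dst + k;
            }
            memcpy(dst, src, 16);
            dst += 16;
            src += 16;
        }
    }

    /* Phase 2: main loop, 64 bytes per iteration (the patch uses
       4 consecutive lxv/stxv pairs; memcpy stands in for them). */
    for (;;) {
        size_t k = find_nul(src, 64);
        if (k < 64) {
            memcpy(dst, src, k + 1);
            return dst + k;
        }
        memcpy(dst, src, 64);
        dst += 64;
        src += 64;
    }
}

/* strcpy shares the same body and only differs in its return value. */
char *sketch_strcpy(char *dst, const char *src)
{
    sketch_stpcpy(dst, src);
    return dst;
}
```

As in the patch description, the stpcpy and strcpy variants share one
implementation; only the value returned at the end differs.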

Showed an average improvement of 13%.

Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-08-23 16:48:32 -05:00
aarch64 aarch64: Regenerate ULPs 2024-08-07 11:02:03 -03:00
alpha math: Update alpha ulps 2024-07-14 12:44:15 +02:00
arc ARC: Regenerate ULPs 2024-08-11 15:29:56 +02:00
arm arm: Regenerate ULPs 2024-08-07 11:02:03 -03:00
csky elf: Remove HWCAP_IMPORTANT 2024-06-18 10:45:36 +02:00
generic nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00
gnu sysdeps: Re-flow and sort multiline gnu/Makefile definitions 2024-08-07 11:02:03 -03:00
hppa elf: Make dl-fptr and dl-symaddr hppa specific 2024-08-19 14:54:07 -03:00
htl hurd: Fix missing pthread_ compat symbol in libc 2024-08-01 23:58:51 +02:00
hurd hurd: Move internal functions to internal header 2024-03-23 22:43:07 +01:00
i386 i386: Regenerate ULPs 2024-08-07 11:02:03 -03:00
ieee754 Convert to autoconf 2.72 (vanilla release, no distribution patches) 2024-06-17 21:15:28 +02:00
loongarch LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic 2024-08-09 09:06:17 +08:00
m68k math: Update m68k ULPs 2024-07-08 21:51:03 +02:00
mach hurd: Fix missing pthread_ compat symbol in libc 2024-08-01 23:58:51 +02:00
microblaze Implement C23 logp1 2024-06-17 13:47:09 +00:00
mips MIPS: Regenerate ULPs 2024-08-08 14:53:53 +02:00
nios2 Convert to autoconf 2.72 (vanilla release, no distribution patches) 2024-06-17 21:15:28 +02:00
nptl nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00
or1k Implement C23 logp1 2024-06-17 13:47:09 +00:00
posix posix: Sync tempname with gnulib 2024-04-10 14:53:39 -03:00
powerpc powerpc64: Optimize strcpy and stpcpy for Power9/10 2024-08-23 16:48:32 -05:00
pthread nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00
riscv RISC-V: Regenerate ULPs 2024-08-08 14:53:55 +02:00
s390 s390x: Update ulps 2024-08-08 13:01:02 +02:00
sh nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00
sparc sparc: Regenerate ULPs 2024-08-07 11:02:03 -03:00
unix nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
wordsize-64 Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
x86 x86: Add Avoid_STOSB tunable to allow NT memset without ERMS 2024-08-15 08:19:15 -07:00
x86_64 nptl: Fix Race conditions in pthread cancellation [BZ#12683] 2024-08-23 14:27:43 -03:00