Commit graph

16790 commits

Samuel Thibault
81c4ec1ca8 htl: Make __pthread_sigmask directly call __sigthreadmask
If no thread has been created yet, __pthread_sigstate will not find our ss
because self->kernel_thread is still null, and will then change the global
sigstate instead of our sigstate! We can directly call __sigthreadmask and
skip the (bogus) lookup step.
2025-03-06 02:28:35 +01:00
Samuel Thibault
7a185eb9e9 hurd: Consolidate signal mask change
__pthread_sigstate and __sigprocmask were already the same, except for
clear_pending.
2025-03-06 02:28:35 +01:00
Ronan Pigott
50351e0570 sysdeps: linux: Add BTRFS_SUPER_MAGIC to pathconf
btrfs has a maximum link count of 65535. Include this value in pathconf so
it reports the real maximum link count for this filesystem.
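
A hedged usage example of how the value surfaces to applications (the mount
point path below is illustrative):

    #include <stdio.h>
    #include <unistd.h>

    int
    main (void)
    {
      /* "/mnt/btrfs" stands for any path on a btrfs filesystem.  */
      long link_max = pathconf ("/mnt/btrfs", _PC_LINK_MAX);
      if (link_max > 0)
        printf ("_PC_LINK_MAX: %ld\n", link_max);  /* 65535 expected on btrfs.  */
      return 0;
    }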

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-03-05 15:28:31 -03:00
Adhemerval Zanella
6cb703b81d linux: Prefix AT_HWCAP with 0x on LD_SHOW_AUXV
Suggested-by: Stefan Liebler <stli@linux.ibm.com>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Adhemerval Zanella
1d60b9dfda Remove dl-procinfo.h
powerpc was the only architecture with arch-specific hooks for
LD_SHOW_AUXV, and with the information moved to ld diagnostics there
is no need to keep the _dl_procinfo hook.

Checked with a build for all affected ABIs.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Adhemerval Zanella
2fd580ea46 powerpc: Remove unused dl-procinfo.h
_dl_string_platform is moved to hwcapinfo.h, since it is only used by
hwcapinfo.c and the test-get_hwcap internal test.

Checked on powerpc64le-linux-gnu.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Adhemerval Zanella
a768993c10 powerpc: Move cache geometry information to ld diagnostics
From LD_SHOW_AUXV output.

Checked on powerpc64le-linux-gnu.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Adhemerval Zanella
8a995670a8 powerpc: Move AT_HWCAP descriptions to ld diagnostics
The ld.so diagnostics output already prints AT_HWCAP values, but only in
hexadecimal.  To avoid duplicating the strings, consolidate hwcap_names
from cpu-features.h into a new file, dl-hwcap-info.h (this also improves
the hwcap string descriptions with more values).

For future AT_HWCAP3/AT_HWCAP4 extensions, it is just a matter of adding
them to dl-hwcap-info.c so that both the ld diagnostics and the tunable
filtering parse the new values.
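
As a rough sketch (identifiers and values below are illustrative, not the
exact contents of dl-hwcap-info.h), the consolidated table is essentially an
array of bit/name pairs that both the diagnostics printer and the tunable
filtering can iterate over:

    #include <stdint.h>

    struct dl_hwcap_info
    {
      uint64_t mask;        /* AT_HWCAP bit.  */
      const char *name;     /* Feature name printed in diagnostics.  */
    };

    static const struct dl_hwcap_info dl_hwcap_info[] =
    {
      { 0x00000001, "ppcle" },
      { 0x00000002, "true_le" },
      /* One entry per AT_HWCAP/AT_HWCAP2 bit.  */
    };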

Checked on powerpc64le-linux-gnu.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-03-05 11:22:09 -03:00
Samuel Thibault
ccdb68e829 htl: move pthread_once into libc
2025-03-02 15:37:33 +01:00
Wilco Dijkstra
e5893e6349 Remove unused dl-procinfo.h
Remove unused _dl_hwcap_string defines.  As a result, many dl-procinfo.h headers
can be removed.  This also removes target-specific _dl_procinfo implementations
which only printed HWCAP strings using _dl_hwcap_string.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-28 16:55:18 +00:00
Xi Ruoyao
c0f380c465 LoongArch: Optimize f{max,min}imum_mag_num{,f}
Following the logic of the previous commits.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2025-02-28 11:44:19 +08:00
Xi Ruoyao
efd13567f7 LoongArch: Optimize f{max,min}imum_num{,f}
Following the logic of the previous commits.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2025-02-28 11:44:19 +08:00
Xi Ruoyao
ee4ee1cb02 LoongArch: Optimize f{max,min}imum_mag{,f}
Following the logic of the previous commit.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2025-02-28 11:44:19 +08:00
Xi Ruoyao
0195552e15 LoongArch: Optimize f{max,min}imum{,f}
The code now looks like:

	fclass.s        $fa2, $fa0
	movfr2gr.s      $t0, $fa2
	slli.w          $t0, $t0, 0x0
	fclass.s        $fa2, $fa1
	movfr2gr.s      $t1, $fa2
	or              $t0, $t0, $t1
	andi            $t0, $t0, 0x3
	bnez            $t0, 1f
	fmin.s          $fa0, $fa0, $fa1
	ret
	1:
	fmul.s		$fa0, $fa0, $fa1
	ret

This looks really bad, with expensive movfr2gr instructions, redundant
sign-extension and masking (arguably a compiler missed optimization), and a
branch.  Rewrite it with inline assembly:

	fcmp.cor.s      $fcc0, $fa0, $fa0
	fcmp.cor.s      $fcc1, $fa1, $fa1
	fsel            $fa2, $fa0, $fa1, $fcc0
	fsel            $fa0, $fa1, $fa0, $fcc1
	fmax.s          $fa0, $fa2, $fa0
	ret

Note that we cannot make it more readable with
"double a = __builtin_isnanf (x) ? y : x" because this C statement only
happens to produce what we want due to https://gcc.gnu.org/PR66462; if
this bug is fixed in the future, the generated code may change.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2025-02-28 11:44:19 +08:00
Wilco Dijkstra
0f044be1da AArch64: Use prefer_sve_ifuncs for SVE memset
Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-02-27 16:51:57 +00:00
Sergei Zimmerman
9e51ae3cd0 sysdeps/ieee754: Fix remainder sign of zero for FE_DOWNWARD (BZ #32711)
The single-precision remainderf() and quad-precision remainderl()
implementations derived from Sun are affected by an issue when the result
is +-0. IEEE 754 requires that if remainder(x, y) = 0, its sign shall be
that of x regardless of the rounding direction.

The implementation seems to have assumed that x - x = +0 in all
rounding modes, which is not the case. When the rounding direction is
roundTowardNegative, the sign of an exact zero sum (or difference) is −0.
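
A small stand-alone illustration of the requirement (a hypothetical example,
not the added libm-test case):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      fesetround (FE_DOWNWARD);
      float r = remainderf (6.0f, 3.0f);  /* Result is an exact zero.  */
      /* IEEE 754 requires +0 here because x = 6.0f is positive; the buggy
         code could return -0 because x - x is -0 when rounding downward.  */
      printf ("remainderf (6, 3) = %g, signbit = %d\n", (double) r, signbit (r));
      return 0;
    }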

Regression tests that triggered this erroneous behavior are added to
math/libm-test-remainder.inc.

Tested for cross riscv64 and powerpc.

Original fix by: Bruce Evans <bde@FreeBSD.org> in FreeBSD's
a2ddfa5ea726c56dbf825763ad371c261b89b7c7.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-26 17:17:25 -03:00
John David Anglin
2fe5e2af09 math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP.  Eventually I found that the value of
a1 + u.d in this block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:

    if (TININESS_AFTER_ROUNDING)
      {
        w.d = a1 + u.d;
        if (w.ieee.exponent == 109)
          return w.d * 0x1p-108;
      }

This caused the exponent value to be wrong and the wrong return path
to be used.

Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.
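
A minimal sketch of what such a barrier does, assuming a GCC-style compiler
(glibc's own helpers are math_opt_barrier/math_force_eval in
math-barriers.h; the macro below is only an illustration):

    /* The empty asm with a "+m" constraint makes the compiler treat __x as
       read and written in memory, so a value computed under the previous
       rounding mode cannot be cached and reused after the mode changes.  */
    #define opt_barrier(x)                      \
      ({ __typeof (x) __x = (x);                \
         __asm__ ("" : "+m" (__x));             \
         __x; })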

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2025-02-25 15:57:53 -05:00
Yangyu Chen
3fd2ff7685 RISC-V: Fix IFUNC resolver cannot access gp pointer
In some cases, an IFUNC resolver may need the gp pointer to access global
variables. Such an object may still have l_relocated == 0 at this point,
in which case the IFUNC resolver will fail to access a global variable
and cause a SIGSEGV.

This patch fixes the issue by relaxing the check of l_relocated in
elf_machine_runtime_setup, but adds a check for the SHARED case to avoid
using this code in statically linked executables. Such objects have already
set up the gp pointer in the load_gp function, and l->l_scope will be NULL
if it is a PIE object. So if we used this code to set up the gp pointer
again for static-PIE, it would cause a SIGSEGV in glibc, as in the original
bug BZ #31317.

I have also reproduced and checked BZ #31317 using the mold commit
bed5b1731b ("illumos: Treat absolute symbols specially"), and this patch
fixes the issue.

Also, we previously used the wrong gp pointer because ref->st_value is
not the relocated address but just the offset from the ELF base address.
An edge case may happen if we reference the gp pointer in an IFUNC
resolver in a PIE object, although it will not happen in compiler-generated
code since -pie disables relaxation to gp. In this case, the gp pointer
would be initialized incorrectly since ref->st_value is not the address
after relocation. This patch fixes the issue by adding l->l_addr to
ref->st_value to get the relocated address for the gp pointer. We don't
use the SYMBOL_ADDRESS macro here because __global_pointer$ is a special
symbol of SHN_ABS type, yet the load_gp function addresses it
PC-relatively using lla.
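
In code, the addition described above amounts to roughly the following
sketch (names follow the commit message; the exact glibc source differs):

    #include <link.h>   /* ElfW, struct link_map.  */

    /* Hypothetical helper: ref->st_value is only an offset from the ELF
       base address, so the link map's load address must be added to obtain
       the run-time gp value.  */
    static inline ElfW(Addr)
    relocated_gp (const struct link_map *l, const ElfW(Sym) *ref)
    {
      return ref == NULL ? 0 : l->l_addr + ref->st_value;
    }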

Closes: BZ #32269
Fixes: 96d1b9ac23 ("RISC-V: Fix the static-PIE non-relocated object check")

Co-authored-by: Vivian Wang <dramforever@live.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2025-02-25 13:08:53 +01:00
Wilco Dijkstra
935563754b AArch64: Remove LP64 and ILP32 ifdefs
Remove LP64 and ILP32 ifdefs.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:20:29 +00:00
Wilco Dijkstra
4c11379106 AArch64: Simplify lrint
Simplify lrint.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:20:03 +00:00
Wilco Dijkstra
0a021727bc AArch64: Remove AARCH64_R macro
Remove AArch64_R relocation macro.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:19:19 +00:00
Wilco Dijkstra
eb7ac024d9 AArch64: Cleanup pointer mangling
Cleanup pointer mangling.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:17:57 +00:00
Wilco Dijkstra
19860fd42e AArch64: Remove PTR_REG defines
Remove PTR_REG defines.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:16:55 +00:00
Wilco Dijkstra
ce2f26a22e AArch64: Remove PTR_ARG/SIZE_ARG defines
This series removes various ILP32 defines that are no longer needed.

Remove PTR_ARG/SIZE_ARG.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:15:15 +00:00
koraynilay
29803ed3ce math: Fix unknown type name '__float128' for clang 3.4 to 3.8.1 (bug 32694)
When compiling a program that includes <bits/floatn.h> using a clang version
between 3.4 (included) and 3.8.1 (included), clang will fail with `unknown type
name '__float128'; did you mean '__cfloat128'?`. This changes fixes the clang
prerequirements macro call in floatn.h to check for clang 3.9 instead of 3.4,
since support for __float128 was actually enabled in 3.9 by:

commit 50f29e06a1b6a38f0bba9360cbff72c82d46cdd4
Author: Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
Date:   Wed Apr 13 09:49:45 2016 +0000

    Enable support for __float128 in Clang

This fixes bug 32694.
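
The shape of the check is roughly the following (a sketch assuming the
__glibc_clang_prereq helper from <sys/cdefs.h>; the real condition in
<bits/floatn.h> combines further cases):

    #include <sys/cdefs.h>   /* For __glibc_clang_prereq.  */

    /* Only advertise __float128 for clang 3.9 or later, where the type is
       actually supported.  */
    #if __glibc_clang_prereq (3, 9)
    # define __HAVE_FLOAT128 1
    #endif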

Signed-off-by: koraynilay <koray.fra@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-02-23 11:47:11 +08:00
Michael Jeanson
689a62a421 nptl: clear the whole rseq area before registration
Due to the extensible nature of the rseq area we can't explicitly
initialize fields that are not part of the ABI yet. It was agreed with
upstream that all new fields will be documented as zero-initialized by
userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will
validate the content of all fields during registration.

Replace the explicit field initialization with a memset of the whole
rseq area, which will cover fields as they are added in future kernels.
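
A minimal sketch of the approach (illustrative names; the actual
registration code in nptl differs):

    #include <string.h>
    #include <linux/rseq.h>

    /* Zero the whole rseq area, covering any fields future kernels add,
       before handing it to the rseq system call.  */
    static void
    clear_rseq_area (struct rseq *rseq_area, size_t rseq_size)
    {
      memset (rseq_area, 0, rseq_size);
    }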

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-02-21 22:21:25 +00:00
Yury Khrustalev
41f6684557 aarch64: Add GCS test with signal handler
Test that when we return from a function that enabled GCS at runtime
we get SIGSEGV. Also test that the ucontext contains a GCS block with
the GCS pointer.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-21 16:23:44 +00:00
Yury Khrustalev
15afd01e80 aarch64: Add GCS tests for dlopen
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-21 16:10:44 +00:00
Yury Khrustalev
57ee1deb1f aarch64: Add GCS tests for transitive dependencies
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-21 16:09:06 +00:00
Yury Khrustalev
82decb59bc aarch64: Add tests for Guarded Control Stack
These tests validate that the GCS tunable works as expected depending
on the GCS markings in the test binaries.

The tests cover both statically and dynamically linked binaries.

These new tests are AArch64-specific. Moreover, they are included only
if the linker supports the "-z gcs=<value>" option. If built, these tests
will run on systems with and without HWCAP_GCS. In the latter case the
tests will be reported as UNSUPPORTED.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-21 16:08:00 +00:00
Wilco Dijkstra
163b1bbb76 AArch64: Add SVE memset
Add an SVE memset based on the generic memset, with a predicated load for
sizes < 16.  Unaligned memsets of 128-1024 bytes are improved by ~20% on
average by using aligned stores for the last 64 bytes.  Performance of the
random memset benchmark improves by ~2% on Neoverse V1.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-02-20 15:31:50 +00:00
H.J. Lu
5a4573be6f x86 (__HAVE_FLOAT128): Defined to 0 for Intel SYCL compiler [BZ #32723]
The Intel compiler always defines __INTEL_LLVM_COMPILER.  When SYCL is
enabled by -fsycl, it also defines SYCL_LANGUAGE_VERSION.  Since the Intel
SYCL compiler doesn't support _Float128:

https://github.com/intel/llvm/issues/16903

define __HAVE_FLOAT128 to 0 for the Intel SYCL compiler.

This fixes BZ #32723.
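
The detection amounts to a preprocessor test along these lines (a sketch;
the exact condition in glibc's <bits/floatn.h> may differ):

    /* The Intel SYCL compiler defines both macros below and lacks
       _Float128 support, so disable __float128 there.  */
    #if defined __INTEL_LLVM_COMPILER && defined SYCL_LANGUAGE_VERSION
    # define __HAVE_FLOAT128 0
    #endif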

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-02-20 08:42:37 +08:00
Adhemerval Zanella
0242c9f9e6 math: Consolidate acosf and asinf internal tables
The libm size improvement built with gcc-14, "--enable-stack-protector=strong
--enable-bind-now=yes --enable-fortify-source=2":

Before:

   text    data     bss     dec     hex filename
 582292     844      12  583148   8e5ec aarch64-linux-gnu/math/libm.so
 975133    1076      12  976221   ee55d x86_64-linux-gnu/math/libm.so
1203586    5608     368 1209562  1274da powerpc64le-linux-gnu/math/libm.so

After:

   text    data     bss     dec     hex filename
 581972     844      12  582828   8e4ac aarch64-linux-gnu/math/libm.so
 974941    1076      12  976029   ee49d x86_64-linux-gnu/math/libm.so
1203394    5608     368 1209370  12741a powerpc64le-linux-gnu/math/libm.so
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-02-17 10:09:09 -03:00
Adhemerval Zanella
1faccf388a math: Consolidate acospif and asinpif internal tables
The libm size improvement built with gcc-14, "--enable-stack-protector=strong
--enable-bind-now=yes --enable-fortify-source=2":

Before:

   text    data     bss     dec     hex filename
 583444     844      12  584300   8ea6c aarch64-linux-gnu/math/libm.so
 976349    1076      12  977437   eea1d x86_64-linux-gnu/math/libm.so
1204738    5608     368 1210714  12795a powerpc64le-linux-gnu/math/libm.so

After:

   text    data     bss     dec     hex filename
 582292     844      12  583148   8e5ec aarch64-linux-gnu/math/libm.so
 975133    1076      12  976221   ee55d x86_64-linux-gnu/math/libm.so
1203586    5608     368 1209562  1274da powerpc64le-linux-gnu/math/libm.so
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-02-17 10:09:09 -03:00
Adhemerval Zanella
246e52574d math: Consolidate cospif and sinpif internal tables
The libm size improvement built with gcc-14, "--enable-stack-protector=strong
--enable-bind-now=yes --enable-fortify-source=2":

Before:

   text    data     bss     dec     hex filename
 584500     844      12  585356   8ee8c aarch64-linux-gnu/math/libm.so
 977341    1076      12  978429   eedfd x86_64-linux-gnu/math/libm.so
1205762    5608     368 1211738  127d5a powerpc64le-linux-gnu/math/libm.so

After:

   text    data     bss     dec     hex filename
 583444     844      12  584300   8ea6c aarch64-linux-gnu/math/libm.so
 976349    1076      12  977437   eea1d x86_64-linux-gnu/math/libm.so
1204738    5608     368 1210714  12795a powerpc64le-linux-gnu/math/libm.so
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-02-17 10:09:09 -03:00
gfleury
4afbc1aa2e htl: don't export __pthread_default_rwlockattr anymore.
since now all symbols that use it are in libc.
Message-ID: <20250216145434.7089-11-gfleury@disroot.org>
2025-02-16 23:43:04 +01:00
gfleury
6f6732c1c4 htl: move pthread_rwlock_init into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-10-gfleury@disroot.org>
2025-02-16 23:43:03 +01:00
gfleury
d3ef1b56aa htl: move pthread_rwlock_destroy into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-9-gfleury@disroot.org>
2025-02-16 23:42:38 +01:00
gfleury
25650ef6b9 htl: move pthread_rwlock_{rdlock, timedrdlock, timedwrlock, wrlock, clockrdlock, clockwrlock} into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-8-gfleury@disroot.org>
2025-02-16 23:08:54 +01:00
gfleury
119798a7b1 htl: move pthread_rwlock_unlock into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-7-gfleury@disroot.org>
2025-02-16 23:08:54 +01:00
gfleury
18accc19b9 htl: move pthread_rwlock_tryrdlock, pthread_rwlock_trywrlock into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-6-gfleury@disroot.org>
2025-02-16 22:59:34 +01:00
gfleury
4b25413df5 htl: move pthread_rwlockattr_getpshared, pthread_rwlockattr_setpshared into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-5-gfleury@disroot.org>
2025-02-16 22:59:25 +01:00
gfleury
cd2d31ed58 htl: move pthread_rwlockattr_destroy into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-4-gfleury@disroot.org>
2025-02-16 22:59:16 +01:00
gfleury
e618b671cd htl: move pthread_rwlockattr_init into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-3-gfleury@disroot.org>
2025-02-16 22:59:07 +01:00
gfleury
8f842ce13e htl: move __pthread_default_rwlockattr into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
Message-ID: <20250216145434.7089-2-gfleury@disroot.org>
2025-02-16 22:59:00 +01:00
Aurelien Jarno
60f2d6be65 Fix tst-aarch64-pkey to handle ENOSPC as not supported
The pkey_alloc syscall can return ENOSPC to indicate either that all
keys are in use or that the system runs in a mode in which memory
protection keys are disabled. In such cases the test should not fail but
just report unsupported.

This matches the behaviour of the generic tst-pkey.
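
In glibc's test style the handling looks roughly like this (a sketch using
the support/check.h macros; the actual test code may differ):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <sys/mman.h>
    #include <support/check.h>

    static int
    do_test (void)
    {
      int key = pkey_alloc (0, 0);
      if (key < 0)
        {
          /* Treat ENOSPC like the generic tst-pkey: report the test as
             unsupported rather than failing.  */
          if (errno == ENOSYS || errno == EINVAL || errno == ENOSPC)
            FAIL_UNSUPPORTED ("pkey_alloc not supported: %m");
          FAIL_EXIT1 ("pkey_alloc failed: %m");
        }
      pkey_free (key);
      return 0;
    }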

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-02-15 11:08:43 +01:00
Yat Long Poon
95e807209b AArch64: Improve codegen for SVE powf
Improve memory access with indexed/unpredicated instructions.
Eliminate register spills.  Speedup on Neoverse V1: 3%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Yat Long Poon
0b195651db AArch64: Improve codegen for SVE pow
Move constants to struct.  Improve memory access with indexed/unpredicated
instructions.  Eliminate register spills.  Speedup on Neoverse V1: 24%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Yat Long Poon
f5ff34cb3c AArch64: Improve codegen for SVE erfcf
Reduce number of MOV/MOVPRFXs and use unpredicated FMUL.
Replace MUL with LSL.  Speedup on Neoverse V1: 6%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Luna Lamb
c0ff447edf Aarch64: Improve codegen in SVE exp and users, and update expf_inline
Use unpredicated muls and improve memory access.
7%, 3% and 1% improvement in the throughput microbenchmark on Neoverse V1
for exp, exp2 and cosh respectively.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00