Update i686 libm-test-ulps to fix
FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-sinpi
when building glibc with GCC 7.4.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Add an additional test of TLS variables, with different alignment,
accessed from different modules. The idea of the alignment test is
similar to tst-tlsalign and the same code is shared for setting up
test variables, but unlike the tst-tlsalign code, there are multiple
threads and variables are accessed from multiple objects to verify
that they get a consistent notion of the address of an object within a
thread. Threads are repeatedly created and shut down to verify proper
initialization in each new thread. The test is also repeated with TLS
descriptors when supported. (However, only initial-exec TLS is
covered in this test.)
Tested for x86_64.
There already was a branch checking for this case in _hurd_fd_read ()
when the data is returned out-of-line. Do the same for inline data, as
well as for _hurd_fd_write (). It's also not possible for the length to
be negative, since it's stored in an unsigned integer.
Not verifying the returned length can confuse the callers who assume
the returned length is always reasonable. This manifested as libzstd
test suite failing on writes to /dev/zero, even though the write () call
appeared to succeed. In fact, the zero store backing /dev/zero was
returning a larger written length than the size actually submitted to
it, which is a separate bug to be fixed on the Hurd side. With this
patch, EGRATUITOUS is now propagated to the caller.
Reported-by: Diego Nieto Cid <dnietoc@gmail.com>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20241204112915.540032-1-bugaevc@gmail.com>
Fix variables in Makefiles:
1. There is a tab, not a space, between "variable" and =, +=, :=.
2. The last entry doesn't have a trailing \.
and sort them.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the sinpi functions (sin(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the cospi functions (cos(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes
on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead
of up to 3 branches. On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Esperanto, as an international language and a bit of a non-locale,
usually defaults to international consensus. In this commit, I make the
Esperanto locale more in line with ISO 8601 by setting the first day as
Monday, and the first week as containing January 4.
Closes: BZ #32323
Signed-off-by: Carmen Bianca BAKKER <carmen@carmenbianca.eu>
Reviewed-by: Mike FABIAN <mfabian@redhat.com>
This does not describe how to use RTLD_DI_ORIGIN and l_name
to reconstruct a full path for the an object. The reason
is that I think we should not recommend further use of
RTLD_DI_ORIGIN due to its buffer overflow potential (bug 24298).
This should be covered by another dlinfo extension. It would
also obsolete the need for the dladdr approach to obtain
the file name for the main executable.
Obtaining the lowest address from load segments in program
headers is quite clumsy and should be provided directly
via dlinfo.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add tests that the dynamic linker works correctly with symbol names
involving hash collisions, for both choices of hash style (and
--hash-style=both as well). I note that there weren't actually any
previous tests using --hash-style (so tests would only cover the
default linker configuration in that regard). Also test symbol
versions involving hash collisions.
Tested for x86_64.
Add a threaded test for pthread_spin_trylock attempting to lock already
acquired spin lock and checking for correct return code.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Large chunks get added to the unsorted bin since
sorting them takes time, for small chunks the
benefit of adding them to the unsorted bin is
non-existant, actually hurting performance.
Splitting and malloc_consolidate still add small
chunks to unsorted, but we can hint the compiler
that that is a relatively rare occurance.
Benchmarking shows this to be consistently good.
Authored-by: k4lizen <k4lizen@proton.me>
Signed-off-by: Aleksa Siriški <sir@tmina.org>
Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1, remove this special
case too.
[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html
Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
Add a descriptive comment to the tst-pthread-cpuclockid-invalid test and
also drop pthread_getcpuclockid from the TODO-testing list since it now
has full coverage.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
GCC 15 (e876acab6cdd84bb2b32c98fc69fb0ba29c81153) and binutils
(e7a16d9fd65098045ef5959bf98d990f12314111) both removed all Nios II
support, and the architecture has been EOL'ed by the vendor. The
kernel still has support, but without a proper compiler there
is no much sense in keep it on glibc.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Trivial cleanup to limit _IO_least_marker so that it's clear that it is
unused outside of genops.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Tcache is an important optimzation to accelerate memory free(), things
within this code path should be kept as simple as possible. This commit
try to remove the function call when free() invokes tcache code path by
inlining _int_free().
Result of bench-malloc-thread benchmark
Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)
Threads# | Ratio
-----------|------
1 thread | 0.879
4 threads | 0.874
The performance data shows it can improve bench-malloc-thread benchmark
by ~12% in both single thread and multi-thread scenario.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Some CORE-MATH routines uses roundeven and most of ISA do not have
an specific instruction for the operation. In this case, the call
will be routed to generic implementation.
However, if the ISA does support round() and ctz() there is a better
alternative (as used by CORE-MATH).
This patch adds such optimization and also enables it on powerpc.
On a power10 it shows the following improvement:
expm1f master patched improvement
latency 9.8574 7.0139 28.85%
reciprocal-throughput 4.3742 2.6592 39.21%
Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The constants themselves were added to elf.h back in 8754a4133e but the
array in _dl_show_auxv wasn't modified accordingly, resulting in the
following output when running LD_SHOW_AUXV=1 /bin/true on recent Linux:
AT_??? (0x1b): 0x1c
AT_??? (0x1c): 0x20
With this patch:
AT_RSEQ_FEATURE_SIZE: 28
AT_RSEQ_ALIGN: 32
Tested on Linux 6.11 x86_64
Signed-off-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The test was added in commit ac8cc9e300
without all the required Makefile scaffolding. Tweak the test
so that it actually builds (including with dynamic SIGSTKSZ).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.
While current kernels don't validate the content of this field on
registration, future ones could.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The k>>31 in signgam = 1 - (((k&(k>>31))&1)<<1); is not portable:
* The ISO C standard says "If E1 has a signed type and a negative
value, the resulting value is implementation-defined." (this is
still in C23).
* If the int type is larger than 32 bits (e.g. a 64-bit type),
then k = INT_MAX; line 144 will make k>>31 put 1 in bit 0
(thus signgam will be -1) while 0 is expected.
Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0
is probably better for 2 reasons:
The signgam expression has more or less a condition on the sign
of fx (the goal of k>>31, which can be dropped with this new
condition). Since fx ≥ 0 should be the most common case, one can
get signgam directly in this case (value 1). And this simplifies
the expression for the other case (fx < 0).
This new condition may be easier/faster to test on the processor
(e.g. by avoiding a load of a constant from the memory).
This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH.
Checked on x86_64-linux-gnu.
Exercise the case where an exited thread will cause
pthread_getcpuclockid to fail.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Test coverage of sem_getvalue is fairly limited. Add a test that runs
it on threads on each CPU. For this purpose I adapted
tst-skeleton-thread-affinity.c; it didn't seem very suitable to use
as-is or include directly in a different test doing things per-CPU,
but did seem a suitable starting point (thus sharing
tst-skeleton-affinity.c) for such testing.
Tested for x86_64.
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic tanf.
The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.
Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):
latency master patched improvement
x86_64 82.3961 54.8052 33.49%
x86_64v2 82.3415 54.8052 33.44%
x86_64v3 69.3661 50.4864 27.22%
i686 219.271 45.5396 79.23%
aarch64 29.2127 19.1951 34.29%
power10 19.5060 16.2760 16.56%
reciprocal-throughput master patched improvement
x86_64 28.3976 19.7334 30.51%
x86_64v2 28.4568 19.7334 30.65%
x86_64v3 21.1815 16.1811 23.61%
i686 105.016 15.1426 85.58%
aarch64 18.1573 10.7681 40.70%
power10 8.7207 8.7097 0.13%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
It is based on binary64 erfc-inputs, with random inputs in
[0,b=0x1.41bbf6p+3] where b in the smallest number such that
erfcf(b) rounds to 0 (to nearest).
Reviewed-by: DJ Delorie <dj@redhat.com>
It is based on binary64 erf-inputs, with random inputs in [0,b=0x1.f5a888p+1]
where b in the smallest number such that erff(b) rounds to 1 (to nearest).
Reviewed-by: DJ Delorie <dj@redhat.com>
For a static PIE with non-zero load address, its PT_DYNAMIC segment
entries contain the relocated values for the load address in static PIE.
Since static PIE usually doesn't have PT_PHDR segment, use p_vaddr of
the PT_LOAD segment with offset == 0 as the load address in static PIE
and adjust the entries of PT_DYNAMIC segment in static PIE by properly
setting the l_addr field for static PIE. This fixes BZ #31799.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>