PPC64: Add libmvec SIMD double-precision power function [BZ #24210]
Based on the ./sysdeps/ieee754/dbl-64/pow.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar pow to figure out what is causing the overflow or underflow.
I may not have normalized the data for benchmarking this properly,
but operating only on integers between 0 and 2^32 and floats between
0.5 and 1, I get the following:
Running 20 times over 32MiB
vector: mean 535.824919 (sd 0.246088)
scalar: mean 286.384220 (sd 0.027630)
That is roughly a 1.87x speedup.
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
PPC64: Add libmvec SIMD single-precision power function [BZ #24210]
Based on the ./sysdeps/ieee754/flt-32/powf.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar powf to figure out what is causing the overflow or underflow.
I may not have normalized the data for benchmarking this properly,
but operating only on floats between 0.5 and 1, I get the following:
Running 20 times over 32MiB
vector: mean 307.659767 (sd 0.203217)
scalar: mean 221.837088 (sd 0.032256)
With random data, however, performance decreases:
vector: mean 265.366371 (sd 0.000626)
scalar: mean 279.598078 (sd 0.025592)
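The two input sets can be generated roughly as follows. Interpreting "random data" as arbitrary 32-bit patterns is an assumption, as are the helper names and the xorshift32 generator. Such inputs include large, tiny, negative, infinite, and NaN values, which plausibly exercise slower special-case handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small deterministic PRNG (xorshift32); the state must be nonzero.  */
static uint32_t xorshift32(uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Floats in [0.5, 1): fix the biased exponent to 126 (that of 0.5)
   and fill the 23-bit mantissa with random bits.  */
static void fill_half_to_one(float *out, size_t n, uint32_t *state)
{
  for (size_t i = 0; i < n; i++)
    {
      uint32_t bits = (126u << 23) | (xorshift32(state) >> 9);
      memcpy(&out[i], &bits, sizeof bits);
    }
}

/* Arbitrary bit patterns, reinterpreted as floats.  */
static void fill_random_bits(float *out, size_t n, uint32_t *state)
{
  for (size_t i = 0; i < n; i++)
    {
      uint32_t bits = xorshift32(state);
      memcpy(&out[i], &bits, sizeof bits);
    }
}
```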
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
d91313e... by Tulio Magno Quites Machado Filho <email address hidden>
powerpc64: Add support for vec_cmpne for older compilers
vec_cmpne was added in GCC 7, so an alternative implementation is
required when building glibc with GCC 6.
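The usual way to emulate a not-equal compare is to invert an equal compare. The sketch below shows the idea with GCC/Clang generic vector extensions rather than <altivec.h>. On AltiVec the same idea is typically written as vec_cmpeq followed by a bitwise NOT (for example with vec_nor). cmpne_fallback is a hypothetical name, and the actual fallback in the commit may differ.

```c
#include <assert.h>

/* 4 x 32-bit lanes via GCC/Clang vector extensions (an assumption;
   the real code uses the AltiVec/VSX types from <altivec.h>).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Each lane of (a == b) is all ones (-1) or all zeros, so a bitwise
   NOT turns the equal mask into a not-equal mask.  */
static v4si cmpne_fallback(v4si a, v4si b)
{
  return ~(a == b);
}
```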
PPC64: Add libmvec SIMD double-precision natural exponent function [BZ #24209]
Passes all tests.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar exp to figure out what is causing the overflow or underflow.
The special-case path is not vectorized, and performs much worse than
the scalar code.
Normalized data: 1 to 2^32 converted to double
Running 20 times over 32MiB
vector: mean 563.807107 MiB/s (sd 0.390922)
scalar: mean 226.527824 MiB/s (sd 0.077406)
Random data:
vector: mean 80.175986 MiB/s (sd 1.110948)
scalar: mean 244.738130 MiB/s (sd 0.029561)
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
PPC64: Add libmvec SIMD single-precision natural exponent function [BZ #24209]
Passes all tests.
Based on the ./sysdeps/ieee754/dbl-64/e_exp.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar expf to figure out what is causing the overflow or underflow.
Surprisingly, the special-case path performs as well as the normal path
(both of which are vectorized).
Running 20 times over 32MiB
vector: mean 432.263032 MiB/s (sd 0.486733)
scalar: mean 178.646197 MiB/s (sd 0.050013)
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
287ae18... by Tulio Magno Quites Machado Filho <email address hidden>
powerpc64: Fix libmvec's logf4 build on GCC < 8
The built-in vec_float was added in GCC 8.0, so an alternative
implementation is required when using older GCC versions.
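The general pattern is to use the builtin where the compiler provides it and a hand-written equivalent elsewhere. The portable sketch below applies it to __builtin_convertvector (GCC 9+ and Clang) rather than vec_float, which exists only on PowerPC. On AltiVec, vec_ctf(v, 0) is one classic alternative, though the commit's actual fallback is not shown here. int_to_float4 is a hypothetical name.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Convert 4 x int32 to 4 x float, using the builtin when available
   and a lane-by-lane loop otherwise.  */
static v4sf int_to_float4(v4si v)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 9)
  return __builtin_convertvector(v, v4sf);
#else
  v4sf r;
  for (int i = 0; i < 4; i++)
    r[i] = (float) v[i];
  return r;
#endif
}
```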
PPC64: Add libmvec SIMD single-precision logarithm function [BZ #24208]
Implements single-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/flt-32/e_logf.c, modified for
PPC64 VSX hardware. The version of e_logf.c referenced here is from
commit #bf27d3973d.
The patch has been tested on both little-endian and big-endian systems.
It passes all the single-precision logarithm tests run by make check,
with a maximum error of 1 ULP. Integration into the make check
infrastructure is adapted from similar x86_64 changes in commit
#774488f88a.
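An error bound like "max ULP of 1" counts how many representable floats lie between the computed and the reference result. One common way to count this for floats is sketched below. The helper names are hypothetical, and glibc's make check measures ULPs with its own test framework, which may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bits onto a monotonically ordered integer line so that
   adjacent floats differ by exactly 1; -0 and +0 both map to 0.  */
static int64_t ordered_bits(float f)
{
  int32_t i;
  memcpy(&i, &f, sizeof i);
  return i < 0 ? -(int64_t) (i & 0x7fffffff) : (int64_t) i;
}

/* Distance in units in the last place between two finite floats.  */
static int64_t ulp_distance(float a, float b)
{
  int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}
```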
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
PPC64: Add libmvec SIMD double-precision logarithm function [BZ #24208]
Implements double-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/dbl-64, modified to exploit
PPC64 VSX hardware. The dbl-64 version referenced here is from commit
#f41b0a43e4.
The patch has been tested on both little-endian and big-endian systems.
It passes all the double-precision logarithm tests run by make check.
Integration into the make check infrastructure closely follows the
corresponding changes made for x86_64 in commit #6af25acc7b.
Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>
fcadb6e... by Tulio Magno Quites Machado Filho <email address hidden>
powerpc64: Fix mathvec build and tests on POWER < 8
vec_d_cos2_vsx.c, vec_d_sin2_vsx.c and vec_d_sincos2_vsx.c use
vec_sl(), which is only available on POWER8 and later processors.