glibc:tuliom/libmvec

Last commit made on 2020-02-19
Get this branch:
git clone -b tuliom/libmvec https://git.launchpad.net/glibc

Branch information

Name:
tuliom/libmvec
Repository:
lp:glibc

Recent commits

0209f78... by Bert Tenjy <email address hidden>

PPC64: Attach SIMD attribute to cosf, sin, sinf function declarations.

These changes were mistakenly left out of the patches that added SIMD
versions of these functions to libmvec.
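As a rough illustration of what attaching a SIMD attribute to a declaration looks like, here is a minimal sketch using the GCC `simd` function attribute. The names `DEMO_DECL_SIMD`, `demo_halve` and `demo_halve_array` are hypothetical and are not glibc symbols; glibc itself attaches the attribute through macros in its math headers, which this commit message does not show.

```c
#include <stddef.h>

/* Use the GCC "simd" attribute when the compiler reports support for it;
   otherwise fall back to a plain declaration.  DEMO_DECL_SIMD is a
   hypothetical macro name chosen for this sketch.  */
#if defined(__has_attribute)
# if __has_attribute(simd)
#  define DEMO_DECL_SIMD __attribute__ ((simd ("notinbranch")))
# endif
#endif
#ifndef DEMO_DECL_SIMD
# define DEMO_DECL_SIMD
#endif

/* The attribute on the declaration tells the compiler that vector
   variants of this function may be used when a calling loop is
   vectorized.  */
DEMO_DECL_SIMD float demo_halve (float x);

float
demo_halve (float x)
{
  return 0.5f * x;
}

/* A loop of independent calls, which is the shape a vectorizer can map
   onto the vector variants.  */
void
demo_halve_array (const float *in, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = demo_halve (in[i]);
}
```

Without the attribute on the declaration, the compiler has no indication that a vector variant exists, so calls in loops stay scalar even when the library provides the SIMD version.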

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

01d11f7... by Shawn Landden

PPC64: Add libmvec SIMD double-precision power function [BZ #24210]

Based off the ./sysdeps/ieee754/dbl-64/pow.c implementation,
and provides identical results.

Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar pow to figure out what is causing the overflow or underflow.

I may not have normalized the benchmark data properly, but operating
only on integers between 0 and 2^32 and floats between 0.5 and 1
I get the following:

Running 20 times over 32MiB
vector: mean 535.824919 (sd 0.246088)
scalar: mean 286.384220 (sd 0.027630)

That is roughly a 1.87x speedup over the scalar version.

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

4068846... by Shawn Landden

PPC64: Add libmvec SIMD single-precision power function [BZ #24210]

Based off the ./sysdeps/ieee754/flt-32/powf.c implementation,
and thus provides identical results.

Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar powf to figure out what is causing the overflow or underflow.

I may not have normalized the benchmark data properly, but operating
only on floats between 0.5 and 1 I get the following:

Running 20 times over 32MiB
vector: mean 307.659767 (sd 0.203217)
scalar: mean 221.837088 (sd 0.032256)

And with random data there is a decrease in performance:
vector: mean 265.366371 (sd 0.000626)
scalar: mean 279.598078 (sd 0.025592)

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

d91313e... by Tulio Magno Quites Machado Filho <email address hidden>

powerpc64: Add support for vec_cmpne for older compilers

vec_cmpne was added in GCC 7, so building glibc with GCC 6 requires an
alternative implementation.
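One common way to get a lane-wise "not equal" without vec_cmpne is to complement the result of an equality compare (on VSX, vec_cmpeq followed by vec_nor of the result with itself). The commit message does not say which approach the patch takes, so the following is only a portable model of that idea in plain C; the type `vec4u32` and the `model_*` functions are hypothetical stand-ins, not AltiVec intrinsics.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of four 32-bit lanes.  */
typedef struct { uint32_t lane[4]; } vec4u32;

/* Models an equality compare: all-ones in lanes that are equal,
   zero elsewhere.  */
static vec4u32
model_cmpeq (vec4u32 a, vec4u32 b)
{
  vec4u32 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0;
  return r;
}

/* Models a bitwise NOR of two vectors.  */
static vec4u32
model_nor (vec4u32 a, vec4u32 b)
{
  vec4u32 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = ~(a.lane[i] | b.lane[i]);
  return r;
}

/* Not-equal built from the two operations above: NOR of the equality
   mask with itself is its bitwise complement.  */
vec4u32
model_cmpne (vec4u32 a, vec4u32 b)
{
  vec4u32 eq = model_cmpeq (a, b);
  return model_nor (eq, eq);
}
```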

cedb794... by Shawn Landden

PPC64: Add libmvec SIMD double-precision natural exponent function [BZ #24209]

Passes all tests.

Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar exp to figure out what is causing the overflow or underflow.

The special-case path is not vectorized, and performs much worse than
the scalar code.
Normalized data: 1 to 2^32 converted to double
Running 20 times over 32MiB
vector: mean 563.807107 MiB/s (sd 0.390922)
scalar: mean 226.527824 MiB/s (sd 0.077406)

Random data:
vector: mean 80.175986 MiB/s (sd 1.110948)
scalar: mean 244.738130 MiB/s (sd 0.029561)

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

af60f86... by Shawn Landden

PPC64: Add libmvec SIMD single-precision natural exponent function [BZ #24209]

Passes all tests.

Based off the ./sysdeps/ieee754/dbl-64/e_exp.c implementation,
and thus provides identical results.

Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar expf to figure out what is causing the overflow or underflow.

Surprisingly, the special-case path performs as well as the normal path
(both are vectorized).
Running 20 times over 32MiB
vector: mean 432.263032 MiB/s (sd 0.486733)
scalar: mean 178.646197 MiB/s (sd 0.050013)

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

287ae18... by Tulio Magno Quites Machado Filho <email address hidden>

powerpc64: Fix libmvec's logf4 build on GCC < 8

The built-in vec_float was added in GCC 8, so older GCC versions require
an alternative implementation.

0704d32... by Bert Tenjy <email address hidden>

PPC64: Add libmvec SIMD single-precision logarithm function [BZ #24208]

Implements single-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/flt-32/e_logf.c, modified for
PPC64 VSX hardware. The version of e_logf.c referenced here is from
commit #bf27d3973d.

The patch has been tested on both Little-Endian and Big-Endian. It
passes all the tests for single-precision logarithm run by make check with
max ULP of 1. Integration into the make check infrastructure is adapted from
similar x86_64 changes in commit #774488f88a.

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

9c027f1... by Bert Tenjy <email address hidden>

PPC64: Add libmvec SIMD double-precision logarithm function [BZ #24208]

Implements double-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/dbl-64, modified to exploit
PPC64 VSX hardware. The version of ieee754/dbl-64 referenced here is from
commit #f41b0a43e4.

The patch has been tested on both Little-Endian and Big-Endian. It
passes all the tests for double-precision logarithm run by make check.
Integration into the make check infrastructure closely follows
corresponding changes done for x86_64 in commit #6af25acc7b.

Reviewed-by: Tulio Magno Quites Machado Filho <email address hidden>

fcadb6e... by Tulio Magno Quites Machado Filho <email address hidden>

powerpc64: Fix mathvec build and tests on POWER < 8

vec_d_cos2_vsx.c, vec_d_sin2_vsx.c and vec_d_sincos2_vsx.c use
vec_sl(), which is only available on POWER8 and later processors.