glibc:hjl/erms/ifunc

Last commit made on 2016-05-25
Get this branch:
git clone -b hjl/erms/ifunc https://git.launchpad.net/glibc

Branch information

Name:
hjl/erms/ifunc
Repository:
lp:glibc

Recent commits

85702d3... by "H.J. Lu" <email address hidden>

X86-64: Add dummy memcopy.h and wordcopy.c

Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and
wordcopy.c to reduce code size. It reduces the size of libc.so by about
1 KB.

 * sysdeps/x86_64/memcopy.h: New file.
 * sysdeps/x86_64/wordcopy.c: Likewise.

550bdb5... by "H.J. Lu" <email address hidden>

X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
the new ones.

No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
memcpy/memmove optimized with Enhanced REP MOVSB will be used for
processors with ERMS. The new AVX512 memcpy/memmove will be used for
processors with AVX512 which prefer vzeroupper.
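
The selection described above can be sketched as a small decision function. This is only a simplified model, not the glibc resolver: the struct cpu_flags type, its fields, and the enum are hypothetical names made up for illustration, and variants that this commit leaves unchanged are omitted. The enumerator names mirror the variant names that appear in the ChangeLog below.

```c
#include <stdbool.h>

/* Hypothetical feature flags.  glibc keeps its own internal CPU feature
   data, which this sketch does not reproduce.  */
struct cpu_flags
{
  bool erms;                /* Enhanced REP MOVSB/STOSB is available.  */
  bool avx_fast_unaligned;  /* The AVX2 variant would have been chosen.  */
  bool avx512_preferred;    /* The AVX512 variant is selected (the commit
                               ties this to AVX512 plus the vzeroupper
                               preference).  */
};

/* One enumerator per variant in this simplified model.  */
enum memcpy_variant
{
  MEMCPY_SSE2_UNALIGNED,
  MEMCPY_SSE2_UNALIGNED_ERMS,
  MEMCPY_AVX_UNALIGNED,
  MEMCPY_AVX_UNALIGNED_ERMS,
  MEMCPY_AVX512_UNALIGNED,
  MEMCPY_AVX512_UNALIGNED_ERMS
};

/* Pick the widest vector variant first, then prefer its ERMS flavour
   when the processor reports ERMS.  SSE2 is the default.  */
enum memcpy_variant
select_memcpy (const struct cpu_flags *f)
{
  if (f->avx512_preferred)
    return f->erms ? MEMCPY_AVX512_UNALIGNED_ERMS : MEMCPY_AVX512_UNALIGNED;
  if (f->avx_fast_unaligned)
    return f->erms ? MEMCPY_AVX_UNALIGNED_ERMS : MEMCPY_AVX_UNALIGNED;
  return f->erms ? MEMCPY_SSE2_UNALIGNED_ERMS : MEMCPY_SSE2_UNALIGNED;
}
```

In this model a processor with ERMS but neither AVX flag set gets the SSE2 ERMS variant, and a processor with no flags set falls back to the plain SSE2 variant.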

Since the new SSE2 memcpy/memmove are faster than the previous default
memcpy/memmove used in libc.a and ld.so, we also remove the previous
default memcpy/memmove and make the new SSE2 versions the default
memcpy/memmove, except that non-temporal store isn't used in ld.so.

Together, it reduces the size of libc.so by about 6 KB and the size of
ld.so by about 2 KB.

 [BZ #19776]
 * sysdeps/x86_64/memcpy.S: Make it dummy.
 * sysdeps/x86_64/mempcpy.S: Likewise.
 * sysdeps/x86_64/memmove.S: New file.
 * sysdeps/x86_64/memmove_chk.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
 * sysdeps/x86_64/memmove.c: Removed.
 * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
 Likewise.
 * sysdeps/x86_64/multiarch/memmove.c: Likewise.
 * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
 * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
 memcpy-sse2-unaligned, memmove-avx-unaligned,
 memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
 * sysdeps/x86_64/multiarch/ifunc-impl-list.c
 (__libc_ifunc_impl_list): Replace
 __memmove_chk_avx512_unaligned_2 with
 __memmove_chk_avx512_unaligned. Remove
 __memmove_chk_avx_unaligned_2. Replace
 __memmove_chk_sse2_unaligned_2 with
 __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and
 __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2
 with __memmove_avx512_unaligned. Replace
 __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
 Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2
 with __memcpy_chk_avx512_unaligned. Remove
 __memcpy_chk_avx_unaligned_2. Replace
 __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
 Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2.
 Replace __memcpy_avx512_unaligned_2 with
 __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2
 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2
 with __mempcpy_chk_avx512_unaligned. Remove
 __mempcpy_chk_avx_unaligned_2. Replace
 __mempcpy_chk_sse2_unaligned_2 with
 __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2.
 Replace __mempcpy_avx512_unaligned_2 with
 __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
 Replace __mempcpy_sse2_unaligned_2 with
 __mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
 * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
 __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
 Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
 if processor has ERMS. Default to __memcpy_sse2_unaligned.
 (ENTRY): Removed.
 (END): Likewise.
 (ENTRY_CHK): Likewise.
 (libc_hidden_builtin_def): Likewise.
 Don't include ../memcpy.S.
 * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
 __memcpy_chk_avx512_unaligned_erms and
 __memcpy_chk_avx512_unaligned. Use
 __memcpy_chk_avx_unaligned_erms and
 __memcpy_chk_sse2_unaligned_erms if processor has ERMS.
 Default to __memcpy_chk_sse2_unaligned.
 * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
 Change function suffix from unaligned_2 to unaligned.
 * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
 __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
 Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
 if processor has ERMS. Default to __mempcpy_sse2_unaligned.
 (ENTRY): Removed.
 (END): Likewise.
 (ENTRY_CHK): Likewise.
 (libc_hidden_builtin_def): Likewise.
 Don't include ../mempcpy.S.
 (mempcpy): New. Add a weak alias.
 * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
 __mempcpy_chk_avx512_unaligned_erms and
 __mempcpy_chk_avx512_unaligned. Use
 __mempcpy_chk_avx_unaligned_erms and
 __mempcpy_chk_sse2_unaligned_erms if processor has ERMS.
 Default to __mempcpy_chk_sse2_unaligned.

1d96e81... by "H.J. Lu" <email address hidden>

X86-64: Remove the previous SSE2/AVX2 memsets

Since the new SSE2/AVX2 memsets are faster than the previous ones, we
can remove the previous SSE2/AVX2 memsets and replace them with the
new ones. This reduces the size of libc.so by about 900 bytes.

No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
optimized with Enhanced REP STOSB will be used for processors with
ERMS. The new AVX512 memset will be used for processors with AVX512
which prefer vzeroupper.

 [BZ #19881]
 * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
 into ...
 * sysdeps/x86_64/memset.S: This.
 (__bzero): Removed.
 (__memset_tail): Likewise.
 (__memset_chk): Likewise.
 (memset): Likewise.
 (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
 defined.
 (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
 * sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
 (__memset_zero_constant_len_parameter): Check SHARED instead of
 PIC.
 * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
 memset-avx2 and memset-sse2-unaligned-erms.
 * sysdeps/x86_64/multiarch/ifunc-impl-list.c
 (__libc_ifunc_impl_list): Remove __memset_chk_sse2,
 __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
 * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
 (__bzero): Enabled.
 * sysdeps/x86_64/multiarch/memset.S (memset): Replace
 __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
 and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms
 or __memset_avx2_unaligned_erms if processor has ERMS. Support
 __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
 (memset): Removed.
 (__memset_chk): Likewise.
 (MEMSET_SYMBOL): New.
 (libc_hidden_builtin_def): Replace __memset_sse2 with
 __memset_sse2_unaligned.
 * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
 __memset_chk_sse2 and __memset_chk_avx2 with
 __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
 Use __memset_chk_sse2_unaligned_erms or
 __memset_chk_avx2_unaligned_erms if processor has ERMS. Support
 __memset_chk_avx512_unaligned_erms and
 __memset_chk_avx512_unaligned.

1f921a9... by Joseph Myers <email address hidden>

Do not raise "inexact" from powerpc32 ceil, floor, trunc (bug 15479).

Continuing fixes for ceil, floor and trunc functions not to raise the
"inexact" exception, this patch fixes the versions used on older
powerpc32 processors. As was done with the round implementations some
time ago, the save of floating-point state is moved after the first
floating-point operation on the input to ensure that any "invalid"
exception from signaling NaN input is included in the saved state, and
then the whole state gets restored rather than just the rounding mode.

This has no effect on configurations using the power5+ code, since
such processors can do these operations with a single instruction (and
those instructions do not set "inexact", so are correct for TS 18661-1
semantics).

Tested for powerpc32.

 [BZ #15479]
 * sysdeps/powerpc/powerpc32/fpu/s_ceil.S (__ceil): Move save of
 floating-point state after first floating-point operation on
 input. Restore full floating-point state instead of just rounding
 mode.
 * sysdeps/powerpc/powerpc32/fpu/s_ceilf.S (__ceilf): Likewise.
 * sysdeps/powerpc/powerpc32/fpu/s_floor.S (__floor): Likewise.
 * sysdeps/powerpc/powerpc32/fpu/s_floorf.S (__floorf): Likewise.
 * sysdeps/powerpc/powerpc32/fpu/s_trunc.S (__trunc): Likewise.
 * sysdeps/powerpc/powerpc32/fpu/s_truncf.S (__truncf): Likewise.

7ab1de2... by Stefan Liebler <email address hidden>

Fix UTF-16 surrogate handling. [BZ #19727]

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in range of an utf16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
Thus this patch fixes this behaviour for converting from utf32 to internal and
from internal to utf8.

Furthermore, the conversion from utf16 to internal does not report an error if
the input stream consists of two low-surrogate values. If a uint16_t value is in
the range 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether it
is in the range of a low surrogate (0xdc00 .. 0xdfff). Afterwards these two
uint16_t values are interpreted as a high- and low-surrogate pair. But there is
no test that the first uint16_t value is really in the range of a high surrogate
(0xd800 .. 0xdbff). If both uint16_t values are in the range of a low surrogate,
they are treated as a valid high- and low-surrogate pair.
This patch adds this test.
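
As an illustration of the added test, here is a minimal standalone sketch of decoding one code point from UTF-16 code units. It is not the code in iconvdata/utf-16.c; the function name utf16_decode and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the n code units at in.
   Returns the number of units consumed (1 or 2), 0 if more input is
   needed, or -1 if the sequence is ill-formed.  (Hypothetical helper,
   not a glibc interface.)  */
int
utf16_decode (const uint16_t *in, size_t n, uint32_t *out)
{
  if (n == 0)
    return 0;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the code point.  */
      *out = u1;
      return 1;
    }

  /* The check described above: the first unit of a pair must be a
     high surrogate (0xd800 .. 0xdbff), not merely any surrogate.  */
  if (u1 > 0xdbff)
    return -1;

  if (n < 2)
    return 0;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return -1;

  *out = ((uint32_t) (u1 - 0xd800) << 10) + (uint32_t) (u2 - 0xdc00) + 0x10000;
  return 2;
}
```

Without the u1 > 0xdbff check, the input 0xdc00 0xdc00 would pass the low-surrogate test on the second unit and be combined into a bogus code point.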

This patch also adds a new testcase, which checks UTF conversions with input
values in range of UTF16 surrogates. The test converts from UTF-xx to INTERNAL,
INTERNAL to UTF-xx and directly between UTF-xx and UTF-yy. The latter conversion
is needed because s390 has iconv-modules, which convert from/to UTF in one step.
The new testcase was tested on s390, power and intel machines.

ChangeLog:

 [BZ #19727]
 * iconvdata/utf-16.c (BODY): Report an error if first word is not a
 valid high surrogate.
 * iconvdata/utf-32.c (BODY): Report an error if the value is in range
 of an utf16 surrogate.
 * iconv/gconv_simple.c (BODY): Likewise.
 * iconvdata/bug-iconv12.c: New file.
 * iconvdata/Makefile (tests): Add bug-iconv12.

rename test

8f25676... by Stefan Liebler <email address hidden>

Fix ucs4le_internal_loop in error case. [BZ #19726]

When converting from UCS4LE to INTERNAL, the input-value is checked for a too
large value and the iconv() call sets errno to EILSEQ. In this case the inbuf
argument of the iconv() call should point to the invalid character, but it
points to the beginning of the inbuf.
Thus this patch updates the pointers inptrp and outptrp before returning in
this error case.
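
The idea can be sketched with a standalone conversion loop that publishes its progress on every exit path. This is not the glibc ucs4le_internal_loop; the function name, the status enum, and the 0x7fffffff limit are assumptions for illustration, and the output buffer is assumed to be large enough.

```c
#include <stddef.h>
#include <stdint.h>

enum conv_status { CONV_OK, CONV_ILLEGAL_INPUT };

/* Convert little-endian 4-byte values from *inptrp up to inend into
   *outptrp.  On return *inptrp and *outptrp point just past the last
   converted character; with CONV_ILLEGAL_INPUT, *inptrp points at the
   offending character.  (Hypothetical helper, not a glibc interface.)  */
enum conv_status
ucs4le_to_u32 (const unsigned char **inptrp, const unsigned char *inend,
               uint32_t **outptrp)
{
  const unsigned char *in = *inptrp;
  uint32_t *out = *outptrp;
  enum conv_status status = CONV_OK;

  while (inend - in >= 4)
    {
      uint32_t v = (uint32_t) in[0] | ((uint32_t) in[1] << 8)
                   | ((uint32_t) in[2] << 16) | ((uint32_t) in[3] << 24);
      if (v > 0x7fffffff)
        {
          /* Assumed limit for a "too large" value.  */
          status = CONV_ILLEGAL_INPUT;
          break;
        }
      *out++ = v;
      in += 4;
    }

  /* Update the caller's pointers on the error path too, so the caller
     can see which character was rejected.  */
  *inptrp = in;
  *outptrp = out;
  return status;
}
```

If the pointer update were skipped on the error path, the caller would still see the start of the buffer, which is the behaviour this commit fixes.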

This patch also adds a new testcase for this issue.
The new test was tested on a s390, power, intel machine.

ChangeLog:

 [BZ #19726]
 * iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
 outptrp in case of an illegal input.
 * iconv/tst-iconv6.c: New file.
 * iconv/Makefile (tests): Add tst-iconv6.

a42a95c... by Stefan Liebler <email address hidden>

S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in range of an utf16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu42 instruction, which converts from utf32 to utf16, has to be
disabled because it does not report an error in case of a value in range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c
and vector variants are adjusted to handle a value in range of a utf16 low
surrogate correctly.
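
A portable sketch of the required behaviour, assuming a hypothetical helper named utf32_to_utf16 (this is not the s390 module code): a UTF-32 value in the surrogate range is rejected instead of being copied through.

```c
#include <stdint.h>

/* Encode one UTF-32 value as UTF-16.
   Returns the number of code units written (1 or 2), or -1 if the
   value cannot be encoded.  (Hypothetical helper.)  */
int
utf32_to_utf16 (uint32_t c, uint16_t out[2])
{
  /* Surrogate values (0xd800 .. 0xdfff) are not valid characters,
     and values above 0x10ffff are outside the Unicode range.  */
  if ((c >= 0xd800 && c <= 0xdfff) || c > 0x10ffff)
    return -1;

  if (c < 0x10000)
    {
      out[0] = (uint16_t) c;
      return 1;
    }

  /* Split the 20 bits above 0x10000 into a high and a low surrogate.  */
  c -= 0x10000;
  out[0] = (uint16_t) (0xd800 + (c >> 10));
  out[1] = (uint16_t) (0xdc00 + (c & 0x3ff));
  return 2;
}
```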

ChangeLog:

 * sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
 an error in case of a value in range of an utf16 low surrogate.

52f8a48... by Stefan Liebler <email address hidden>

S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in range of an utf16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu41 instruction, which converts from utf32 to utf8, has to be
disabled because it does not report an error in case of a value in range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c
and vector variants are adjusted to handle a value in range of a utf16 low
surrogate correctly.
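
For the UTF-8 direction, the same rule can be sketched as follows. The helper name utf32_to_utf8 is an assumption for this example and the code is not taken from sysdeps/s390/utf8-utf32-z9.c.

```c
#include <stdint.h>

/* Encode one UTF-32 value as UTF-8.
   Returns the number of bytes written (1 to 4), or -1 if the value
   cannot be encoded.  (Hypothetical helper.)  */
int
utf32_to_utf8 (uint32_t c, unsigned char out[4])
{
  /* Reject utf16 surrogate values and values beyond 0x10ffff.  */
  if ((c >= 0xd800 && c <= 0xdfff) || c > 0x10ffff)
    return -1;

  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xc0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3f));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xe0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (c & 0x3f));
      return 3;
    }
  out[0] = (unsigned char) (0xf0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3f));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
  out[3] = (unsigned char) (0x80 | (c & 0x3f));
  return 4;
}
```

The surrogate check has to come before the three-byte branch, because 0xd800 .. 0xdfff would otherwise be encoded as an ordinary three-byte sequence.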

ChangeLog:

 * sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
 an error in case of a value in range of an utf16 low surrogate.

ee518b7... by Stefan Liebler <email address hidden>

S390: Use s390-64 specific iconv-modules on s390-32, too.

This patch reworks the existing s390 64bit specific iconv modules in order
to use them on s390 31bit, too.

Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
All those modules are moved from sysdeps/s390/s390-64 directory to sysdeps/s390.

The iso-8859-1 to/from cp037 module was adjusted, to use brct (branch relative
on count) instruction on 31bit s390 instead of brctg, because the brctg is a
zarch instruction and is not available on a 31bit kernel.

The utf modules are using zarch instructions, thus the directive machinemode
zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore some variable types were changed. E.g. unsigned long long would be
a register pair on s390 31bit, but we want only one single register.
For variables of type size_t the register contents have to be enlarged from a
32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
in such cases.

ChangeLog:

 * sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
 Move to ...
 * sysdeps/s390/Makefile: ... here.
 * sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
 * sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
 (BRANCH_ON_COUNT): New define.
 (TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
 * sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
 * sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
 run on s390-32, too.
 * sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
 * sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
 run on s390-32, too.
 * sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
 * sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
 run on s390-32, too.

6896776... by Stefan Liebler <email address hidden>

S390: Optimize utf16-utf32 module.

This patch reworks the s390 specific module to convert between utf16 and utf32.
Now ifunc is used to choose either the c or etf3eh (with convert utf
instruction) variants at runtime.
Furthermore a new vector variant for z13 is introduced which will be built
and chosen if vector support is available at build / runtime.

In case of converting utf32 to utf16, the vector variant optimizes input of
2byte utf16 characters. The convert utf instruction is used if a utf16
surrogate is found.

For the other direction utf16 to utf32, the cu24 instruction can't be re-
enabled, because it does not report an error, if the input-stream consists of
a single low surrogate utf16 char (e.g. 0xdc00). This applies to the newest z13,
too. Thus there is only the c or the new vector variant, which can handle utf16
surrogate characters.

This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
now handles the "UTF-xx//IGNORE" case. Previously it ignored the //IGNORE
request and always stopped at an error.

ChangeLog:

 * sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
 etf3eh or new vector loop-variant.