- Get this branch:
-
git clone -b hjl/erms/ifunc https://git.launchpad.net/glibc
Branch information
- Name:
- hjl/erms/ifunc
- Repository:
- lp:glibc
Recent commits
- 85702d3... by "H.J. Lu" <email address hidden>
-
X86-64: Add dummy memcopy.h and wordcopy.c
Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and
wordcopy.c to reduce code size. It reduces the size of libc.so by about
1 KB.

* sysdeps/x86_64/memcopy.h: New file.
* sysdeps/x86_64/wordcopy.c: Likewise.
- 550bdb5... by "H.J. Lu" <email address hidden>
-
X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove
Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
the new ones.

No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
memcpy/memmove optimized with Enhanced REP MOVSB will be used for
processors with ERMS. The new AVX512 memcpy/memmove will be used for
processors with AVX512 which prefer vzeroupper.

Since the new SSE2 memcpy/memmove are faster than the previous default
memcpy/memmove used in libc.a and ld.so, we also remove the previous
default memcpy/memmove and make them the default memcpy/memmove, except
that non-temporal store isn't used in ld.so.

Together, it reduces the size of libc.so by about 6 KB and the size of
ld.so by about 2 KB.

[BZ #19776]
* sysdeps/x86_64/memcpy.S: Make it dummy.
* sysdeps/x86_64/mempcpy.S: Likewise.
* sysdeps/x86_64/memmove.S: New file.
* sysdeps/x86_64/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/memmove.c: Removed.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memcpy-sse2-unaligned, memmove-avx-unaligned,
memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Replace
__memmove_chk_avx512_unaligned_2 with
__memmove_chk_avx512_unaligned. Remove
__memmove_chk_avx_unaligned_2. Replace
__memmove_chk_sse2_unaligned_2 with
__memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and
__memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2
with __memmove_avx512_unaligned. Replace
__memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2
with __memcpy_chk_avx512_unaligned. Remove
__memcpy_chk_avx_unaligned_2. Replace
__memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2.
Replace __memcpy_avx512_unaligned_2 with
__memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2
and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2
with __mempcpy_chk_avx512_unaligned. Remove
__mempcpy_chk_avx_unaligned_2. Replace
__mempcpy_chk_sse2_unaligned_2 with
__mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2.
Replace __mempcpy_avx512_unaligned_2 with
__mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
Replace __mempcpy_sse2_unaligned_2 with
__mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
__memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
if processor has ERMS. Default to __memcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../memcpy.S.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
__memcpy_chk_avx512_unaligned_erms and
__memcpy_chk_avx512_unaligned. Use
__memcpy_chk_avx_unaligned_erms and
__memcpy_chk_sse2_unaligned_erms if processor has ERMS.
Default to __memcpy_chk_sse2_unaligned.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Change function suffix from unaligned_2 to unaligned.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
__mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
if processor has ERMS. Default to __mempcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../mempcpy.S.
(mempcpy): New. Add a weak alias.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
__mempcpy_chk_avx512_unaligned_erms and
__mempcpy_chk_avx512_unaligned. Use
__mempcpy_chk_avx_unaligned_erms and
__mempcpy_chk_sse2_unaligned_erms if processor has ERMS.
Default to __mempcpy_chk_sse2_unaligned.
- 1d96e81... by "H.J. Lu" <email address hidden>
-
X86-64: Remove the previous SSE2/AVX2 memsets
Since the new SSE2/AVX2 memsets are faster than the previous ones, we
can remove the previous SSE2/AVX2 memsets and replace them with the
new ones. This reduces the size of libc.so by about 900 bytes.

No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
optimized with Enhanced REP STOSB will be used for processors with
ERMS. The new AVX512 memset will be used for processors with AVX512
which prefer vzeroupper.

[BZ #19881]
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
into ...
* sysdeps/x86_64/memset.S: This.
(__bzero): Removed.
(__memset_tail): Likewise.
(__memset_chk): Likewise.
(memset): Likewise.
(MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
defined.
(MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
* sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
(__memset_zero_constant_len_parameter): Check SHARED instead of
PIC.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memset-avx2 and memset-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Remove __memset_chk_sse2,
__memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(__bzero): Enabled.
* sysdeps/x86_64/multiarch/memset.S (memset): Replace
__memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms
or __memset_avx2_unaligned_erms if processor has ERMS. Support
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
(memset): Removed.
(__memset_chk): Likewise.
(MEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Replace __memset_sse2 with
__memset_sse2_unaligned.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
__memset_chk_sse2 and __memset_chk_avx2 with
__memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
Use __memset_chk_sse2_unaligned_erms or
__memset_chk_avx2_unaligned_erms if processor has ERMS. Support
__memset_chk_avx512_unaligned_erms and
__memset_chk_avx512_unaligned.
- 1f921a9... by Joseph Myers <email address hidden>
-
Do not raise "inexact" from powerpc32 ceil, floor, trunc (bug 15479).
Continuing fixes for ceil, floor and trunc functions not to raise the
"inexact" exception, this patch fixes the versions used on older
powerpc32 processors. As was done with the round implementations some
time ago, the save of floating-point state is moved after the first
floating-point operation on the input to ensure that any "invalid"
exception from signaling NaN input is included in the saved state, and
then the whole state gets restored rather than just the rounding mode.

This has no effect on configurations using the power5+ code, since
such processors can do these operations with a single instruction (and
those instructions do not set "inexact", so are correct for TS 18661-1
semantics).

Tested for powerpc32.
[BZ #15479]
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S (__ceil): Move save of
floating-point state after first floating-point operation on
input. Restore full floating-point state instead of just rounding
mode.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S (__floor): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S (__trunc): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S (__truncf): Likewise.
- 7ab1de2... by Stefan Liebler <email address hidden>
-
Fix UTF-16 surrogate handling. [BZ #19727]
According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
Thus this patch fixes this behaviour for converting from utf32 to internal and
from internal to utf8.

Furthermore, the conversion from utf16 to internal does not report an error if the
input stream consists of two low-surrogate values. If a uint16_t value is in the
range of 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether it is
in the range of a low surrogate (0xdc00 .. 0xdfff). Afterwards these two uint16_t
values are interpreted as a high- and low-surrogate pair. But there is no test
whether the first uint16_t value is really in the range of a high surrogate
(0xd800 .. 0xdbff). If there are two uint16_t values in the range of a low
surrogate, they are treated as a valid high- and low-surrogate pair.
This patch adds this test.

This patch also adds a new testcase, which checks UTF conversions with input
values in the range of UTF-16 surrogates. The test converts from UTF-xx to INTERNAL,
INTERNAL to UTF-xx and directly from UTF-xx to UTF-yy. The latter conversion
is needed because s390 has iconv modules which convert from/to UTF in one step.
The new testcase was tested on s390, power and intel machines.

ChangeLog:
[BZ #19727]
* iconvdata/utf-16.c (BODY): Report an error if first word is not a
valid high surrogate.
* iconvdata/utf-32.c (BODY): Report an error if the value is in range
of an utf16 surrogate.
* iconv/gconv_simple.c (BODY): Likewise.
* iconvdata/bug-iconv12.c: New file.
* iconvdata/Makefile (tests): Add bug-iconv12.
rename test
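
The surrogate checks described in this commit can be illustrated with a minimal Python sketch. It is not the glibc BODY code: the function names `is_valid_utf32` and `decode_utf16_unit` are hypothetical, and treating a truncated pair as an error is a simplifying assumption made only for this illustration.

```python
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF  # high-surrogate range
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF    # low-surrogate range


def is_valid_utf32(value):
    """Reject values above U+10FFFF and values in the surrogate range."""
    return value <= 0x10FFFF and not (HIGH_FIRST <= value <= LOW_LAST)


def decode_utf16_unit(units, pos):
    """Decode one character from a list of 16-bit units, starting at pos.

    Returns (code_point, units_consumed); raises ValueError on
    ill-formed input (hypothetical error convention).
    """
    first = units[pos]
    if not (HIGH_FIRST <= first <= LOW_LAST):
        return first, 1  # ordinary BMP character
    # The check the commit message describes as missing: the first
    # unit of a pair must be a HIGH surrogate, not merely any surrogate.
    if first > HIGH_LAST:
        raise ValueError("first unit is a low surrogate")
    if pos + 1 >= len(units):
        raise ValueError("truncated surrogate pair")
    second = units[pos + 1]
    if not (LOW_FIRST <= second <= LOW_LAST):
        raise ValueError("second unit is not a low surrogate")
    code_point = 0x10000 + ((first - HIGH_FIRST) << 10) + (second - LOW_FIRST)
    return code_point, 2
```

With this sketch, the pair 0xd83d 0xde00 decodes to one character, while two consecutive low surrogates such as 0xdc00 0xdc00 are rejected instead of being combined.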
- 8f25676... by Stefan Liebler <email address hidden>
-
Fix ucs4le_internal_loop in error case. [BZ #19726]
When converting from UCS4LE to INTERNAL, the input-value is checked for a too
large value and the iconv() call sets errno to EILSEQ. In this case the inbuf
argument of the iconv() call should point to the invalid character, but it
points to the beginning of the inbuf.
Thus this patch updates the pointers inptrp and outptrp before returning in
this error case.

This patch also adds a new testcase for this issue.
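
The pointer handling fixed by this patch can be sketched in Python. This is only an illustration, not the glibc loop: `ucs4le_to_internal` is a hypothetical function, the returned offset stands in for the updated inbuf pointer, and the limit `MAX_VALUE` is an assumption chosen for the example.

```python
import errno

MAX_VALUE = 0x7FFFFFFF  # assumed upper bound for a valid input value


def ucs4le_to_internal(inbuf, start=0):
    """Convert little-endian 4-byte units to a list of code points.

    Returns (status, next_offset, output). On error, next_offset is the
    offset of the offending character, mirroring the requirement that
    inbuf points to the invalid character after EILSEQ.
    """
    out = []
    pos = start
    while pos + 4 <= len(inbuf):
        value = int.from_bytes(inbuf[pos:pos + 4], "little")
        if value > MAX_VALUE:
            # Report the current position, not `start`; the characters
            # converted so far remain available in `out`.
            return errno.EILSEQ, pos, out
        out.append(value)
        pos += 4
    return 0, pos, out
```

A caller that receives EILSEQ can then inspect or skip the four bytes at the returned offset instead of restarting from the beginning of the buffer.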
The new test was tested on s390, power and intel machines.

ChangeLog:
[BZ #19726]
* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
outptrp in case of an illegal input.
* iconv/tst-iconv6.c: New file.
* iconv/Makefile (tests): Add tst-iconv6.
- a42a95c... by Stefan Liebler <email address hidden>
-
S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu42 instruction, which converts from utf32 to utf16, has to be
disabled because it does not report an error in case of a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c and
vector variants are adjusted to handle a value in the range of a UTF-16 low
surrogate correctly.

ChangeLog:
* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
an error in case of a value in range of an utf16 low surrogate.
- 52f8a48... by Stefan Liebler <email address hidden>
-
S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu41 instruction, which converts from utf32 to utf8, has to be
disabled because it does not report an error in case of a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c and
vector variants are adjusted to handle a value in the range of a UTF-16 low
surrogate correctly.

ChangeLog:
* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
an error in case of a value in range of an utf16 low surrogate.
- ee518b7... by Stefan Liebler <email address hidden>
-
S390: Use s390-64 specific iconv-modules on s390-32, too.
This patch reworks the existing s390 64bit specific iconv modules in order
to use them on s390 31bit, too.

Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
All those modules are moved from the sysdeps/s390/s390-64 directory to sysdeps/s390.

The iso-8859-1 to/from cp037 module was adjusted to use the brct (branch relative
on count) instruction on 31bit s390 instead of brctg, because brctg is a
zarch instruction and is not available on a 31bit kernel.

The utf modules are using zarch instructions, thus the directive machinemode
zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore some variable types were changed. E.g. unsigned long long would be
a register pair on s390 31bit, but we want only one single register.
For variables of type size_t the register contents have to be enlarged from a
32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
in such cases.

ChangeLog:
* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
Move to ...
* sysdeps/s390/Makefile: ... here.
* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
(BRANCH_ON_COUNT): New define.
(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
run on s390-32, too.
* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
run on s390-32, too.
* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
run on s390-32, too.
- 6896776... by Stefan Liebler <email address hidden>
-
S390: Optimize utf16-utf32 module.
This patch reworks the s390 specific module to convert between utf16 and utf32.
Now ifunc is used to choose either the c or etf3eh (with convert utf
instruction) variants at runtime.
Furthermore a new vector variant for z13 is introduced which will be built
and chosen if vector support is available at build / runtime.

In case of converting utf32 to utf16, the vector variant optimizes input of
2byte utf16 characters. The convert utf instruction is used if a utf16
surrogate is found.

For the other direction utf16 to utf32, the cu24 instruction can't be
re-enabled, because it does not report an error if the input stream consists of
a single low surrogate utf16 char (e.g. 0xdc00). This applies to the newest z13,
too. Thus there is only the c or the new vector variant, which can handle utf16
surrogate characters.

This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
handles the "UTF-xx//IGNORE" case now. Before, it ignored the IGNORE case and
always stopped at an error.

ChangeLog:
* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
etf3eh or new vector loop-variant.
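
The runtime choice between loop variants described in this commit can be sketched as plain selection logic in Python. This is not an ifunc resolver and not the glibc code: the function names and the capability flags `has_vector` and `has_etf3eh` are hypothetical, and the priority order (vector first, then etf3eh, then c) is an assumption made for illustration.

```python
def select_utf32_to_utf16_variant(has_vector, has_etf3eh):
    """Pick a loop variant for the utf32 to utf16 direction."""
    if has_vector:
        return "vector"
    if has_etf3eh:
        return "etf3eh"
    return "c"


def select_utf16_to_utf32_variant(has_vector):
    """Pick a loop variant for the utf16 to utf32 direction.

    No etf3eh variant is offered here, because the commit message says
    the cu24 instruction cannot be re-enabled for this direction.
    """
    return "vector" if has_vector else "c"
```

The point of resolving the variant once, based on what the machine supports, is that callers keep using a single entry point while the fastest safe implementation is chosen behind it.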