glibc:hjl/cacheline/ifunc

Last commit made on 2016-04-25
Get this branch:
git clone -b hjl/cacheline/ifunc https://git.launchpad.net/glibc

Branch information

Name:
hjl/cacheline/ifunc
Repository:
lp:glibc

Recent commits

38d75b3... by "H.J. Lu" <email address hidden>

X86-64: Add dummy memcopy.h and wordcopy.c

Since x86-64 doesn't use the generic memory copy functions, add dummy
memcopy.h and wordcopy.c to reduce code size. This reduces the size of
libc.so by about 1 KB.

 * sysdeps/x86_64/memcopy.h: New file.
 * sysdeps/x86_64/wordcopy.c: Likewise.
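
For illustration, a minimal sketch of the sysdeps override idea: placing
files with these names under sysdeps/x86_64/ shadows the generic
string/memcopy.h and string/wordcopy.c, so the generic _wordcopy_* helpers
are never built. The exact contents in glibc may differ; effectively empty
files are enough.

    /* sysdeps/x86_64/memcopy.h (sketch): x86-64 uses its own assembly
       memcpy/memmove, so the generic word-copy macros are not needed.  */

    /* sysdeps/x86_64/wordcopy.c (sketch): left empty so the generic
       string/wordcopy.c and its _wordcopy_* helpers are not compiled in.  */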

a3fbfb0... by "H.J. Lu" <email address hidden>

X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
the new ones.

No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
memcpy/memmove optimized with Enhanced REP MOVSB will be used for
processors with ERMS. The new AVX512 memcpy/memmove will be used for
processors with AVX512 which prefer vzeroupper.

Since the new SSE2 memcpy/memmove are also faster than the previous default
memcpy/memmove used in libc.a and ld.so, we remove the previous default
memcpy/memmove and make the new SSE2 memcpy/memmove the default, except
that the non-temporal store path isn't used in ld.so.

Together, it reduces the size of libc.so by about 6 KB and the size of
ld.so by about 2 KB.
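
A hedged, standalone sketch of the selection order described above. Only
the variant names come from this commit; the struct, feature flags and
main below are illustrative stand-ins, not glibc's cpu_features/init-arch
machinery.

    /* Hedged sketch: which memcpy variant the IFUNC resolver would pick
       under the rules described in this commit.  The feature flags below
       are placeholders, not glibc's real CPU feature bits.  */
    #include <stdbool.h>
    #include <stdio.h>

    struct cpu_features_sketch
    {
      bool avx512;               /* AVX512 usable.  */
      bool prefer_no_vzeroupper; /* CPU prefers avoiding VZEROUPPER.  */
      bool avx_fast_unaligned;   /* AVX/AVX2 unaligned access is fast.  */
      bool erms;                 /* Enhanced REP MOVSB/STOSB.  */
    };

    static const char *
    select_memcpy (const struct cpu_features_sketch *cpu)
    {
      /* AVX512 variants only for CPUs that prefer VZEROUPPER.  */
      if (cpu->avx512 && !cpu->prefer_no_vzeroupper)
        return cpu->erms ? "__memcpy_avx512_unaligned_erms"
                         : "__memcpy_avx512_unaligned";
      /* AVX variants, with the ERMS flavour on ERMS-capable CPUs.  */
      if (cpu->avx_fast_unaligned)
        return cpu->erms ? "__memcpy_avx_unaligned_erms"
                         : "__memcpy_avx_unaligned";
      /* Default: the new SSE2 variant (ERMS flavour if available).  */
      return cpu->erms ? "__memcpy_sse2_unaligned_erms"
                       : "__memcpy_sse2_unaligned";
    }

    int
    main (void)
    {
      struct cpu_features_sketch with_erms = { .avx_fast_unaligned = true,
                                               .erms = true };
      struct cpu_features_sketch baseline = { 0 };
      printf ("AVX2+ERMS CPU: %s\n", select_memcpy (&with_erms));
      printf ("baseline CPU:  %s\n", select_memcpy (&baseline));
      return 0;
    }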

 [BZ #19776]
 * sysdeps/x86_64/memcpy.S: Make it dummy.
 * sysdeps/x86_64/mempcpy.S: Likewise.
 * sysdeps/x86_64/memmove.S: New file.
 * sysdeps/x86_64/memmove_chk.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
 * sysdeps/x86_64/memmove.c: Removed.
 * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
 * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
 Likewise.
 * sysdeps/x86_64/multiarch/memmove.c: Likewise.
 * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
 * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
 memcpy-sse2-unaligned, memmove-avx-unaligned,
 memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
 * sysdeps/x86_64/multiarch/ifunc-impl-list.c
 (__libc_ifunc_impl_list): Replace
 __memmove_chk_avx512_unaligned_2 with
 __memmove_chk_avx512_unaligned. Remove
 __memmove_chk_avx_unaligned_2. Replace
 __memmove_chk_sse2_unaligned_2 with
 __memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and
 __memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2
 with __memmove_avx512_unaligned. Replace
 __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
 Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2
 with __memcpy_chk_avx512_unaligned. Remove
 __memcpy_chk_avx_unaligned_2. Replace
 __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
 Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2.
 Replace __memcpy_avx512_unaligned_2 with
 __memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2
 and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2
 with __mempcpy_chk_avx512_unaligned. Remove
 __mempcpy_chk_avx_unaligned_2. Replace
 __mempcpy_chk_sse2_unaligned_2 with
 __mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2.
 Replace __mempcpy_avx512_unaligned_2 with
 __mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
 Replace __mempcpy_sse2_unaligned_2 with
 __mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
 * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
 __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
 Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
 if processor has ERMS. Default to __memcpy_sse2_unaligned.
 (ENTRY): Removed.
 (END): Likewise.
 (ENTRY_CHK): Likewise.
 (libc_hidden_builtin_def): Likewise.
 Don't include ../memcpy.S.
 * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
 __memcpy_chk_avx512_unaligned_erms and
 __memcpy_chk_avx512_unaligned. Use
 __memcpy_chk_avx_unaligned_erms and
 __memcpy_chk_sse2_unaligned_erms if processor has ERMS.
 Default to __memcpy_chk_sse2_unaligned.
 * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
 Change function suffix from unaligned_2 to unaligned.
 * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
 __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
 Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
 if processor has ERMS. Default to __mempcpy_sse2_unaligned.
 (ENTRY): Removed.
 (END): Likewise.
 (ENTRY_CHK): Likewise.
 (libc_hidden_builtin_def): Likewise.
 Don't include ../mempcpy.S.
 (mempcpy): New. Add a weak alias.
 * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
 __mempcpy_chk_avx512_unaligned_erms and
 __mempcpy_chk_avx512_unaligned. Use
 __mempcpy_chk_avx_unaligned_erms and
 __mempcpy_chk_sse2_unaligned_erms if processor has ERMS.
 Default to __mempcpy_chk_sse2_unaligned.

0c91887... by "H.J. Lu" <email address hidden>

X86-64: Remove the previous SSE2/AVX2 memsets

Since the new SSE2/AVX2 memsets are faster than the previous ones, we
can remove the previous SSE2/AVX2 memsets and replace them with the
new ones. This reduces the size of libc.so by about 900 bytes.

No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
optimized with Enhanced REP STOSB will be used for processors with
ERMS. The new AVX512 memset will be used for processors with AVX512
which prefer vzeroupper.
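
As a hedged illustration of what the "_erms" flavours rely on: on
processors advertising ERMS (CPUID leaf 7, EBX bit 9), a plain REP STOSB
is fast for large blocks. The C sketch below only shows the idea; glibc's
actual __memset_*_erms variants are hand-written assembly and switch to
REP STOSB only above a size threshold.

    /* Hedged sketch of an ERMS-style memset: any x86-64 CPU executes
       REP STOSB correctly; ERMS just makes it fast for large buffers.  */
    #include <stddef.h>
    #include <stdint.h>

    static void *
    memset_erms_sketch (void *dst, int c, size_t n)
    {
      void *ret = dst;
      /* REP STOSB: store AL to [RDI], RCX times, advancing RDI.  */
      __asm__ __volatile__ ("rep stosb"
                            : "+D" (dst), "+c" (n)
                            : "a" ((uint8_t) c)
                            : "memory");
      return ret;
    }

    int
    main (void)
    {
      char buf[256];
      memset_erms_sketch (buf, 0xAB, sizeof buf);
      return buf[0] == (char) 0xAB ? 0 : 1;
    }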

 [BZ #19881]
 * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
 into ...
 * sysdeps/x86_64/memset.S: This.
 (__bzero): Removed.
 (__memset_tail): Likewise.
 (__memset_chk): Likewise.
 (memset): Likewise.
 (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
 defined.
 (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
 * sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
 (__memset_zero_constant_len_parameter): Check SHARED instead of
 PIC.
 * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
 memset-avx2 and memset-sse2-unaligned-erms.
 * sysdeps/x86_64/multiarch/ifunc-impl-list.c
 (__libc_ifunc_impl_list): Remove __memset_chk_sse2,
 __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
 * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
 (__bzero): Enabled.
 * sysdeps/x86_64/multiarch/memset.S (memset): Replace
 __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
 and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms
 or __memset_avx2_unaligned_erms if processor has ERMS. Support
 __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
 (memset): Removed.
 (__memset_chk): Likewise.
 (MEMSET_SYMBOL): New.
 (libc_hidden_builtin_def): Replace __memset_sse2 with
 __memset_sse2_unaligned.
 * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
 __memset_chk_sse2 and __memset_chk_avx2 with
 __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
 Use __memset_chk_sse2_unaligned_erms or
 __memset_chk_avx2_unaligned_erms if processor has ERMS. Support
 __memset_chk_avx512_unaligned_erms and
 __memset_chk_avx512_unaligned.

343b5e4... by "H.J. Lu" <email address hidden>

Align to cacheline

8dd19b0... by "H.J. Lu" <email address hidden>

Use PREFETCHED_LOAD_SIZE in loop_4x_vec_xxx

13fd5ab... by "H.J. Lu" <email address hidden>

Rename PREFETCH_SIZE to CACHELINE_SIZE

2a517d9... by Samuel Thibault

non-linux: Apply RFC3542 obsoletion of RFC2292 macros

 RFC2292 macros were obsoleted by RFC3542 and should no longer be
 exposed, notably because IPV6_PKTINFO has been reintroduced with a
 completely different API.

 * bits/in.h (IPV6_PKTINFO): Rename to IPV6_2292PKTINFO.
 (IPV6_HOPOPTS): Rename to IPV6_2292HOPOPTS.
 (IPV6_DSTOPTS): Rename to IPV6_2292DSTOPTS.
 (IPV6_RTHDR): Rename to IPV6_2292RTHDR.
 (IPV6_PKTOPTIONS): Rename to IPV6_2292PKTOPTIONS.
 (IPV6_HOPLIMIT): Rename to IPV6_2292HOPLIMIT.
 (IPV6_RECVPKTINFO): New macro.
 (IPV6_PKTINFO): New macro.
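
For context, a hedged sketch of the RFC 3542 usage pattern the renamed
macros make room for: IPV6_RECVPKTINFO enables delivery of packet
information, and each received datagram then carries a struct in6_pktinfo
in an IPV6_PKTINFO control message. Error handling and binding are trimmed
for brevity; this illustrates the API shape and is not code from this
commit.

    /* Hedged sketch of RFC 3542 packet-info reception (not from this commit).  */
    #define _GNU_SOURCE           /* for struct in6_pktinfo on glibc */
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>
    #include <stdio.h>

    int
    main (void)
    {
      int fd = socket (AF_INET6, SOCK_DGRAM, 0);
      int on = 1;
      /* RFC 3542: ask the kernel to deliver packet info as ancillary data.  */
      setsockopt (fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on);

      char payload[1500];
      char cbuf[CMSG_SPACE (sizeof (struct in6_pktinfo))];
      struct iovec iov = { .iov_base = payload, .iov_len = sizeof payload };
      struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                            .msg_control = cbuf,
                            .msg_controllen = sizeof cbuf };

      if (recvmsg (fd, &msg, 0) >= 0)
        for (struct cmsghdr *cm = CMSG_FIRSTHDR (&msg); cm != NULL;
             cm = CMSG_NXTHDR (&msg, cm))
          if (cm->cmsg_level == IPPROTO_IPV6 && cm->cmsg_type == IPV6_PKTINFO)
            {
              struct in6_pktinfo info;
              memcpy (&info, CMSG_DATA (cm), sizeof info);
              printf ("packet arrived on interface %u\n", info.ipi6_ifindex);
            }
      return 0;
    }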

b2cae5d... by Mike Frysinger

tst-fmon/tst-numeric: switch malloc to static stack space [BZ #19671]

The current test code doesn't check the return value of malloc.
This should rarely (if ever) cause a problem, but rather than add
some return value checks, just statically allocate the buffer on
the stack. This will never fail (or if it does, we've got much
bigger problems that don't matter to the test).
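
A hedged, generic illustration of the pattern (not the actual test code):
a fixed-size automatic buffer replaces a malloc whose result would
otherwise need checking. The buffer size here is arbitrary.

    /* Sketch only: fixed-size on-stack buffer instead of unchecked malloc.  */
    #include <stdio.h>

    static void
    format_case (const char *input)
    {
      /* was: char *buf = malloc (4096);  (return value unchecked) */
      char buf[4096];
      snprintf (buf, sizeof buf, "formatted: %s", input);
      puts (buf);
    }

    int
    main (void)
    {
      format_case ("example");
      return 0;
    }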

4964bb4... by Mike Frysinger

tst-langinfo: update yesexpr/noexpr baselines

2bc983b... by "H.J. Lu" <email address hidden>

Reduce number of mmap calls from __libc_memalign in ld.so

__libc_memalign in ld.so allocates one page at a time and tries to
optimize consecutive __libc_memalign calls by hoping that the next
mmap is after the current memory allocation.

However, the kernel hands out mmap addresses in top-down order, so
this optimization in practice never happens, with the result that we
have more mmap calls and waste a bunch of space for each __libc_memalign.

This change makes __libc_memalign mmap one extra page. In the worst case,
the kernel never puts a backing page behind the extra page; in the best
case, it allows __libc_memalign to operate much more efficiently. For
elf/tst-align --direct, it reduces the number of mmap calls from 12 to 9.

 * elf/dl-minimal.c (__libc_memalign): Mmap one extra page.
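
A hedged sketch of the allocation pattern described above: a minimal
page-based bump allocator that, whenever it has to mmap, asks for one
extra page so later small requests can usually be carved out of the
remainder without another system call. This illustrates the idea and is
not glibc's dl-minimal.c.

    /* Hedged sketch of "mmap one extra page" bump allocation (not dl-minimal.c).  */
    #define _DEFAULT_SOURCE       /* for MAP_ANONYMOUS */
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *alloc_ptr;   /* next free byte in the current mapping */
    static char *alloc_end;   /* end of the current mapping */

    static void *
    minimal_memalign (size_t align, size_t n)
    {
      /* Round the bump pointer up to the requested alignment
         (alignments up to a page are satisfied by a fresh mapping).  */
      alloc_ptr = (char *) (((uintptr_t) alloc_ptr + align - 1) & ~(align - 1));

      if (alloc_ptr + n > alloc_end)
        {
          /* Map enough whole pages for this request, plus one extra page so
             the next call can usually be served from the leftover space.  */
          size_t pagesz = (size_t) sysconf (_SC_PAGESIZE);
          size_t mapsz = ((n + pagesz - 1) & ~(pagesz - 1)) + pagesz;
          void *p = mmap (NULL, mapsz, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
            return NULL;
          alloc_ptr = p;
          alloc_end = (char *) p + mapsz;
        }

      void *ret = alloc_ptr;
      alloc_ptr += n;
      return ret;
    }

    int
    main (void)
    {
      /* Second request fits in the extra page mapped by the first call.  */
      void *a = minimal_memalign (16, 100);
      void *b = minimal_memalign (16, 100);
      return (a != NULL && b != NULL) ? 0 : 1;
    }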