glibc:hjl/erms/i386

Last commit made on 2016-03-28
Get this branch:
git clone -b hjl/erms/i386 https://git.launchpad.net/glibc

Branch information

Name: hjl/erms/i386
Repository: lp:glibc

Recent commits

c835e4c... by "H.J. Lu" <email address hidden>

Add 32-bit Enhanced REP MOVSB/STOSB (ERMS) memcpy/memset

Add and test 32-bit memcpy/memset with Enhanced REP MOVSB/STOSB (ERMS).

 * sysdeps/i386/i686/multiarch/Makefile (sysdep_routines): Add
 bcopy-erms, memcpy-erms, memmove-erms, mempcpy-erms, bzero-erms
 and memset-erms.
 * sysdeps/i386/i686/multiarch/bcopy-erms.S: New file.
 * sysdeps/i386/i686/multiarch/bzero-erms.S: Likewise.
 * sysdeps/i386/i686/multiarch/memcpy-erms.S: Likewise.
 * sysdeps/i386/i686/multiarch/memmove-erms.S: Likewise.
 * sysdeps/i386/i686/multiarch/mempcpy-erms.S: Likewise.
 * sysdeps/i386/i686/multiarch/memset-erms.S: Likewise.
 * sysdeps/i386/i686/multiarch/ifunc-impl-list.c
 (__libc_ifunc_impl_list): Add __bcopy_erms, __bzero_erms,
 __memmove_chk_erms, __memmove_erms, __memset_chk_erms,
 __memset_erms, __memcpy_chk_erms, __memcpy_erms,
 __mempcpy_chk_erms and __mempcpy_erms.
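
For a sense of what the ERMS approach boils down to, here is a minimal
standalone sketch (illustration only, not the branch's hand-written
i386 assembly): on CPUs that report ERMS, a plain "rep movsb" with the
destination in %edi, the source in %esi and the byte count in %ecx is
fast for most sizes.

 #include <stddef.h>
 #include <stdio.h>

 /* Illustrative ERMS-style copy: a single "rep movsb".  The "+D",
    "+S" and "+c" constraints place dst/src/n in %edi/%esi/%ecx
    (i386) or %rdi/%rsi/%rcx (x86-64).  */
 static void *
 memcpy_erms (void *dst, const void *src, size_t n)
 {
   void *ret = dst;
   __asm__ __volatile__ ("rep movsb"
                         : "+D" (dst), "+S" (src), "+c" (n)
                         : : "memory");
   return ret;
 }

 int
 main (void)
 {
   char buf[16];
   memcpy_erms (buf, "ERMS test", 10);   /* 9 chars + NUL */
   puts (buf);
   return 0;
 }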

e41b395... by "H.J. Lu" <email address hidden>

[x86] Add a feature bit: Fast_Unaligned_Copy

On AMD processors, memcpy optimized with unaligned SSE load is
slower than memcpy optimized with aligned SSSE3, while other string
functions are faster with unaligned SSE load. A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.

 [BZ #19583]
 * sysdeps/x86/cpu-features.c (init_cpu_features): Set
 Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
 processors. Set Fast_Copy_Backward for AMD Excavator
 processors.
 * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
 New.
 (index_arch_Fast_Unaligned_Copy): Likewise.
 * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
 Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
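
As a simplified, standalone sketch of the kind of vendor gating
involved (glibc's real logic lives in sysdeps/x86/cpu-features.c and
uses its own cpu_features structure), the check amounts to reading
the CPUID vendor string and setting a hypothetical stand-in for
Fast_Unaligned_Copy only on Intel:

 #include <cpuid.h>
 #include <stdio.h>

 int
 main (void)
 {
   unsigned int eax, ebx, ecx, edx;
   if (!__get_cpuid (0, &eax, &ebx, &ecx, &edx))
     return 1;
   /* CPUID leaf 0 returns the vendor string in EBX:EDX:ECX.  */
   int is_intel = (ebx == 0x756e6547     /* "Genu" */
                   && edx == 0x49656e69  /* "ineI" */
                   && ecx == 0x6c65746e  /* "ntel" */);
   /* Hypothetical stand-in for Fast_Unaligned_Copy: set only on
      Intel, so AMD keeps the aligned-SSSE3 memcpy path.  */
   int fast_unaligned_copy = is_intel;
   printf ("fast_unaligned_copy = %d\n", fast_unaligned_copy);
   return 0;
 }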

b66d837... by Florian Weimer

resolv: Always set *resplen2 out parameter in send_dg [BZ #19791]

Since commit 44d20bca52ace85850012b0ead37b360e3ecd96e (Implement
second fallback mode for DNS requests), there is a code path which
returns early, before *resplen2 is initialized. This happens if the
name server address is immediately recognized as invalid (because of
lack of protocol support, or if it is a broadcast address such as
255.255.255.255, or another invalid address).

If this happens and *resplen2 was non-zero (which is the case if a
previous query resulted in a failure), __libc_res_nquery would reuse
an existing second answer buffer. This answer has been previously
identified as unusable (for example, it could be an NXDOMAIN
response). Due to the presence of a second answer, no name server
switching will occur. The result is a name resolution failure,
although a successful resolution would have been possible if name
servers had been switched and queries had proceeded along the search
path.

The above paragraph still simplifies the situation. Before glibc
2.23, if the second answer needed malloc, the stub resolver would
still attempt to reuse the second answer, but this is not possible
because __libc_res_nsearch has freed it, after the unsuccessful call
to __libc_res_nquerydomain, and set the buffer pointer to NULL. This
eventually leads to an assertion failure in __libc_res_nquery:

 /* Make sure both hp and hp2 are defined */
 assert((hp != NULL) && (hp2 != NULL));

If assertions are disabled, the consequence is a NULL pointer
dereference on the next line.

Starting with glibc 2.23, as a result of commit
e9db92d3acfe1822d56d11abcea5bfc4c41cf6ca (CVE-2015-7547: getaddrinfo()
stack-based buffer overflow (Bug 18665)), the second answer is always
allocated with malloc. This means that the assertion failure happens
with small responses as well, because there is no buffer to reuse as
soon as a name resolution failure triggers a search for an answer
along the search path.

This commit addresses the issue by ensuring that *resplen2 is
initialized before the send_dg function returns.

This commit also addresses a bug where an invalid second reply is
incorrectly returned as valid to the caller.
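
The fix follows a common defensive pattern; a minimal standalone
sketch (simplified names, not the actual send_dg code) looks like
this:

 #include <stdio.h>

 /* Clear the second-answer length up front so an early return cannot
    leave the caller looking at a stale length from a previous query.  */
 static int
 send_query (int server_usable, int *resplen2)
 {
   if (resplen2 != NULL)
     *resplen2 = 0;          /* defined on every exit path */
   if (!server_usable)
     return -1;              /* early return is now safe */
   *resplen2 = 42;           /* pretend a second answer arrived */
   return 0;
 }

 int
 main (void)
 {
   int resplen2 = 512;       /* stale value from a previous query */
   send_query (0, &resplen2);
   printf ("resplen2 = %d\n", resplen2);   /* prints 0, not 512 */
   return 0;
 }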

f327f5b... by Florian Weimer

tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]

 [BZ #19860]
 * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return
 zero if the compiler does not provide the AVX512F bit.
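
The shape of that guard, as a rough standalone sketch (the real
helper in sysdeps/x86_64/tst-audit10.c also checks OS state-save
support via XGETBV):

 #include <cpuid.h>
 #include <stdio.h>

 /* Report AVX512F as unavailable when the compiler's <cpuid.h> is too
    old to define bit_AVX512F, instead of failing to compile.  */
 static int
 avx512_enabled (void)
 {
 #ifdef bit_AVX512F
   unsigned int eax, ebx, ecx, edx;
   if (__get_cpuid_max (0, NULL) < 7)
     return 0;
   __cpuid_count (7, 0, eax, ebx, ecx, edx);
   return (ebx & bit_AVX512F) != 0;
 #else
   return 0;
 #endif
 }

 int
 main (void)
 {
   printf ("avx512f: %d\n", avx512_enabled ());
   return 0;
 }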

c898991... by Joseph Myers <email address hidden>

Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848).

Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes. This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.

Tested for x86_64 and x86 and ulps updated accordingly.

 [BZ #19848]
 * sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
 from 8 to 4.
 (__ieee754_powl): Compare integer exponent against 4 not 8.
 * sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
 from 8 to 4.
 (__ieee754_powl): Compare integer exponent against 4 not 8.
 * math/auto-libm-test-in: Add more tests of pow.
 * math/auto-libm-test-out: Regenerated.
 * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
 * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
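
For intuition only (an illustrative experiment, not the cases from
bug 19848): computing x^n by n-1 successive multiplications performs
n-1 roundings of up to 0.5 ulp each, so the possible accumulated
error grows with the exponent, which is why shrinking the fast-path
cutoff from 8 to 4 tightens the bound.

 #include <math.h>
 #include <stdio.h>

 /* Compare naive repeated multiplication against powl for small
    integer exponents; differences of a few ulp may or may not appear
    depending on the value and the rounding mode.  */
 int
 main (void)
 {
   long double x = 1.7L;   /* arbitrary value, not from the bug report */
   for (int n = 2; n <= 8; n++)
     {
       long double naive = x;
       for (int i = 1; i < n; i++)
         naive *= x;
       printf ("n=%d  naive - powl = %Lg\n", n, naive - powl (x, n));
     }
   return 0;
 }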

7e1ff08... by Aurelien Jarno <email address hidden>

Assume __NR_utimensat is always defined

With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures,
__NR_utimensat is always defined.

ChangeLog:
 * sysdeps/unix/sysv/linux/futimens.c (futimens) [__NR_utimensat]:
 Make code unconditional.
 [!__NR_utimensat]: Remove conditional code.
 * sysdeps/unix/sysv/linux/lutimes.c (lutimes) [__NR_utimensat]:
 Make code unconditional.
 [!__NR_utimensat]: Remove conditional code.
 * sysdeps/unix/sysv/linux/utimensat.c (utimensat) [__NR_utimensat]:
 Make code unconditional.
 [!__NR_utimensat]: Remove conditional code.
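
The cleanup pattern, as a simplified standalone sketch (illustrative
names; the real files go through glibc's INLINE_SYSCALL machinery):
with __NR_utimensat guaranteed to exist, the #else branch that set
ENOSYS can simply be deleted.

 #include <stddef.h>
 #include <sys/syscall.h>
 #include <time.h>
 #include <unistd.h>

 /* Hypothetical futimens-like wrapper after the cleanup: no #ifdef
    __NR_utimensat and no ENOSYS fallback any more.  */
 static int
 my_futimens (int fd, const struct timespec ts[2])
 {
   return syscall (__NR_utimensat, fd, NULL, ts, 0);
 }

 int
 main (void)
 {
   /* NULL times: ask the kernel to set both timestamps to "now".  */
   return my_futimens (1, NULL) == 0 ? 0 : 1;
 }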

16d94f6... by Aurelien Jarno <email address hidden>

Assume __NR_openat is always defined

With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures,
__NR_openat is always defined.

ChangeLog:
 * sysdeps/unix/sysv/linux/dl-openat64.c (openat64) [__NR_openat]:
 Make code unconditional.

7a25d6a... by Nick Alcock <email address hidden>

x86, pthread_cond_*wait: Do not depend on %eax not being clobbered

The x86-specific versions of both pthread_cond_wait and
pthread_cond_timedwait have (in their fall-back-to-futex-wait slow
paths) calls to __pthread_mutex_cond_lock_adjust followed by
__pthread_mutex_unlock_usercnt, which load the parameters before the
first call but then assume that the first parameter, in %eax, will
survive unaffected. This happens to have been true before now, but %eax
is a call-clobbered register, and this assumption is not safe: it could
change at any time, at GCC's whim, and indeed the stack-protector canary
checking code clobbers %eax while checking that the canary is
uncorrupted.

So reload %eax before calling __pthread_mutex_unlock_usercnt. (Do this
unconditionally, even when stack-protection is not in use, because it's
the right thing to do, it's a slow path, and anything else is dicing
with death.)

 * sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S: Reload
 call-clobbered %eax on retry path.
 * sysdeps/unix/sysv/linux/i386/pthread_cond_wait.S: Likewise.

3c9a4cd... by "H.J. Lu" <email address hidden>

Don't set %rcx twice before "rep movsb"

 * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY):
 Don't set %rcx twice before "rep movsb".

f781a9e... by "H.J. Lu" <email address hidden>

Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors

Since only Intel processors with AVX2 have fast unaligned load, we
should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors.

Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces
and call get_common_indeces for other processors.

Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading
GLRO(dl_x86_cpu_features) in cpu-features.c.

 [BZ #19583]
 * sysdeps/x86/cpu-features.c (get_common_indeces): Remove
 inline. Check family before setting family, model and
 extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable
 bits here.
 (init_cpu_features): Replace HAS_CPU_FEATURE and
 HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and
 CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load
 for Intel processors with usable AVX2. Call get_common_indeces
 for other processors with family == NULL.
 * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
 (CPU_FEATURES_ARCH_P): Likewise.
 (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
 (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
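
A simplified, self-contained sketch of that macro split (glibc's real
definitions in sysdeps/x86/cpu-features.h differ in layout and naming
detail): the *_P variants test bits through an explicit pointer, so
initialization code can query the structure it is still filling in,
while the unprefixed HAS_* macros become thin wrappers around the
global copy.

 #include <stdio.h>

 /* Hypothetical bit values for this sketch only.  */
 #define bit_cpu_AVX2                      (1u << 0)
 #define bit_arch_AVX_Fast_Unaligned_Load  (1u << 0)

 struct cpu_features
 {
   unsigned int cpu;    /* CPUID-derived feature bits */
   unsigned int arch;   /* derived tuning ("arch") bits */
 };

 /* Pointer-based predicates: usable while the structure is being built.  */
 #define CPU_FEATURES_CPU_P(p, name)   (((p)->cpu & bit_cpu_##name) != 0)
 #define CPU_FEATURES_ARCH_P(p, name)  (((p)->arch & bit_arch_##name) != 0)

 static struct cpu_features global_features;

 /* The old interface is kept and forwards to the global copy.  */
 #define HAS_CPU_FEATURE(name)   CPU_FEATURES_CPU_P (&global_features, name)
 #define HAS_ARCH_FEATURE(name)  CPU_FEATURES_ARCH_P (&global_features, name)

 int
 main (void)
 {
   struct cpu_features f = { 0, 0 };
   f.cpu |= bit_cpu_AVX2;                      /* pretend AVX2 was detected */
   if (CPU_FEATURES_CPU_P (&f, AVX2))          /* query the local copy... */
     f.arch |= bit_arch_AVX_Fast_Unaligned_Load;
   global_features = f;                        /* ...then publish it */
   printf ("AVX_Fast_Unaligned_Load: %d\n",
           HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load));
   return 0;
 }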