We add a new C.UTF-8 locale. This locale is not built into glibc, but
is provided as a distinct locale. The locale provides full support
for UTF-8, including full code point sorting via strcmp-based
collation.
The collation uses a new keyword, 'strcmp_collation', which drops all
collation rules and generates an empty, zero-rule collation so that
strcmp can be used for collation. This gives full code point sorting
for C.UTF-8 with a minimal overhead of 92 bytes (LC_COLLATE
structure information).
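Plain strcmp is enough for code point sorting because UTF-8 is designed so that unsigned byte-wise comparison orders strings the same way as their code points. The sketch below checks that property for single characters; the helper names `utf8_first_code_point` and `strcmp_agrees_with_code_points` are hypothetical, and the decoder assumes well-formed input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decode the first code point of a well-formed UTF-8 string.
   No validation beyond the lead-byte check is performed.  */
static uint32_t
utf8_first_code_point (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  assert ((s[0] & 0xC0) != 0x80);   /* Must not start mid-sequence.  */
  if (s[0] < 0x80)
    return s[0];
  if (s[0] < 0xE0)
    return ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
  if (s[0] < 0xF0)
    return ((uint32_t) (s[0] & 0x0F) << 12)
           | ((uint32_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
  return ((uint32_t) (s[0] & 0x07) << 18) | ((uint32_t) (s[1] & 0x3F) << 12)
         | ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* True if strcmp orders the single-character strings A and B the same
   way as their code points.  strcmp compares bytes as unsigned char.  */
static bool
strcmp_agrees_with_code_points (const char *a, const char *b)
{
  uint32_t ca = utf8_first_code_point (a);
  uint32_t cb = utf8_first_code_point (b);
  int by_code_point = (ca > cb) - (ca < cb);
  int r = strcmp (a, b);
  int by_bytes = (r > 0) - (r < 0);
  return by_code_point == by_bytes;
}
```

For example, "z" (U+007A) sorts before "\xC3\xA9" (U+00E9) under both orderings, because the lead byte 0xC3 is greater than 0x7A.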
The new locale is added to SUPPORTED. Minimal test data for specific
code points (minus those not supported by collate-test) is provided
in C.UTF-8.in, and it verifies that code point sorting works
across the range. The locale was also tested manually with the
full set of code points without failure.
The locale is harmonized with locales already shipping in Gentoo,
Debian, Ubuntu, Fedora, CentOS Stream, and RHEL. A new tst-iconv9 test
is added which verifies the C.UTF-8 locale is generally usable.
Testing for fnmatch, regexec, and regcomp is provided by extending
bug-regex1, bug-regex19, bug-regex4, bug-regex6, transbug, tst-fnmatch,
tst-regcomp-truncated, and tst-regex to use C.UTF-8.
Support a new directive 'strcmp_collation' in the LC_COLLATE
section of a locale source file. This new directive causes all
collation rules to be dropped and instead 'strcmp' is used for
collation of the input character set. This is required to allow
for a C.UTF-8 that contains zero collation rules (minimal size)
and sorts using code point sorting.
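The effect of a zero-rule collation on a strcoll-style comparison can be sketched as a simple dispatch: when the rule count is zero, fall back to strcmp. The struct and function names below are hypothetical and stand in for the much richer LC_COLLATE data; only the nrules == 0 shortcut is modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified view of a locale's LC_COLLATE data.
   Only the rule count matters for this sketch.  */
struct collate_info
{
  uint32_t nrules;   /* 0 for a locale using strcmp_collation.  */
};

/* Hypothetical stand-in for rule-driven comparison.  This sketch only
   asserts that it is never reached for a zero-rule locale and returns 0;
   a real implementation would walk the collation weights.  */
static int
compare_with_rules (const struct collate_info *ci, const char *a,
                    const char *b)
{
  (void) a;
  (void) b;
  assert (ci->nrules > 0);
  return 0;
}

/* Sketch of a strcoll-style entry point: with zero rules, use plain
   byte comparison, which is code point order for UTF-8 input.  */
static int
sketch_strcoll (const struct collate_info *ci, const char *a, const char *b)
{
  if (ci->nrules == 0)
    return strcmp (a, b);
  return compare_with_rules (ci, a, b);
}
```

Keeping the zero-rule path as a single strcmp call is what lets C.UTF-8 avoid storing any weight tables at all.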
Add support for locales with zero collation rules.
While there is code to handle 'nrules == 0' in various locations
within posix/fnmatch_loop.c, posix/regcomp.c and posix/regexec.c,
these conditionals do not work. The only collation with zero
rules in effect today is the builtin C/POSIX locale, which is
built by hand; despite having zero rules, it has collseqmb
and collseqwc tables stored in the locale data. These tables are
simple identity tables which are not actually required and could
be removed at a later date after this change. The changes are in
order to prepare for C.UTF-8 which has zero rules and has no
collation sequence tables (multibyte or widechar).
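To illustrate why the zero-rule paths must not touch the collation sequence tables, here is a minimal sketch of a bracket-range check such as [a-z] for single bytes. It assumes a hypothetical per-locale struct where the multibyte table is absent (NULL) when nrules == 0; it is not the actual fnmatch or regex code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-locale data for bracket ranges.
   COLLSEQ maps each byte to its collation sequence value; a zero-rule
   locale such as C.UTF-8 has no such table (NULL).  */
struct range_locale
{
  uint32_t nrules;
  const uint32_t *collseq;   /* 256 entries, or NULL when nrules == 0.  */
};

/* Is byte C inside the range LO-HI?  With zero rules there is no table
   to consult, so the byte values themselves are compared.  */
static bool
byte_in_range (const struct range_locale *loc, unsigned char lo,
               unsigned char hi, unsigned char c)
{
  if (loc->nrules == 0)
    return lo <= c && c <= hi;
  assert (loc->collseq != NULL);
  return loc->collseq[lo] <= loc->collseq[c]
         && loc->collseq[c] <= loc->collseq[hi];
}
```

Because the hand-built C/POSIX tables are identity tables, both branches give the same answer for that locale, which is why the tables could later be dropped.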
commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5
Author: H.J. Lu <email address hidden>
Date: Sat Jan 25 14:19:40 2020 -0800
x86-64: Avoid rep movsb with short distance [BZ #27130]
introduced some regressions on Intel processors without Fast Short REP
MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with
short distance only on Intel processors with FSRM. bench-memmove-large
on Skylake server shows that cycles of __memmove_evex_unaligned_erms
improve for the following data sizes:
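The policy described in this commit can be sketched as a decision function: rep movsb is avoided for a short source/destination distance only when a flag is set, and that flag is set only on Intel processors with FSRM. The struct, field names and threshold values below are hypothetical and illustrative; they are not glibc's tunables or its assembly implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knobs; the values used with them are illustrative.  */
struct memmove_tuning
{
  size_t rep_movsb_threshold;   /* Minimum length for rep movsb.  */
  bool avoid_short_distance;    /* Set only on Intel CPUs with FSRM.  */
  size_t short_distance_limit;  /* Largest |dst - src| considered short.  */
};

/* Decide whether a copy of LEN bytes from SRC to DST should use
   rep movsb under the hypothetical tuning T.  */
static bool
use_rep_movsb (const struct memmove_tuning *t, uintptr_t dst,
               uintptr_t src, size_t len)
{
  if (len < t->rep_movsb_threshold)
    return false;
  if (t->avoid_short_distance)
    {
      uintptr_t distance = dst > src ? dst - src : src - dst;
      if (distance <= t->short_distance_limit)
        return false;   /* Use the vector loop instead.  */
    }
  return true;
}
```

With the flag cleared, processors without FSRM keep using rep movsb for large copies regardless of distance, which is the behavior the earlier change had regressed.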
This configuration exercises various --disable-* configure options.
It is expected to catch -Werror failures that only affect these
configurations.
70d08ba... by Siddhesh Poyarekar <email address hidden>
tests: use xmalloc to allocate implementation array
The benchmark and tests must fail in case of allocation failure in the
implementation array. Also annotate the x* allocators in support.h so
that the compiler has more information about them.
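A minimal sketch of the pattern, assuming GCC or Clang: an allocator that exits on failure and carries the `malloc`, `alloc_size` and `returns_nonnull` function attributes. The names `xmalloc_sketch`, `struct impl` and `alloc_impl_array` are hypothetical and are not the declarations in support.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* 'malloc': the result does not alias other pointers.
   'alloc_size (1)': the object size is given by the first argument.
   'returns_nonnull': the result is never NULL.  */
static void *xmalloc_sketch (size_t n)
  __attribute__ ((malloc, alloc_size (1), returns_nonnull));

static void *
xmalloc_sketch (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    {
      fprintf (stderr, "error: malloc of %zu bytes failed\n", n);
      exit (1);   /* Fail the test or benchmark instead of returning NULL.  */
    }
  return p;
}

/* Hypothetical entry of a benchmark's implementation array.  */
struct impl
{
  const char *name;
  int enabled;
};

static struct impl *
alloc_impl_array (size_t count)
{
  assert (count > 0);
  /* No NULL check needed: xmalloc_sketch never returns NULL.  */
  struct impl *arr = xmalloc_sketch (count * sizeof *arr);
  for (size_t i = 0; i < count; i++)
    {
      arr[i].name = "unset";
      arr[i].enabled = 0;
    }
  return arr;
}
```

The sketch does not guard `count * sizeof *arr` against overflow; a real allocator for untrusted sizes would.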
b8e8bb3... by Siddhesh Poyarekar <email address hidden>
xmalloc: Fix warnings with gcc analyzer
Tell the compiler that the xmalloc family of allocators always returns
non-NULL. xrealloc in locale/programs also always returns non-NULL,
but that conflicts with default realloc behaviour and that of xrealloc
in libsupport, so keep it as is for now and resolve the differences
later.
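The conflict with realloc comes from the zero-size case: realloc (p, 0) with a non-NULL p may free p and return NULL, so an xrealloc that preserves realloc's semantics cannot be annotated `returns_nonnull`. The sketch below assumes an error condition modeled on our understanding of the libsupport xrealloc; the name `xrealloc_sketch` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Treat a NULL result as an allocation failure only when memory was
   actually requested.  For N == 0 with a non-NULL P, realloc may free P
   and return NULL, so this function may legitimately return NULL and
   must not carry the returns_nonnull attribute.  */
static void *
xrealloc_sketch (void *p, size_t n)
{
  void *result = realloc (p, n);
  if (result == NULL && (n > 0 || p == NULL))
    {
      fprintf (stderr, "error: realloc of %zu bytes failed\n", n);
      exit (1);
    }
  return result;
}
```

An allocator that instead always returns non-NULL (as the locale/programs variant does) would have to allocate something even for a zero size, which is the behavioral difference left to resolve later.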