glibc:mfabian/collation-update-2.27

Last commit made on 2018-03-02
Get this branch:
git clone -b mfabian/collation-update-2.27 https://git.launchpad.net/glibc

Branch information

Name: mfabian/collation-update-2.27
Repository: lp:glibc

Recent commits

9589174... by Mike FABIAN <email address hidden>

Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.

See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898

These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.

As soon as that bug is fixed, these lines should be added back to the
test file.

 * localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
        be sorted correctly at the moment because of a bug.

e289a7d... by Mike FABIAN <email address hidden>

Adapt collation in several locales to the new iso14651_t1_common file

[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)

 * localedata/Makefile: Add new test files.
 * localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
 * localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation order.
 * localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
 * localedata/am_ET.UTF-8.in: New test file.
 * localedata/az_AZ.UTF-8.in: Likewise.
 * localedata/be_BY.UTF-8.in: Likewise.
 * localedata/ber_DZ.UTF-8.in: Likewise.
 * localedata/ber_MA.UTF-8.in: Likewise.
 * localedata/bg_BG.UTF-8.in: Likewise.
 * localedata/br_FR.UTF-8.in: Likewise.
 * localedata/cmn_TW.UTF-8.in: Likewise.
 * localedata/crh_UA.UTF-8.in: Likewise.
 * localedata/csb_PL.UTF-8.in: Likewise.
 * localedata/cv_RU.UTF-8.in: Likewise.
 * localedata/cy_GB.UTF-8.in: Likewise.
 * localedata/dz_BT.UTF-8.in: Likewise.
 * localedata/eo.UTF-8.in: Likewise.
 * localedata/es_ES.UTF-8.in: Likewise.
 * localedata/fa_IR.UTF-8.in: Likewise.
 * localedata/fi_FI.UTF-8.in: Likewise.
 * localedata/fil_PH.UTF-8.in: Likewise.
 * localedata/fur_IT.UTF-8.in: Likewise.
 * <email address hidden>: Likewise.
 * localedata/ha_NG.UTF-8.in: Likewise.
 * localedata/ig_NG.UTF-8.in: Likewise.
 * localedata/ik_CA.UTF-8.in: Likewise.
 * localedata/kk_KZ.UTF-8.in: Likewise.
 * localedata/ku_TR.UTF-8.in: Likewise.
 * localedata/ky_KG.UTF-8.in: Likewise.
 * localedata/ln_CD.UTF-8.in: Likewise.
 * localedata/mi_NZ.UTF-8.in: Likewise.
 * localedata/ml_IN.UTF-8.in: Likewise.
 * localedata/mn_MN.UTF-8.in: Likewise.
 * localedata/mr_IN.UTF-8.in: Likewise.
 * localedata/mt_MT.UTF-8.in: Likewise.
 * localedata/nb_NO.UTF-8.in: Likewise.
 * localedata/om_KE.UTF-8.in: Likewise.
 * localedata/os_RU.UTF-8.in: Likewise.
 * localedata/ps_AF.UTF-8.in: Likewise.
 * localedata/ro_RO.UTF-8.in: Likewise.
 * localedata/ru_RU.UTF-8.in: Likewise.
 * localedata/sc_IT.UTF-8.in: Likewise.
 * localedata/se_NO.UTF-8.in: Likewise.
 * localedata/sq_AL.UTF-8.in: Likewise.
 * localedata/sv_SE.UTF-8.in: Likewise.
 * localedata/szl_PL.UTF-8.in: Likewise.
 * localedata/tg_TJ.UTF-8.in: Likewise.
 * localedata/tk_TM.UTF-8.in: Likewise.
 * localedata/tt_RU.UTF-8.in: Likewise.
 * <email address hidden>: Likewise.
 * localedata/ug_CN.UTF-8.in: Likewise.
 * localedata/uz_UZ.UTF-8.in: Likewise.
 * localedata/vi_VN.UTF-8.in: Likewise.
 * localedata/yi_US.UTF-8.in: Likewise.
 * localedata/yo_NG.UTF-8.in: Likewise.
 * localedata/zh_CN.UTF-8.in: Likewise.
 * localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common
        file and fix bugs in the collation.
 * localedata/locales/az_AZ: Likewise.
 * localedata/locales/be_BY: Likewise.
 * localedata/locales/ber_DZ: Likewise.
 * localedata/locales/ber_MA: Likewise.
 * localedata/locales/bg_BG: Likewise.
 * localedata/locales/br_FR: Likewise.
 * localedata/locales/br_FR@euro: Likewise.
 * localedata/locales/ca_ES: Likewise.
 * localedata/locales/cns11643_stroke: Likewise.
 * localedata/locales/crh_UA: Likewise.
 * localedata/locales/cs_CZ: Likewise.
 * localedata/locales/csb_PL: Likewise.
 * localedata/locales/cv_RU: Likewise.
 * localedata/locales/cy_GB: Likewise.
 * localedata/locales/da_DK: Likewise.
 * localedata/locales/dz_BT: Likewise.
 * localedata/locales/en_CA: Likewise.
 * localedata/locales/eo: Likewise.
 * localedata/locales/es_CU: Likewise.
 * localedata/locales/es_EC: Likewise.
 * localedata/locales/es_ES: Likewise.
 * localedata/locales/es_US: Likewise.
 * localedata/locales/et_EE: Likewise.
 * localedata/locales/fa_IR: Likewise.
 * localedata/locales/fi_FI: Likewise.
 * localedata/locales/fil_PH: Likewise.
 * localedata/locales/fur_IT: Likewise.
 * localedata/locales/gez_ER@abegede: Likewise.
 * localedata/locales/ha_NG: Likewise.
 * localedata/locales/hr_HR: Likewise.
 * localedata/locales/hsb_DE: Likewise.
 * localedata/locales/hu_HU: Likewise.
 * localedata/locales/ig_NG: Likewise.
 * localedata/locales/ik_CA: Likewise.
 * localedata/locales/is_IS: Likewise.
 * localedata/locales/iso14651_t1_pinyin: Likewise.
 * localedata/locales/kk_KZ: Likewise.
 * localedata/locales/ku_TR: Likewise.
 * localedata/locales/ky_KG: Likewise.
 * localedata/locales/ln_CD: Likewise.
 * localedata/locales/lt_LT: Likewise.
 * localedata/locales/lv_LV: Likewise.
 * localedata/locales/mi_NZ: Likewise.
 * localedata/locales/ml_IN: Likewise.
 * localedata/locales/mn_MN: Likewise.
 * localedata/locales/mr_IN: Likewise.
 * localedata/locales/mt_MT: Likewise.
 * localedata/locales/nb_NO: Likewise.
 * localedata/locales/om_KE: Likewise.
 * localedata/locales/os_RU: Likewise.
 * localedata/locales/pl_PL: Likewise.
 * localedata/locales/ps_AF: Likewise.
 * localedata/locales/ro_RO: Likewise.
 * localedata/locales/ru_RU: Likewise.
 * localedata/locales/ru_UA: Likewise.
 * localedata/locales/sc_IT: Likewise.
 * localedata/locales/se_NO: Likewise.
 * localedata/locales/si_LK: Likewise.
 * localedata/locales/sq_AL: Likewise.
 * localedata/locales/sv_FI: Likewise.
 * localedata/locales/sv_FI@euro: Likewise.
 * localedata/locales/sv_SE: Likewise.
 * localedata/locales/szl_PL: Likewise.
 * localedata/locales/tg_TJ: Likewise.
 * localedata/locales/ti_ER: Likewise.
 * localedata/locales/tk_TM: Likewise.
 * localedata/locales/tl_PH: Likewise.
 * localedata/locales/tr_TR: Likewise.
 * localedata/locales/tt_RU: Likewise.
 * localedata/locales/tt_RU@iqtelif: Likewise.
 * localedata/locales/ug_CN: Likewise.
 * localedata/locales/uk_UA: Likewise.
 * localedata/locales/uz_UZ: Likewise.
 * localedata/locales/uz_UZ@cyrillic: Likewise.
 * localedata/locales/vi_VN: Likewise.
 * localedata/locales/yi_US: Likewise.
 * localedata/locales/yo_NG: Likewise.

2425963... by Mike FABIAN <email address hidden>

Improve gen-locales.mk and gen-locale.sh to make test files with @ options work

Without this, adding collation test files like <email address hidden>
does not work for locales which contain @ modifiers.

 * gen-locales.mk: Make test files which contain @ modifiers in their
        name work.
 * localedata/gen-locale.sh: Likewise.

cc5351f... by Mike FABIAN <email address hidden>

Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common file.

See:

http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html

> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.

Therefore, using [a-z] does not make much sense except in the C/POSIX locale.
The new iso14651_t1_common lists upper case and lower case Latin characters
in a different order than the old one, which causes surprising results,
for example in the de_DE locale: [a-z] now includes A because A comes
after a in iso14651_t1_common, but it does not include Z because Z comes
after z in iso14651_t1_common.

 * posix/tst-fnmatch.input: Fix results for range expressions
        for non C locales.
 * posix/tst-regexloc.c: Do not use a range expression for
        de_DE.ISO-8859-1 locale.

ffa8106... by Mike FABIAN <email address hidden>

Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common update

This test case tests how many collating elements are defined in
da_DK.ISO-8859-1 locale. The da_DK locale source defines 4:

collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"

The new iso14651_t1_common file defines more collating elements, two
of which are in the ISO-8859-1 range:

collating-element <U004C_00B7> from "<U004C><U00B7>" % decomposition of LATIN CAPITAL LETTER L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % decomposition of LATIN SMALL LETTER L WITH MIDDLE DOT

So the total count is now 6 instead of 4.

 * posix/bug-regex5.c: Fix test case because with the new
        iso14651_t1_common file, the da_DK locale now has 6 collating elements
        in the ISO-8859-1 range instead of 4 with the old iso14651_t1_common
        file.

61e613f... by Mike FABIAN <email address hidden>

Collation order of @-. and space has changed in new iso14651_t1_common file, adapt test files

 * localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
        downloaded from ISO, the collation order of @-. and space has changed.
        Therefore, this test file needed to be adapted.
 * localedata/fr_CA.UTF-8.in: Likewise.
 * localedata/fr_FR.UTF-8.in: Likewise.
 * localedata/uk_UA.UTF-8.in: Likewise.

059454d... by Mike FABIAN <email address hidden>

Collation order of ȥ has changed in new iso14651_t1_common file, adapt test files

 * localedata/cs_CZ.UTF-8.in: Adapt this test file to the collation
        order of ȥ in the new iso14651_t1_common file.
 * localedata/pl_PL.UTF-8.in: Likewise.

1f4df3b... by Mike FABIAN <email address hidden>

Add sections for various scripts to the iso14651_t1_common file

 * localedata/locales/iso14651_t1_common: Add sections for various
        scripts to the iso14651_t1_common file.

a93fecd... by Mike FABIAN <email address hidden>

iso14651_t1_common: make the fourth level the codepoint for characters which are ignorable on all 4 levels

Entries for characters which have “IGNORE” on all 4 levels like:

 <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)

are changed into:

 <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)

i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal, which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like these, so the code point is used as a tie-break.

 * localedata/locales/iso14651_t1_common: Use the code point of a
        character in the fourth collation level instead of IGNORE for all
        entries which have IGNORE on all 4 levels.

3e7089b... by Mike FABIAN <email address hidden>

Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common

 * localedata/locales/iso14651_t1_common: Add some convenient collation
        symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
        rules similar to those in CLDR.