Comment 6 for bug 9026

Debian Bug Importer (debzilla) wrote :

Message-Id: <20041011.232915.193711624.tats05%<email address hidden>>
Date: Mon, 11 Oct 2004 23:29:15 +0900 (JST)
From: Tatsuya Kinoshita <email address hidden>
To: <email address hidden>, <email address hidden>
Cc: <email address hidden>, <email address hidden>
Subject: Re: gawk: Odd regexp matching problem if LANG=ja_JP

On August 18, 2004 at 2:57PM +0900,
miles (at lsi.nec.co.jp) wrote:

> Package: gawk
> Version: 1:3.1.4-1

> Executing the following line in a shell:
>
> echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=ja_JP gawk '/[Cc]hangeLog/ { print }'
>
> yields not the expected two lines of output, but instead only the first one:
>
> --- orig/lisp/ChangeLog
>
>
> If the LANG-setting portion is changed to use C, then it works as
> expected (others such as "de" seem to work too):
>
> echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=C gawk '/[Cc]hangeLog/ { print }'
>
> yields:
>
> --- orig/lisp/ChangeLog
> +++ mod/lisp/ChangeLog
>
>
> I'm not sure if the actual encoding has any impact -- ja_JP, ja_JP.utf8,
> and ja_JP.eucjp all exhibit the same problem.

The ko_KR, zh_CN, and zh_TW locales exhibit the same problem. On CJK
locales, this bug makes gawk scripts unusable.

Downgrading gawk to version 1:3.1.3-3 prevents the problem.
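
For anyone who wants to check several locales in one go, here is a minimal sketch based on the command line quoted above. The `AWK` variable and the `count_matches` function are names made up for this example, not part of gawk. It assumes the listed locales are generated on the system (an unavailable locale will probably fall back to C behaviour and hide the bug) and that `LC_ALL` is unset, since `LC_ALL` overrides `LANG`.

```shell
#!/bin/sh
# Sketch only: count how many of the two sample lines match
# /[Cc]hangeLog/ under each locale. Two matches are expected.
# AWK and count_matches are illustrative names (assumptions).
AWK="${AWK:-gawk}"

count_matches() {
    # $1 is the locale name to test, e.g. C or ja_JP
    printf '%s\n' '--- orig/lisp/ChangeLog' '+++ mod/lisp/ChangeLog' |
        LANG="$1" "$AWK" '/[Cc]hangeLog/ { n++ } END { print n + 0 }'
}

# Locales reported in this bug, plus C as the known-good baseline.
for loc in C ja_JP ko_KR zh_CN zh_TW; do
    printf '%s: %s of 2 lines matched\n' "$loc" "$(count_matches "$loc")"
done
```

Running it once with 1:3.1.4-1 and once with 1:3.1.3-3 installed should make the difference between the two versions easy to compare.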

Could anyone fix this bug?

Thanks,
--
Tatsuya Kinoshita
