On August 18, 2004 at 2:57PM +0900,
miles (at lsi.nec.co.jp) wrote:
> Package: gawk
> Version: 1:3.1.4-1
> Executing the following line in a shell:
>
> echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=ja_JP gawk '/[Cc]hangeLog/ { print }'
>
> yields not the expected two lines of output, but instead only the first one:
>
> --- orig/lisp/ChangeLog
>
>
> If the LANG-setting portion is changed to use C, then it works as
> expected (others such as "de" seem to work too):
>
> echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=C gawk '/[Cc]hangeLog/ { print }'
>
> yields:
>
> --- orig/lisp/ChangeLog
> +++ mod/lisp/ChangeLog
>
>
> I'm not sure if the actual encoding has any impact -- ja_JP, ja_JP.utf8,
> and ja_JP.eucjp all exhibit the same problem.
ko_KR, zh_CN, and zh_TW exhibit the same problem. On CJK
locales, this bug makes gawk scripts unusable.
Downgrading gawk to version 1:3.1.3-3 prevents the problem.
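For anyone who wants to check several locales at once, here is a rough sketch that wraps the test case from the report in a loop and counts the matching lines. The AWK variable and the count_matches helper are my own names, not part of gawk, and the locale list is simply the set mentioned in this report. Two matches are expected; the bug shows up as one.

```shell
#!/bin/sh
# Sketch: run the reporter's test case under several locales and count
# how many of the two input lines match the regexp.
# AWK and the locale list are assumptions; adjust them for your system.
AWK=${AWK:-gawk}

count_matches() {
    # $1 = locale name placed in LANG for this one awk invocation
    printf '%s\n' '--- orig/lisp/ChangeLog' '+++ mod/lisp/ChangeLog' |
        LANG=$1 "$AWK" '/[Cc]hangeLog/ { n++ } END { print n + 0 }'
}

for loc in C ja_JP ja_JP.utf8 ja_JP.eucjp ko_KR zh_CN zh_TW; do
    printf '%-12s %s\n' "$loc" "$(count_matches "$loc")"
done
```

Note that LC_ALL, if set in the environment, overrides LANG, so unset it before running the loop. A locale that is not installed (check with `locale -a`) falls back to C and will not show the problem.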
Message-Id: <20041011.232915.193711624.tats05%<email address hidden>>
Date: Mon, 11 Oct 2004 23:29:15 +0900 (JST)
From: Tatsuya Kinoshita <email address hidden>
To: <email address hidden>, <email address hidden>
Cc: <email address hidden>, <email address hidden>
Subject: Re: gawk: Odd regexp matching problem if LANG=ja_JP
Could anyone fix this bug?
Thanks,
--
Tatsuya Kinoshita