Merge lp:~nobuto/ubuntu/wily/unzip/fallback-encoding into lp:ubuntu/wily/unzip
Proposed by
Nobuto Murata
Status: | Work in progress |
---|---|
Proposed branch: | lp:~nobuto/ubuntu/wily/unzip/fallback-encoding |
Merge into: | lp:ubuntu/wily/unzip |
Diff against target: |
147 lines (+106/-0) 6 files modified
debian/changelog (+9/-0) debian/control (+1/-0) debian/tests/control (+2/-0) debian/tests/fallback-encoding (+57/-0) debian/unzip-fallback-charset.sh (+36/-0) debian/unzip.install (+1/-0) |
To merge this branch: | bzr merge lp:~nobuto/ubuntu/wily/unzip/fallback-encoding |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Steve Langasek | Needs Fixing | ||
Aron Xu | Pending | ||
Review via email: mp+268850@code.launchpad.net |
Description of the change
* add fallback charsets for Windows archives in CJKV+th locale,
inspired by Ubuntu Kylin way, LP: #1422290
* add autopkgtest to make sure all of 3 types of zip files can be
extracted properly
To post a comment you must log in.
Unmerged revisions
- 31. By Nobuto Murata
-
* add fallback charsets for Windows archives in CJKV+th locale,
inspired by Ubuntu Kylin way, LP: #1422290
* add autopkgtest to make sure all of 3 types of zip files can be
extracted properly
This leaks a variable with a common name into the user's shell environment and is not appropriate. It is also against Debian policy, which specifies that packages should not rely on environment variables for correct operation. I can see three possible ways that this could be done instead:
- use a 'local' variable in the profile snippet to avoid leaking the $charset variable into the shell's variables;
- provide these as wrapper scripts instead around the real zip,unzip commands which set the -O option if not set; or
- (preferred) patch this default behavior into the upstream code.
Note that the -O option to unzip is already a Debian-local patch (debian/ patches/ 20-unzip60- alt-iconv- utf8). Support for these defaults should be merged into that patch.
In particular, it appears that dos_charset_map in unix/unix.c can be extended with the following mappings for equivalent results:
{ "EUC-JP", "CP932" },
{ "EUC-KR", "CP949" },
{ "TIS-620", "CP874" },
{ "GB2312", "CP936" },
{ "BIG5", "CP950" },
This covers all of the languages included in your patch *except* for Vietnamese; Vietnamese uses a UTF-8 locale by default, so it's impossible to infer the correct codepage of the DOS-encoded zip file from the Unix charset for that case. You could handle the (non-default) vi_VN.TCVN locale by adding:
{ "TCVN5712-1", "CP1258" },
All Unix charset values taken from /usr/share/ i18n/SUPPORTED.
Personally I would find it somewhat unsatisfactory that the codepage is only correctly deduced when using a non-UTF8 locale for the *other* languages, but this is a limitation of your existing patch and the existing Debian code and UTF-8 is not commonly used as the locale for these other languages, so changing as above is probably sufficient.