Merge lp:~nobuto/ubuntu/wily/unzip/fallback-encoding into lp:ubuntu/wily/unzip

Proposed by Nobuto Murata
Status: Work in progress
Proposed branch: lp:~nobuto/ubuntu/wily/unzip/fallback-encoding
Merge into: lp:ubuntu/wily/unzip
Diff against target: 147 lines (+106/-0)
6 files modified
debian/changelog (+9/-0)
debian/control (+1/-0)
debian/tests/control (+2/-0)
debian/tests/fallback-encoding (+57/-0)
debian/unzip-fallback-charset.sh (+36/-0)
debian/unzip.install (+1/-0)
To merge this branch: bzr merge lp:~nobuto/ubuntu/wily/unzip/fallback-encoding
Reviewer Review Type Date Requested Status
Steve Langasek Needs Fixing
Aron Xu Pending
Review via email: mp+268850@code.launchpad.net

Description of the change

* add fallback charsets for Windows archives in CJKV+th locale,
  inspired by Ubuntu Kylin way, LP: #1422290
* add autopkgtest to make sure all of 3 types of zip files can be
  extracted properly

To post a comment you must log in.
Revision history for this message
Steve Langasek (vorlon) wrote :

This leaks a variable with a common name into the user's shell environment and is not appropriate. It is also against Debian policy, which specifies that packages should not rely on environment variables for correct operation. I can see three possible ways that this could be done instead:

 - use a 'local' variable in the profile snippet to avoid leaking the $charset variable into the shell's variables;
 - provide these as wrapper scripts instead around the real zip,unzip commands which set the -O option if not set; or
 - (preferred) patch this default behavior into the upstream code.

Note that the -O option to unzip is already a Debian-local patch (debian/patches/20-unzip60-alt-iconv-utf8). Support for these defaults should be merged into that patch.

In particular, it appears that dos_charset_map in unix/unix.c can be extended with the following mappings for equivalent results:

{ "EUC-JP", "CP932" },
{ "EUC-KR", "CP949" },
{ "TIS-620", "CP874" },
{ "GB2312", "CP936" },
{ "BIG5", "CP950" },

This covers all of the languages included in your patch *except* for Vietnamese; Vietnamese uses a UTF-8 locale by default, so it's impossible to infer the correct codepage of the DOS-encoded zip file from the Unix charset for that case. You could handle the (non-default) vi_VN.TCVN locale by adding:

{ "TCVN5712-1", "CP1258" },

All Unix charset values taken from /usr/share/i18n/SUPPORTED.

Personally I would find it somewhat unsatisfactory that the codepage is only correctly deduced when using a non-UTF8 locale for the *other* languages, but this is a limitation of your existing patch and the existing Debian code and UTF-8 is not commonly used as the locale for these other languages, so changing as above is probably sufficient.

review: Needs Fixing

Unmerged revisions

31. By Nobuto Murata

* add fallback charsets for Windows archives in CJKV+th locale,
  inspired by Ubuntu Kylin way, LP: #1422290
* add autopkgtest to make sure all of 3 types of zip files can be
  extracted properly

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'debian/changelog'
2--- debian/changelog 2015-05-22 12:31:51 +0000
3+++ debian/changelog 2015-08-23 14:28:47 +0000
4@@ -1,3 +1,12 @@
5+unzip (6.0-17ubuntu2) UNRELEASED; urgency=medium
6+
7+ * add fallback charsets for Windows archives in CJKV+th locale,
8+ inspired by Ubuntu Kylin way, LP: #1422290
9+ * add autopkgtest to make sure all of 3 types of zip files can be
10+ extracted properly
11+
12+ -- Nobuto Murata <nobuto@ubuntu.com> Sun, 23 Aug 2015 22:38:52 +0900
13+
14 unzip (6.0-17ubuntu1) wily; urgency=medium
15
16 * Resynchronise with Debian. Remaining changes:
17
18=== modified file 'debian/control'
19--- debian/control 2015-05-22 12:31:51 +0000
20+++ debian/control 2015-08-23 14:28:47 +0000
21@@ -6,6 +6,7 @@
22 Standards-Version: 3.9.6
23 Build-Depends: debhelper (>= 9), libbz2-dev
24 Homepage: http://www.info-zip.org/UnZip.html
25+XS-Testsuite: autopkgtest
26
27 Package: unzip
28 Architecture: any
29
30=== added directory 'debian/tests'
31=== added file 'debian/tests/control'
32--- debian/tests/control 1970-01-01 00:00:00 +0000
33+++ debian/tests/control 2015-08-23 14:28:47 +0000
34@@ -0,0 +1,2 @@
35+Tests: fallback-encoding
36+Depends: unzip, language-pack-ja
37
38=== added file 'debian/tests/fallback-encoding'
39--- debian/tests/fallback-encoding 1970-01-01 00:00:00 +0000
40+++ debian/tests/fallback-encoding 2015-08-23 14:28:47 +0000
41@@ -0,0 +1,57 @@
42+#!/bin/sh
43+# autopkgtest check: Extract 3 types of zip file and verify those can be
44+# extracted with expected path names.
45+# Author: Nobuto Murata <nobuto@ubuntu.com>
46+
47+set -e
48+
49+WORKDIR=$(mktemp -d)
50+trap "rm -rf $WORKDIR" 0 INT QUIT ABRT PIPE TERM
51+cd $WORKDIR
52+
53+#### zip file type 1: unx and UTF-8 ####
54+# Created by zip command on Ubuntu
55+mkdir unx-utf8
56+cat <<EOF | base64 -d > unx-utf8/test.zip
57+UEsDBAoAAAgAAEcGvUYAAAAAAAAAAAAAAAAWABwA44OG44K544OI44OV44Kp44Or44OAL1VUCQAD
58+NjlnVTc5Z1V1eAsAAQToAwAABOgDAABQSwMECgAACAAAFXy8RgAAAAAAAAAAAAAAAC8AHADjg4bj
59+grnjg4jjg5Xjgqnjg6vjg4Av44OG44K544OI44OV44Kh44Kk44OrLnR4dFVUCQADirZmVbs7Z1V1
60+eAsAAQToAwAABOgDAABQSwECHgMKAAAIAABHBr1GAAAAAAAAAAAAAAAAFgAYAAAAAAAAABAA/UEA
61+AAAA44OG44K544OI44OV44Kp44Or44OAL1VUBQADNjlnVXV4CwABBOgDAAAE6AMAAFBLAQIeAwoA
62+AAgAABV8vEYAAAAAAAAAAAAAAAAvABgAAAAAAAAAAAC0gVAAAADjg4bjgrnjg4jjg5Xjgqnjg6vj
63+g4Av44OG44K544OI44OV44Kh44Kk44OrLnR4dFVUBQADirZmVXV4CwABBOgDAAAE6AMAAFBLBQYA
64+AAAAAgACANEAAAC5AAAAAAA=
65+EOF
66+
67+#### zip file type 2: fat and UTF-8 ####
68+# Created by Google Drive folder export
69+mkdir fat-utf8
70+cat <<EOF | base64 -d > fat-utf8/test.zip
71+UEsDBBQACAgIAC1IvEYAAAAAAAAAAAAAAAAvADgA44OG44K544OI44OV44Kp44Or44OAL+ODhuOC
72+ueODiOODleOCoeOCpOODqy50eHR1cDQAAQvM2DTjg4bjgrnjg4jjg5Xjgqnjg6vjg4Av44OG44K5
73+44OI44OV44Kh44Kk44OrLnR4dAMAUEsHCAAAAAACAAAAAAAAAFBLAQIUABQACAgIAC1IvEYAAAAA
74+AgAAAAAAAAAvADgAAAAAAAAAAAAAAAAAAADjg4bjgrnjg4jjg5Xjgqnjg6vjg4Av44OG44K544OI
75+44OV44Kh44Kk44OrLnR4dHVwNAABC8zYNOODhuOCueODiOODleOCqeODq+ODgC/jg4bjgrnjg4jj
76+g5XjgqHjgqTjg6sudHh0UEsFBgAAAAABAAEAlQAAAJcAAAAAAA==
77+EOF
78+
79+#### zip file type 3: fat and CP932 ####
80+# Created by embeded tool on Windows Server 2012 R2 with Japanese locale
81+mkdir fat-cp932
82+cat <<EOF | base64 -d > fat-cp932/test.zip
83+UEsDBBQAAAAAABV8vEYAAAAAAAAAAAAAAAAhAAAAg2WDWINng3SDSIOLg18vg2WDWINng3SDQIND
84+g4sudHh0UEsBAhQAFAAAAAAAFXy8RgAAAAAAAAAAAAAAACEAAAAAAAAAAAAgAAAAAAAAAINlg1iD
85+Z4N0g0iDi4NfL4Nlg1iDZ4N0g0CDQ4OLLnR4dFBLBQYAAAAAAQABAE8AAAA/AAAAAAA=
86+EOF
87+
88+
89+# spawn login shell to read fallback charset per locale from /etc/profile.d/
90+env LC_CTYPE='ja_JP.UTF-8' sh -e -l -c '
91+ for type in unx-utf8 fat-utf8 fat-cp932; do
92+ # all zip files contain the path "テストフォルダ/テストファイル.txt"
93+ zipinfo $type/test.zip
94+ unzip $type/test.zip -d $type
95+ stat $type/テストフォルダ/テストファイル.txt >/dev/null
96+ echo "extract $type: OK\n"
97+ done
98+'
99
100=== added file 'debian/unzip-fallback-charset.sh'
101--- debian/unzip-fallback-charset.sh 1970-01-01 00:00:00 +0000
102+++ debian/unzip-fallback-charset.sh 2015-08-23 14:28:47 +0000
103@@ -0,0 +1,36 @@
104+# LP: #1422290
105+# export fallback charset per locale in addition to UTF-8. Those fallbacks
106+# are needed to handle zip files created on Windows systems.
107+
108+locale=${LC_ALL:-$LC_CTYPE}
109+locale=${locale:-$LANG}
110+locale=${locale%%.*}
111+
112+case "$locale" in
113+ ja_JP)
114+ charset=CP932
115+ ;;
116+ ko_KR)
117+ charset=CP949
118+ ;;
119+ th_TH)
120+ charset=CP874
121+ ;;
122+ vi_VN)
123+ charset=CP1258
124+ ;;
125+ zh_CN)
126+ charset=CP936
127+ ;;
128+ zh_TW)
129+ charset=CP950
130+ ;;
131+ *)
132+ charset=
133+ ;;
134+esac
135+
136+if [ -n "$charset" ]; then
137+ export UNZIP="-O $charset"
138+ export ZIPINFO="-O $charset"
139+fi
140
141=== modified file 'debian/unzip.install'
142--- debian/unzip.install 2015-05-17 12:41:52 +0000
143+++ debian/unzip.install 2015-08-23 14:28:47 +0000
144@@ -1,2 +1,3 @@
145+debian/unzip-fallback-charset.sh etc/profile.d
146 usr/bin/*
147 usr/man/* usr/share/man

Subscribers

People subscribed via source and target branches

to all changes: