Merge lp:~widelands-dev/widelands/japanese into lp:widelands

Proposed by GunChleoc
Status: Merged
Merged at revision: 7543
Proposed branch: lp:~widelands-dev/widelands/japanese
Merge into: lp:widelands
Diff against target: 253 lines (+191/-3)
3 files modified
src/graphic/text/bidi.cc (+176/-2)
src/graphic/text/bidi.h (+6/-0)
src/graphic/text/rt_render.cc (+9/-1)
To merge this branch: bzr merge lp:~widelands-dev/widelands/japanese
Reviewer: TiborB (Status: Approve)
Review via email: mp+273419@code.launchpad.net

This proposal supersedes a proposal from 2015-09-28.

Description of the change

Implemented line wrapping for Japanese.

Some characters may not start a line and others may not end one, so I group each such character with its neighbor into a single vector entry.
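Here is a minimal sketch of this grouping, for illustration only. It works on UTF-32 code points (`char32_t`) instead of ICU's UTF-16 `UChar`. It also uses tiny hypothetical sample sets instead of the full JLReq tables, and the function name `group_units` is made up. The patch joins at most two characters. This loop instead keeps extending a unit while the rule still applies, so a sequence like `」。` stays attached to the character before it.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily trimmed samples of the JLReq lists
// (full stop, comma, right corner bracket, prolonged sound mark, small tsu / left corner bracket, fullwidth left paren).
const std::set<char32_t> kNoStartSample = {U'\u3002', U'\u3001', U'\u300D', U'\u30FC', U'\u3063'};
const std::set<char32_t> kNoEndSample = {U'\u300C', U'\uFF08'};

// Splits text into units between which a line break is allowed.
// A character is glued to the next one if it may not end a line,
// or if the next character may not start a line.
std::vector<std::u32string> group_units(const std::u32string& text) {
    std::vector<std::u32string> units;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::u32string unit(1, text[i]);
        // Keep absorbing characters while the break after text[i] is forbidden.
        while (i + 1 < text.size() &&
               (kNoEndSample.count(text[i]) || kNoStartSample.count(text[i + 1]))) {
            unit += text[++i];
        }
        units.push_back(unit);
    }
    return units;
}
```

For example, `「はい」。` comes out as two units, `「は` and `い」。`. A line break may only fall between them.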

Testing can be done by removing the \n characters from win_conditions/ja.po and looking at the tooltips: rather than a crash or an endless line, we now get properly wrapped lines.
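Here is why the missing spaces mattered. The renderer splits text into words at whitespace (`ts.till_any_or_end(" \t\n\r")` in `rt_render.cc`). An unspaced Japanese sentence therefore used to arrive as a single word that could not be broken. The hypothetical `wrap_units` helper below sketches greedy line filling over pre-split units. It assumes every code point is one unit wide, whereas the real renderer measures glyph widths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy line filling: append units to the current line until the next one
// would exceed max_width, then start a new line.
// Assumption for illustration: width == number of code points.
std::vector<std::u32string> wrap_units(const std::vector<std::u32string>& units,
                                       std::size_t max_width) {
    std::vector<std::u32string> lines;
    std::u32string current;
    for (const std::u32string& unit : units) {
        if (!current.empty() && current.size() + unit.size() > max_width) {
            lines.push_back(current);
            current.clear();
        }
        current += unit;  // an oversized unit still gets a line of its own
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

With the units `ab`, `c`, `de`, `f` and a width of 3, this yields the lines `abc` and `def`. Passing the whole string as one unit yields a single overlong line, which is the situation the patch avoids.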

TiborB (tiborb95) wrote :

I have not tested it in Japanese; I probably don't have the proper fonts installed anyway. So if there is a potential tester here, you can wait for them.

But the code looks good to me.

review: Approve
GunChleoc (gunchleoc) wrote :

Actually, the fonts come with Widelands, so you can switch to Japanese anytime, as long as you remember which button is where. I don't read Japanese either *lol*

I don't know whether our Japanese localizers can compile the code, so I will merge this and ask them to test it later.

Preview Diff

=== modified file 'src/graphic/text/bidi.cc'
--- src/graphic/text/bidi.cc 2015-09-28 06:41:58 +0000
+++ src/graphic/text/bidi.cc 2015-10-05 15:03:22 +0000
@@ -22,7 +22,6 @@
 #include <map>
 #include <string>

-#include <unicode/uchar.h>
 #include <unicode/unistr.h>
 #include <unicode/utypes.h>

@@ -32,6 +31,139 @@
 // TODO(GunChleoc): Have a look at the ICU API to see which helper functions can be gained from there.
 // TODO(GunChleoc): Arabic: Turn this into a proper class

+// http://www.w3.org/TR/jlreq/#characters_not_starting_a_line
+const std::set<UChar> kCannottStartLineJapanese = {
+	{0x2019}, // RIGHT SINGLE QUOTATION MARK
+	{0x201D}, // RIGHT DOUBLE QUOTATION MARK
+	{0x0029}, // RIGHT PARENTHESIS
+	{0x3015}, // RIGHT TORTOISE SHELL BRACKET
+	{0x005D}, // RIGHT SQUARE BRACKET
+	{0x007D}, // RIGHT CURLY BRACKET
+	{0x3009}, // RIGHT ANGLE BRACKET
+	{0x300B}, // RIGHT DOUBLE ANGLE BRACKET
+	{0x300D}, // RIGHT CORNER BRACKET
+	{0x300F}, // RIGHT WHITE CORNER BRACKET
+	{0x3011}, // RIGHT BLACK LENTICULAR BRACKET
+	{0x2986}, // RIGHT WHITE PARENTHESIS
+	{0x3019}, // RIGHT WHITE TORTOISE SHELL BRACKET
+	{0x3017}, // RIGHT WHITE LENTICULAR BRACKET
+	{0xFF09}, // Fullwidth Right Parenthesis
+	{0x00BB}, // RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+	{0x301F}, // LOW DOUBLE PRIME QUOTATION MARK
+	{0x2010}, // HYPHEN
+	{0x301C}, // WAVE DASH
+	{0x30A0}, // KATAKANA-HIRAGANA DOUBLE HYPHEN
+	{0x2013}, // EN DASH
+	{0x0021}, // EXCLAMATION MARK
+	{0x003F}, // QUESTION MARK
+	{0x203C}, // DOUBLE EXCLAMATION MARK
+	{0x2047}, // DOUBLE QUESTION MARK
+	{0x2048}, // QUESTION EXCLAMATION MARK
+	{0x2049}, // EXCLAMATION QUESTION MARK
+	{0x30FB}, // KATAKANA MIDDLE DOT
+	{0x003A}, // COLON
+	{0x003B}, // SEMICOLON
+	{0x3002}, // IDEOGRAPHIC FULL STOP
+	{0x002E}, // FULL STOP
+	{0x3001}, // IDEOGRAPHIC COMMA
+	{0x002C}, // COMMA
+	{0x30FD}, // KATAKANA ITERATION MARK
+	{0x30FE}, // KATAKANA VOICED ITERATION MARK
+	{0x309D}, // HIRAGANA ITERATION MARK
+	{0x309E}, // HIRAGANA VOICED ITERATION MARK
+	{0x3005}, // IDEOGRAPHIC ITERATION MARK
+	{0x303B}, // VERTICAL IDEOGRAPHIC ITERATION MARK
+	{0x30FC}, // KATAKANA-HIRAGANA PROLONGED SOUND MARK
+	{0x3041}, // HIRAGANA LETTER SMALL A
+	{0x3043}, // HIRAGANA LETTER SMALL I
+	{0x3045}, // HIRAGANA LETTER SMALL U
+	{0x3047}, // HIRAGANA LETTER SMALL E
+	{0x3049}, // HIRAGANA LETTER SMALL O
+	{0x30A1}, // KATAKANA LETTER SMALL A
+	{0x30A3}, // KATAKANA LETTER SMALL I
+	{0x30A5}, // KATAKANA LETTER SMALL U
+	{0x30A7}, // KATAKANA LETTER SMALL E
+	{0x30A9}, // KATAKANA LETTER SMALL O
+	{0x3063}, // HIRAGANA LETTER SMALL TU
+	{0x3083}, // HIRAGANA LETTER SMALL YA
+	{0x3085}, // HIRAGANA LETTER SMALL YU
+	{0x3087}, // HIRAGANA LETTER SMALL YO
+	{0x308E}, // HIRAGANA LETTER SMALL WA
+	{0x3095}, // HIRAGANA LETTER SMALL KA
+	{0x3096}, // HIRAGANA LETTER SMALL KE
+	{0x30C3}, // KATAKANA LETTER SMALL TU
+	{0x30E3}, // KATAKANA LETTER SMALL YA
+	{0x30E5}, // KATAKANA LETTER SMALL YU
+	{0x30E7}, // KATAKANA LETTER SMALL YO
+	{0x30EE}, // KATAKANA LETTER SMALL WA
+	{0x30F5}, // KATAKANA LETTER SMALL KA
+	{0x30F6}, // KATAKANA LETTER SMALL KE
+	{0x31F0}, // KATAKANA LETTER SMALL KU
+	{0x31F1}, // KATAKANA LETTER SMALL SI
+	{0x31F2}, // KATAKANA LETTER SMALL SU
+	{0x31F3}, // KATAKANA LETTER SMALL TO
+	{0x31F4}, // KATAKANA LETTER SMALL NU
+	{0x31F5}, // KATAKANA LETTER SMALL HA
+	{0x31F6}, // KATAKANA LETTER SMALL HI
+	{0x31F7}, // KATAKANA LETTER SMALL HU
+	{0x31F8}, // KATAKANA LETTER SMALL HE
+	{0x31F9}, // KATAKANA LETTER SMALL HO
+	{0x31FA}, // KATAKANA LETTER SMALL MU
+	{0x31FB}, // KATAKANA LETTER SMALL RA
+	{0x31FC}, // KATAKANA LETTER SMALL RI
+	{0x31FD}, // KATAKANA LETTER SMALL RU
+	{0x31FE}, // KATAKANA LETTER SMALL RE
+	{0x31FF}, // KATAKANA LETTER SMALL RO
+};
+
+// http://www.w3.org/TR/jlreq/#characters_not_ending_a_line
+const std::set<UChar> kCannotEndLineJapanese = {
+	{0x2018}, // LEFT SINGLE QUOTATION MARK
+	{0x201C}, // LEFT DOUBLE QUOTATION MARK
+	{0x0028}, // LEFT PARENTHESIS
+	{0x3014}, // LEFT TORTOISE SHELL BRACKET
+	{0x005B}, // LEFT SQUARE BRACKET
+	{0x007B}, // LEFT CURLY BRACKET
+	{0x3008}, // LEFT ANGLE BRACKET
+	{0x300A}, // LEFT DOUBLE ANGLE BRACKET
+	{0x300C}, // LEFT CORNER BRACKET
+	{0x300E}, // LEFT WHITE CORNER BRACKET
+	{0x3010}, // LEFT BLACK LENTICULAR BRACKET
+	{0x2985}, // LEFT WHITE PARENTHESIS
+	{0x3018}, // LEFT WHITE TORTOISE SHELL BRACKET
+	{0x3016}, // LEFT WHITE LENTICULAR BRACKET
+	{0xFF08}, // Fullwidth Left Parenthesis
+	{0x00AB}, // LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+	{0x301D}, // REVERSED DOUBLE PRIME QUOTATION MARK
+};
+
+
+// http://unicode.org/faq/blocks_ranges.html
+// http://unicode-table.com/en/blocks/
+const std::set<UBlockCode> kCJKCodeBlocks = {
+	{
+		UBlockCode::UBLOCK_CJK_COMPATIBILITY,
+		UBlockCode::UBLOCK_CJK_COMPATIBILITY_FORMS,
+		UBlockCode::UBLOCK_CJK_COMPATIBILITY_IDEOGRAPHS,
+		UBlockCode::UBLOCK_CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT,
+		UBlockCode::UBLOCK_CJK_RADICALS_SUPPLEMENT,
+		UBlockCode::UBLOCK_CJK_STROKES,
+		UBlockCode::UBLOCK_CJK_SYMBOLS_AND_PUNCTUATION,
+		UBlockCode::UBLOCK_CJK_UNIFIED_IDEOGRAPHS,
+		UBlockCode::UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
+		UBlockCode::UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
+		UBlockCode::UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C,
+		UBlockCode::UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D,
+		UBlockCode::UBLOCK_HIRAGANA,
+		UBlockCode::UBLOCK_KATAKANA,
+	},
+};
+
+bool is_cjk_character(UChar32 c) {
+	return kCJKCodeBlocks.count(ublock_getCode(c)) == 1;
+}
+
+
 // Need to mirror () etc. for LTR languages, so we're sticking them in a map.
 const std::map<UChar, UChar> kSymmetricChars = {
 	{0x0028, 0x0029}, // ()
@@ -378,7 +510,7 @@
 }


-// True if a string does not contain Latin characters
+// True if a string contains a character from an Arabic code block
 bool has_arabic_character(const char* input) {
 	bool result = false;
 	const icu::UnicodeString parseme(input);
@@ -590,4 +722,46 @@
 	return result;
 }

+// True if a string contains a character from a CJK code block
+bool has_cjk_character(const char* input) {
+	bool result = false;
+	const icu::UnicodeString parseme(input);
+	for (int32_t i = 0; i < parseme.length(); ++i) {
+		if (is_cjk_character(parseme.char32At(i))) {
+			result = true;
+			break;
+		}
+	}
+	return result;
+}
+
+// Split a string of CJK characters into units that can have line breaks between them.
+std::vector<std::string> split_cjk_word(const char* input) {
+	const icu::UnicodeString parseme(input);
+	std::vector<std::string> result;
+	for (int i = 0; i < parseme.length(); ++i) {
+		icu::UnicodeString temp;
+		UChar c = parseme.charAt(i);
+		temp += c;
+		if (i < parseme.length() - 1) {
+			UChar next = parseme.charAt(i + 1);
+			if (cannot_end_line(c) || cannot_start_line(next)) {
+				temp += next;
+				++i;
+			}
+		}
+		std::string temp2;
+		result.push_back(temp.toUTF8String(temp2));
+	}
+	return result;
+}
+
+bool cannot_start_line(const UChar& c) {
+	return kCannottStartLineJapanese.count(c) == 1;
+}
+
+bool cannot_end_line(const UChar& c) {
+	return kCannotEndLineJapanese.count(c) == 1;
+}
+
 } // namespace UI

=== modified file 'src/graphic/text/bidi.h'
--- src/graphic/text/bidi.h 2015-09-26 09:34:20 +0000
+++ src/graphic/text/bidi.h 2015-10-05 15:03:22 +0000
@@ -23,14 +23,20 @@
 #include <string>
 #include <vector>

+#include <unicode/uchar.h>
+
 #include "graphic/text/font_set.h"

 // BiDi support for RTL languages
 namespace i18n {
 	std::string make_ligatures(const char* input);
 	std::string line2bidi(const char* input);
+	std::vector<std::string> split_cjk_word(const char* input);
 	bool has_rtl_character(const char* input);
 	bool has_rtl_character(std::vector<std::string> input);
+	bool has_cjk_character(const char* input);
+	bool cannot_start_line(const UChar& c);
+	bool cannot_end_line(const UChar& c);

 } // namespace UI


=== modified file 'src/graphic/text/rt_render.cc'
--- src/graphic/text/rt_render.cc 2015-09-26 18:04:24 +0000
+++ src/graphic/text/rt_render.cc 2015-10-05 15:03:22 +0000
@@ -754,7 +754,15 @@
 			}
 			word = ts.till_any_or_end(" \t\n\r");
 			if (!word.empty()) {
-				nodes.push_back(new TextNode(font_cache_.get_font(&ns), ns, i18n::make_ligatures(word.c_str())));
+				word = i18n::make_ligatures(word.c_str());
+				if (i18n::has_cjk_character(word.c_str())) {
+					std::vector<std::string> units = i18n::split_cjk_word(word.c_str());
+					for (const std::string& unit: units) {
+						nodes.push_back(new TextNode(font_cache_.get_font(&ns), ns, unit));
+					}
+				} else {
+					nodes.push_back(new TextNode(font_cache_.get_font(&ns), ns, word));
+				}
 			}
 		}
 	}
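`has_cjk_character()` above classifies each code point by its Unicode block through ICU's `ublock_getCode()`. Without ICU, one could roughly approximate this by hard-coding the block ranges. The names below are hypothetical, and only some of the blocks in `kCJKCodeBlocks` are covered.

```cpp
#include <string>

// Hypothetical ICU-free stand-in: hard-coded ranges for a subset of the
// blocks listed in kCJKCodeBlocks (not the compatibility or stroke blocks).
bool is_cjk_code_point(char32_t c) {
    return (c >= 0x3000 && c <= 0x303F) ||    // CJK Symbols and Punctuation
           (c >= 0x3040 && c <= 0x309F) ||    // Hiragana
           (c >= 0x30A0 && c <= 0x30FF) ||    // Katakana
           (c >= 0x3400 && c <= 0x4DBF) ||    // CJK Unified Ideographs Extension A
           (c >= 0x4E00 && c <= 0x9FFF) ||    // CJK Unified Ideographs
           (c >= 0x20000 && c <= 0x2A6DF);    // CJK Unified Ideographs Extension B
}

// True if any code point of the UTF-32 string falls into one of the ranges above.
bool has_cjk_code_point(const std::u32string& text) {
    for (char32_t c : text) {
        if (is_cjk_code_point(c)) {
            return true;
        }
    }
    return false;
}
```

Looking the block up through ICU, as the patch does, has an advantage over this approach: the block boundaries come from ICU's Unicode data rather than from hand-maintained constants.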
