Merge lp:~widelands-dev/widelands/glossary_checks into lp:widelands
Status: Merged
Merged at revision: 8315
Proposed branch: lp:~widelands-dev/widelands/glossary_checks
Merge into: lp:widelands
Diff against target: 605 lines (+601/-0), 1 file modified: utils/glossary_checks.py (+601/-0)
To merge this branch: bzr merge lp:~widelands-dev/widelands/glossary_checks
Related bugs:
Reviewer: GunChleoc (Needs Resubmitting)
Review via email: mp+312430@code.launchpad.net
Commit message
Added a Python script to do automated glossary checks for translations. It enlists the help of Hunspell and 'misuses' the Transifex note field in order to reduce noise. Functionality for translators is documented in the wiki:
Description of the change
After the British English fiasco in Build 19, I decided it would be good to have some glossary checks for translations. We are using this kind of check at my workplace, and they help with keeping consistency on big projects.
Downloading the glossary from Transifex can't be automated, so we have to download it manually each time before we run the checks. That's why I decided against committing it to the code base: we don't want to accidentally check against an outdated glossary.
Translators can hack the glossary's comment fields to provide inflected word forms, so in the long run, it won't annoy the translators with false positive hits. For example, for "worker" = "Arbeiter", "workers" = "Arbeitern" can pass the check if a translator has added the relevant data.
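A minimal sketch of how such a '|'-delimited comment field can be expanded into the list of accepted forms (the function name `accepted_forms` is illustrative, not part of the script's actual API):

```python
# Hypothetical helper: expand a glossary translation plus its Transifex
# comment field into the list of word forms that count as a match.
# The '|' delimiter signals that the comment carries inflections rather
# than free-form notes.
def accepted_forms(translation, comment, delimiter='|'):
    forms = [translation.strip()]
    if delimiter in comment:
        # Each '|'-separated chunk is one inflected form; whitespace around
        # the delimiter is tolerated ("clicking | clicked").
        forms.extend(part.strip() for part in comment.split(delimiter) if part.strip())
    return forms

print(accepted_forms('Arbeiter', 'Arbeiters|Arbeitern'))
# ['Arbeiter', 'Arbeiters', 'Arbeitern']
```

A comment without the delimiter, like "Nice, fluffy!", is treated as an ordinary note and contributes no extra forms.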
I am also using the Hunspell stem function to reduce the noise. This is slow, but any entry that a translator doesn't have to look at needlessly is a good entry.
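The stemming step boils down to piping the translation through `hunspell -s` and appending whatever stems come back, so inflected forms in the translation can still match a glossary base form. A minimal sketch, assuming the `en_US` dictionary as an example and falling back to the unchanged text when hunspell isn't available:

```python
from subprocess import Popen, PIPE

def with_hunspell_stems(text, dictionary='en_US'):
    """Append hunspell's stems to the text so inflected forms also match.

    Falls back to the unchanged text if the hunspell binary or the
    requested dictionary is unavailable.
    """
    try:
        process = Popen(['hunspell', '-d', dictionary, '-s'],
                        stdin=PIPE, stdout=PIPE, stderr=PIPE)
        stdout, _ = process.communicate(text.encode('utf-8'))
        # `hunspell -s` prints word/stem pairs; any tokens it returns are
        # appended to the original text for the later substring check.
        stems = stdout.decode('utf-8').split()
    except OSError:  # hunspell binary not found
        return text
    return ' '.join([text] + stems) if stems else text
```

The check then simply re-runs its term matching against the stem-augmented target string, which is why the quality of the results depends on the hunspell dictionary data.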
These checks will be a service for the translation teams and NOT mandatory: we can't require volunteers to go through them. Some of the translators gladly snapped up my last round of validations though, so some will like using this.
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1728. State: errored. Details: https:/
Appveyor build 1568. State: success. Details: https:/
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1731. State: passed. Details: https:/
Appveyor build 1571. State: success. Details: https:/
GunChleoc (gunchleoc) wrote:
I guess I had quite a few new ideas after submitting this merge request... It should be done now. I have already dogfooded this with my own locale and fixed up a number of translations thanks to this check :)
Will create a zip of the results for the translators so they can check it out.
bunnybot (widelandsofficial) wrote:
Bunnybot encountered an error while working on this merge proposal:
HTTP Error 500: Internal Server Error
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1757. State: passed. Details: https:/
Appveyor build 1597. State: success. Details: https:/
bunnybot (widelandsofficial) wrote:
Bunnybot encountered an error while working on this merge proposal:
('The read operation timed out',)
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1757. State: passed. Details: https:/
Appveyor build 1597. State: success. Details: https:/
bunnybot (widelandsofficial) wrote:
Bunnybot encountered an error while working on this merge proposal:
HTTP Error 500: Internal Server Error
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1757. State: passed. Details: https:/
Appveyor build 1597. State: success. Details: https:/
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 1864. State: failed. Details: https:/
Appveyor build 1700. State: success. Details: https:/
GunChleoc (gunchleoc) wrote:
It's getting a bit annoying to run these from a separate branch. Since none of this affects the Widelands code or translations directly, I'm gonna merge this now.
@bunnybot merge
bunnybot (widelandsofficial) wrote:
Continuous integration builds have changed state:
Travis build 2031. State: passed. Details: https:/
Appveyor build 1700. State: success. Details: https:/
Preview Diff
1 | === added file 'utils/glossary_checks.py' |
2 | --- utils/glossary_checks.py 1970-01-01 00:00:00 +0000 |
3 | +++ utils/glossary_checks.py 2017-03-04 12:24:08 +0000 |
4 | @@ -0,0 +1,601 @@ |
5 | +#!/usr/bin/env python |
6 | +# encoding: utf-8 |
7 | + |
8 | +"""Runs a glossary check on all po files and writes the check results to |
9 | +po_validation/glossary. |
10 | + |
11 | +You will need to have the Translate Toolkit installed in order for the checks to work: |
12 | +http://toolkit.translatehouse.org/ |
13 | + |
14 | +This script also uses hunspell to reduce the number of false positive hits, so |
15 | +install as many of the needed hunspell dictionaries as you can find. This script |
16 | +will inform you about missing hunspell locales. |
17 | + |
18 | +For Debian-based Linux: sudo apt-get install translate-toolkit hunspell hunspell-ar hunspell-bg hunspell-br hunspell-ca hunspell-cs hunspell-da hunspell-de-de hunspell-el hunspell-en-ca hunspell-en-gb hunspell-en-us hunspell-eu hunspell-fr hunspell-gd hunspell-gl hunspell-he hunspell-hr hunspell-hu hunspell-it hunspell-ko hunspell-lt hunspell-nl hunspell-no hunspell-pl hunspell-pt-br hunspell-pt-pt hunspell-ro hunspell-ru hunspell-si hunspell-sk hunspell-sl hunspell-sr hunspell-sv hunspell-uk hunspell-vi |
19 | + |
20 | +You will need to provide an export of the Transifex glossary and specify it at |
21 | +the command line. Make sure to select "Include glossary notes in file" when |
22 | +exporting the csv from Transifex. |
23 | + |
24 | +Translators can 'misuse' their languages' comment field on Transifex to add |
25 | +inflected forms of their glossary translations. We use the delimiter '|' to |
26 | +signal that the field has inflected forms in it. Examples: |
27 | + |
28 | +Source Translation Comment Translation will be matched against |
29 | +------ ----------- ---------------- ----------------------------------- |
30 | +sheep sheep Nice, fluffy! 'sheep' |
31 | +ax axe axes| 'axe', 'axes' |
32 | +click click clicking|clicked 'click', 'clicking', 'clicked' |
33 | +click click clicking | clicked 'click', 'clicking', 'clicked' |
34 | + |
35 | +""" |
36 | + |
37 | +from collections import defaultdict |
38 | +from subprocess import call, CalledProcessError, Popen, PIPE |
39 | +import csv |
40 | +import os.path |
41 | +import re |
42 | +import subprocess |
43 | +import sys |
44 | +import time |
45 | +import traceback |
46 | + |
47 | +############################################################################# |
48 | +# Data Containers # |
49 | +############################################################################# |
50 | + |
51 | + |
52 | +class GlossaryEntry: |
53 | + """An entry in our parsed glossaries.""" |
54 | + |
55 | + def __init__(self): |
56 | + # Base form of the term, followed by any inflected forms |
57 | + self.terms = [] |
58 | + # Base form of the translation, followed by any inflected forms |
59 | + self.translations = [] |
60 | + |
61 | + |
62 | +class FailedTranslation: |
63 | + """Information about a translation that failed a check.""" |
64 | + |
65 | + def __init__(self): |
66 | + # The locale where the check failed |
67 | + self.locale = '' |
68 | + # The po file containing the failed translation |
69 | + self.po_file = '' |
70 | + # Source text |
71 | + self.source = '' |
72 | + # Target text |
73 | + self.target = '' |
74 | + # Location in the source code |
75 | + self.location = '' |
76 | + # The glossary term that failed the check |
77 | + self.term = '' |
78 | + # The base form of the translated glossary term |
79 | + self.translation = '' |
80 | + |
81 | + |
82 | +class HunspellLocale: |
83 | + """A specific locale for Hunspell, plus whether its dictionary is |
84 | + installed.""" |
85 | + |
86 | + def __init__(self, locale): |
87 | + # Specific language/country code for Hunspell, e.g. el_GR |
88 | + self.locale = locale |
89 | + # Whether a dictionary has been found for the locale |
90 | + self.is_available = False |
91 | + |
92 | +hunspell_locales = defaultdict(list) |
93 | +""" Hunspell needs specific locales""" |
94 | + |
95 | +############################################################################# |
96 | +# File System Functions # |
97 | +############################################################################# |
98 | + |
99 | + |
100 | +def read_csv_file(filepath): |
101 | + """Parses a CSV file into a 2-dimensional array.""" |
102 | + result = [] |
103 | + with open(filepath) as csvfile: |
104 | + csvreader = csv.reader(csvfile, delimiter=',', quotechar='"') |
105 | + for row in csvreader: |
106 | + result.append(row) |
107 | + return result |
108 | + |
109 | + |
110 | +def make_path(base_path, subdir): |
111 | + """Creates the correct form of the path and makes sure that it exists.""" |
112 | + result = os.path.abspath(os.path.join(base_path, subdir)) |
113 | + if not os.path.exists(result): |
114 | + os.makedirs(result) |
115 | + return result |
116 | + |
117 | + |
118 | +def delete_path(path): |
119 | + """Deletes the directory specified by 'path' and all its subdirectories and |
120 | + file contents.""" |
121 | + if os.path.exists(path) and not os.path.isfile(path): |
122 | + files = sorted(os.listdir(path), key=str.lower) |
123 | + for deletefile in files: |
124 | + deleteme = os.path.abspath(os.path.join(path, deletefile)) |
125 | + if os.path.isfile(deleteme): |
126 | + try: |
127 | + os.remove(deleteme) |
128 | + except Exception: |
129 | + print('Failed to delete file ' + deleteme) |
130 | + else: |
131 | + delete_path(deleteme) |
132 | + try: |
133 | + os.rmdir(path) |
134 | + except Exception: |
135 | +            print('Failed to delete path ' + path) |
136 | + |
137 | +############################################################################# |
138 | +# Glossary Loading # |
139 | +############################################################################# |
140 | + |
141 | + |
142 | +def set_has_hunspell_locale(hunspell_locale): |
143 | + """Tries calling hunspell with the given locale and returns false if it has |
144 | + failed.""" |
145 | + try: |
146 | + process = Popen(['hunspell', '-d', hunspell_locale.locale, |
147 | + '-s'], stderr=PIPE, stdout=PIPE, stdin=PIPE) |
148 | + hunspell_result = process.communicate('foo') |
149 | +        if not hunspell_result[1]: |
150 | + hunspell_locale.is_available = True |
151 | + return True |
152 | + else: |
153 | + print('Error loading Hunspell dictionary for locale ' + |
154 | + hunspell_locale.locale + ': ' + hunspell_result[1]) |
155 | + return False |
156 | + |
157 | + except CalledProcessError: |
158 | + print('Failed to run hunspell for locale: ' + hunspell_locale.locale) |
159 | + return False |
160 | + |
161 | + |
162 | +def get_hunspell_locale(locale): |
163 | + """Returns the corresponding Hunspell locale for this locale, or empty |
164 | + string if not available.""" |
165 | + if len(hunspell_locales[locale]) == 1 and hunspell_locales[locale][0].is_available: |
166 | + return hunspell_locales[locale][0].locale |
167 | + return '' |
168 | + |
169 | + |
170 | +def load_hunspell_locales(locale): |
171 | + """Registers locales for Hunspell. |
172 | + |
173 | + Maps a list of generic locales to specific locales and checks which |
174 | + dictionaries are available. If locale != "all", load only the |
175 | + dictionary for the given locale. |
176 | + |
177 | + """ |
178 | + hunspell_locales['bg'].append(HunspellLocale('bg_BG')) |
179 | + hunspell_locales['br'].append(HunspellLocale('br_FR')) |
180 | + hunspell_locales['ca'].append(HunspellLocale('ca_ES')) |
181 | + hunspell_locales['da'].append(HunspellLocale('da_DK')) |
182 | + hunspell_locales['cs'].append(HunspellLocale('cs_CZ')) |
183 | + hunspell_locales['de'].append(HunspellLocale('de_DE')) |
184 | + hunspell_locales['el'].append(HunspellLocale('el_GR')) |
185 | + hunspell_locales['en_CA'].append(HunspellLocale('en_CA')) |
186 | + hunspell_locales['en_GB'].append(HunspellLocale('en_GB')) |
187 | + hunspell_locales['en_US'].append(HunspellLocale('en_US')) |
188 | + hunspell_locales['eo'].append(HunspellLocale('eo')) |
189 | + hunspell_locales['es'].append(HunspellLocale('es_ES')) |
190 | + hunspell_locales['et'].append(HunspellLocale('et_EE')) |
191 | + hunspell_locales['eu'].append(HunspellLocale('eu_ES')) |
192 | + hunspell_locales['fa'].append(HunspellLocale('fa_IR')) |
193 | + hunspell_locales['fi'].append(HunspellLocale('fi_FI')) |
194 | + hunspell_locales['fr'].append(HunspellLocale('fr_FR')) |
195 | + hunspell_locales['gd'].append(HunspellLocale('gd_GB')) |
196 | + hunspell_locales['gl'].append(HunspellLocale('gl_ES')) |
197 | + hunspell_locales['he'].append(HunspellLocale('he_IL')) |
198 | + hunspell_locales['hr'].append(HunspellLocale('hr_HR')) |
199 | + hunspell_locales['hu'].append(HunspellLocale('hu_HU')) |
200 | + hunspell_locales['ia'].append(HunspellLocale('ia')) |
201 | + hunspell_locales['id'].append(HunspellLocale('id_ID')) |
202 | + hunspell_locales['it'].append(HunspellLocale('it_IT')) |
203 | + hunspell_locales['ja'].append(HunspellLocale('ja_JP')) |
204 | + hunspell_locales['jv'].append(HunspellLocale('jv_ID')) |
205 | + hunspell_locales['ka'].append(HunspellLocale('ka_GE')) |
206 | + hunspell_locales['ko'].append(HunspellLocale('ko_KR')) |
207 | + hunspell_locales['krl'].append(HunspellLocale('krl_RU')) |
208 | + hunspell_locales['la'].append(HunspellLocale('la')) |
209 | + hunspell_locales['lt'].append(HunspellLocale('lt_LT')) |
210 | + hunspell_locales['mr'].append(HunspellLocale('mr_IN')) |
211 | + hunspell_locales['ms'].append(HunspellLocale('ms_MY')) |
212 | + hunspell_locales['my'].append(HunspellLocale('my_MM')) |
213 | + hunspell_locales['nb'].append(HunspellLocale('nb_NO')) |
214 | + hunspell_locales['nds'].append(HunspellLocale('nds_DE')) |
215 | + hunspell_locales['nl'].append(HunspellLocale('nl_NL')) |
216 | + hunspell_locales['nn'].append(HunspellLocale('nn_NO')) |
217 | + hunspell_locales['oc'].append(HunspellLocale('oc_FR')) |
218 | + hunspell_locales['pl'].append(HunspellLocale('pl_PL')) |
219 | + hunspell_locales['pt'].append(HunspellLocale('pt_PT')) |
220 | + hunspell_locales['ro'].append(HunspellLocale('ro_RO')) |
221 | + hunspell_locales['ru'].append(HunspellLocale('ru_RU')) |
222 | + hunspell_locales['rw'].append(HunspellLocale('rw_RW')) |
223 | + hunspell_locales['si'].append(HunspellLocale('si_LK')) |
224 | + hunspell_locales['sk'].append(HunspellLocale('sk_SK')) |
225 | + hunspell_locales['sl'].append(HunspellLocale('sl_SI')) |
226 | + hunspell_locales['sr'].append(HunspellLocale('sr_RS')) |
227 | + hunspell_locales['sv'].append(HunspellLocale('sv_SE')) |
228 | + hunspell_locales['tr'].append(HunspellLocale('tr_TR')) |
229 | + hunspell_locales['uk'].append(HunspellLocale('uk_UA')) |
230 | + hunspell_locales['vi'].append(HunspellLocale('vi_VN')) |
231 | + hunspell_locales['zh_CN'].append(HunspellLocale('zh_CN')) |
232 | + hunspell_locales['zh_TW'].append(HunspellLocale('zh_TW')) |
233 | + if locale == 'all': |
234 | + print('Looking for Hunspell dictionaries') |
235 | + for locale in hunspell_locales: |
236 | + set_has_hunspell_locale(hunspell_locales[locale][0]) |
237 | + else: |
238 | + print('Looking for Hunspell dictionary') |
239 | + set_has_hunspell_locale(hunspell_locales[locale][0]) |
240 | + |
241 | + |
242 | +def is_vowel(character): |
243 | + """Helper function for creating inflections of English words.""" |
244 | + return character == 'a' or character == 'e' or character == 'i' \ |
245 | + or character == 'o' or character == 'u' or character == 'y' |
246 | + |
247 | + |
248 | +def make_english_plural(word): |
249 | + """Create plural forms for nouns. |
250 | + |
251 | + This will create a few nonsense entries for irregular plurals, but |
252 | + it's good enough for our purpose. Glossary contains pluralized |
253 | + terms, so we don't add any plural forms for strings ending in 's'. |
254 | + |
255 | + """ |
256 | + result = '' |
257 | + if not word.endswith('s'): |
258 | + if word.endswith('y') and not is_vowel(word[-2:-1]): |
259 | + result = word[0:-1] + 'ies' |
260 | + elif word.endswith('z') or word.endswith('x') or word.endswith('ch') or word.endswith('sh') or word.endswith('o'): |
261 | + result = word + 'es' |
262 | + else: |
263 | + result = word + 's' |
264 | + return result |
265 | + |
266 | + |
267 | +def make_english_verb_forms(word): |
268 | + """Create inflected forms of an English verb: -ed and -ing forms. |
269 | + |
270 | + Will create nonsense for irregular verbs. |
271 | + |
272 | + """ |
273 | + result = [] |
274 | + if word.endswith('e'): |
275 | + result.append(word[0:-1] + 'ing') |
276 | + result.append(word + 'd') |
277 | + elif is_vowel(word[-2:-1]) and not is_vowel(word[-1]): |
278 | + # The consonant is duplicated here if the last syllable is stressed. |
279 | + # We can't detect stress, so we add both variants. |
280 | + result.append(word + word[-1] + 'ing') |
281 | + result.append(word + 'ing') |
282 | + result.append(word + word[-1] + 'ed') |
283 | + result.append(word + 'ed') |
284 | + elif word.endswith('y') and not is_vowel(word[-2:-1]): |
285 | + result.append(word + 'ing') |
286 | + result.append(word[0:-1] + 'ed') |
287 | + else: |
288 | + result.append(word + 'ing') |
289 | + result.append(word + 'ed') |
290 | + # 3rd person s has the same pattern as noun plurals. |
291 | +    # We omitted words ending in s in the plural, so we add them here. |
292 | + if word.endswith('s'): |
293 | + result.append(word + 'es') |
294 | + else: |
295 | + result.append(make_english_plural(word)) |
296 | + return result |
297 | + |
298 | + |
299 | +def load_glossary(glossary_file, locale): |
300 | + """Build a glossary from the given Transifex glossary csv file for the |
301 | + given locale.""" |
302 | + result = [] |
303 | + counter = 0 |
304 | + term_index = 0 |
305 | + term_comment_index = 0 |
306 | + wordclass_index = 0 |
307 | + translation_index = 0 |
308 | + comment_index = 0 |
309 | + for row in read_csv_file(glossary_file): |
310 | + # Detect the column indices |
311 | + if counter == 0: |
312 | + colum_counter = 0 |
313 | + for header in row: |
314 | + if header == 'term': |
315 | + term_index = colum_counter |
316 | + elif header == 'comment': |
317 | + term_comment_index = colum_counter |
318 | + elif header == 'pos': |
319 | + wordclass_index = colum_counter |
320 | + elif header == 'translation_' + locale or header == locale: |
321 | + translation_index = colum_counter |
322 | + elif header == 'comment_' + locale: |
323 | + comment_index = colum_counter |
324 | + colum_counter = colum_counter + 1 |
325 | + # If there is a translation, parse the entry |
326 | + # We also have some obsolete terms in the glossary that we want to |
327 | + # filter out. |
328 | + elif len(row[translation_index].strip()) > 0 and not row[term_comment_index].startswith('OBSOLETE'): |
329 | + if translation_index == 0: |
330 | + raise Exception( |
331 | + 'Locale %s is missing from glossary file.' % locale) |
332 | + if comment_index == 0: |
333 | + raise Exception( |
334 | + 'Comment field for locale %s is missing from glossary file.' % locale) |
335 | + entry = GlossaryEntry() |
336 | + entry.terms.append(row[term_index].strip()) |
337 | + if row[wordclass_index] == 'Noun': |
338 | + plural = make_english_plural(entry.terms[0]) |
339 | + if len(plural) > 0: |
340 | + entry.terms.append(plural) |
341 | + elif row[wordclass_index] == 'Verb': |
342 | + verb_forms = make_english_verb_forms(entry.terms[0]) |
343 | + for verb_form in verb_forms: |
344 | + entry.terms.append(verb_form) |
345 | + |
346 | + entry.translations.append(row[translation_index].strip()) |
347 | + |
348 | + # Misuse the comment field to provide a list of inflected forms. |
349 | + # Otherwise, we would get tons of false positive hits in the checks |
350 | + # later on and the translators would have our heads on a platter. |
351 | + delimiter = '|' |
352 | + if len(row[comment_index].strip()) > 1 and delimiter in row[comment_index]: |
353 | + inflections = row[comment_index].split(delimiter) |
354 | + for inflection in inflections: |
355 | + entry.translations.append(inflection.strip()) |
356 | + |
357 | + result.append(entry) |
358 | + counter = counter + 1 |
359 | + return result |
360 | + |
361 | + |
362 | +############################################################################# |
363 | +# Term Checking # |
364 | +############################################################################# |
365 | + |
366 | + |
367 | +def contains_term(string, term): |
368 | + """Checks whether 'string' contains 'term' as a whole word. |
369 | + |
370 | +    This check is case-insensitive. |
371 | + |
372 | + """ |
373 | + result = False |
374 | + # Regex is slow, so we do this preliminary check |
375 | + if term.lower() in string.lower(): |
376 | + # Now make sure that it's whole words! |
377 | + # We won't want to match "AI" against "again" etc. |
378 | +        regex = re.compile('(^|.+\W)' + term + '(\W.+|$)', re.IGNORECASE) |
379 | + result = regex.match(string) |
380 | + return result |
381 | + |
382 | + |
383 | +def source_contains_term(source_to_check, entry, glossary): |
384 | + """Checks if the source string contains the glossary entry while filtering |
385 | + out superstrings from the glossary, e.g. we don't want to check 'arena' |
386 | + against 'battle arena'.""" |
387 | + source_to_check = source_to_check.lower() |
388 | + for term in entry.terms: |
389 | + term = term.lower() |
390 | + if term in source_to_check: |
391 | + source_regex = re.compile('.+[\s,.]' + term + '[\s,.].+') |
392 | + if source_regex.match(source_to_check): |
393 | + for entry2 in glossary: |
394 | + if entry.terms[0] != entry2.terms[0]: |
395 | + for term2 in entry2.terms: |
396 | + term2 = term2.lower() |
397 | + if term2 != term and term in term2 and term2 in source_to_check: |
398 | + source_to_check = source_to_check.replace( |
399 | + term2, '') |
400 | + # Check if the source still contains the term to check |
401 | + return contains_term(source_to_check, term) |
402 | + return False |
403 | + |
404 | + |
405 | +def append_hunspell_stems(hunspell_locale, translation): |
406 | + """ Use hunspell to append the stems for terms found = less work for glossary editors. |
407 | + The effectiveness of this check depends on how good the hunspell data is.""" |
408 | + try: |
409 | + process = Popen(['hunspell', '-d', hunspell_locale, |
410 | + '-s'], stdout=PIPE, stdin=PIPE) |
411 | + hunspell_result = process.communicate(translation) |
412 | + if hunspell_result[0] != '': |
413 | + translation = ' '.join([translation, hunspell_result[0]]) |
414 | + except CalledProcessError: |
415 | + print('Failed to run hunspell for locale: ' + hunspell_locale) |
416 | + return translation |
417 | + |
418 | + |
419 | +def translation_has_term(entry, target): |
420 | + """Verify the target translation against all translation variations from |
421 | + the glossary.""" |
422 | + result = False |
423 | + for translation in entry.translations: |
424 | + if contains_term(target, translation): |
425 | + result = True |
426 | + break |
427 | + return result |
428 | + |
429 | + |
430 | +def check_file(csv_file, glossaries, locale, po_file): |
431 | + """Run the actual check.""" |
432 | + translations = read_csv_file(csv_file) |
433 | + source_index = 0 |
434 | + target_index = 0 |
435 | + location_index = 0 |
436 | + hits = [] |
437 | + counter = 0 |
438 | + has_hunspell = True |
439 | + hunspell_locale = get_hunspell_locale(locale) |
440 | + for row in translations: |
441 | + # Detect the column indices |
442 | + if counter == 0: |
443 | + colum_counter = 0 |
444 | + for header in row: |
445 | + if header == 'source': |
446 | + source_index = colum_counter |
447 | + elif header == 'target': |
448 | + target_index = colum_counter |
449 | + elif header == 'location': |
450 | + location_index = colum_counter |
451 | + colum_counter = colum_counter + 1 |
452 | + else: |
453 | + for entry in glossaries[locale][0]: |
454 | + # Check if the source text contains the glossary term. |
455 | + # Filter out superstrings, e.g. we don't want to check |
456 | + # "arena" against "battle arena" |
457 | + if source_contains_term(row[source_index], entry, glossaries[locale][0]): |
458 | + # Now verify the translation against all translation |
459 | + # variations from the glossary |
460 | + term_found = translation_has_term(entry, row[target_index]) |
461 | + # Add Hunspell stems for better matches and try again |
462 | + # We do it here because the Hunspell manipulation is slow. |
463 | + if not term_found and hunspell_locale != '': |
464 | + target_to_check = append_hunspell_stems( |
465 | + hunspell_locale, row[target_index]) |
466 | + term_found = translation_has_term( |
467 | + entry, target_to_check) |
468 | + if not term_found: |
469 | + hit = FailedTranslation() |
470 | + hit.source = row[source_index] |
471 | + hit.target = row[target_index] |
472 | + hit.location = row[location_index] |
473 | + hit.term = entry.terms[0] |
474 | + hit.translation = entry.translations[0] |
475 | + hit.locale = locale |
476 | + hit.po_file = po_file |
477 | + hits.append(hit) |
478 | + counter = counter + 1 |
479 | + return hits |
480 | + |
481 | + |
482 | +############################################################################# |
483 | +# Main Loop # |
484 | +############################################################################# |
485 | + |
486 | + |
487 | +def check_translations_with_glossary(input_path, output_path, glossary_file, only_locale): |
488 | + """Main loop. |
489 | + |
490 | + Loads the Transifex and Hunspell glossaries, converts all po files |
491 | + for languages that have glossary entries to temporary csv files, |
492 | + runs the check and then reports any hits to csv files. |
493 | + |
494 | + """ |
495 | + print('Locale: ' + only_locale) |
496 | + temp_path = make_path(output_path, 'temp_glossary') |
497 | + hits = [] |
498 | + locale_list = defaultdict(list) |
499 | + |
500 | + glossaries = defaultdict(list) |
501 | + load_hunspell_locales(only_locale) |
502 | + |
503 | + source_directories = sorted(os.listdir(input_path), key=str.lower) |
504 | + for dirname in source_directories: |
505 | + dirpath = os.path.join(input_path, dirname) |
506 | + if os.path.isdir(dirpath): |
507 | + source_files = sorted(os.listdir(dirpath), key=str.lower) |
508 | + sys.stdout.write("\nChecking text domain '" + dirname + "': ") |
509 | + sys.stdout.flush() |
510 | + failed = 0 |
511 | + for source_filename in source_files: |
512 | + po_file = dirpath + '/' + source_filename |
513 | + if source_filename.endswith('.po'): |
514 | + locale = source_filename[0:-3] |
515 | + if only_locale == 'all' or locale == only_locale: |
516 | + # Load the glossary if we haven't seen this locale |
517 | + # before |
518 | + if len(glossaries[locale]) < 1: |
519 | + sys.stdout.write( |
520 | + '\nLoading glossary for ' + locale) |
521 | + glossaries[locale].append( |
522 | + load_glossary(glossary_file, locale)) |
523 | + sys.stdout.write(' - %d entries ' % |
524 | + len(glossaries[locale][0])) |
525 | + sys.stdout.flush() |
526 | + # Only bother with locales that have glossary entries |
527 | + if len(glossaries[locale][0]) > 0: |
528 | + sys.stdout.write(locale + ' ') |
529 | + sys.stdout.flush() |
530 | + if len(locale_list[locale]) < 1: |
531 | + locale_list[locale].append(locale) |
532 | + csv_file = os.path.abspath(os.path.join( |
533 | + temp_path, dirname + '_' + locale + '.csv')) |
534 | + # Convert to csv for easy parsing |
535 | + call(['po2csv', '--progress=none', po_file, csv_file]) |
536 | + |
537 | + # Now run the actual check |
538 | + current_hits = check_file( |
539 | + csv_file, glossaries, locale, dirname) |
540 | + for hit in current_hits: |
541 | + hits.append(hit) |
542 | + |
543 | + # The csv file is no longer needed, delete it. |
544 | + os.remove(csv_file) |
545 | + |
546 | + hits = sorted(hits, key=lambda FailedTranslation: [ |
547 | + FailedTranslation.locale, FailedTranslation.translation]) |
548 | + for locale in locale_list: |
549 | + locale_result = '"glossary_term","glossary_translation","source","target","file","location"\n' |
550 | + counter = 0 |
551 | + for hit in hits: |
552 | + if hit.locale == locale: |
553 | + row = '"%s","%s","%s","%s","%s","%s"\n' % ( |
554 | + hit.term, hit.translation, hit.source, hit.target, hit.po_file, hit.location) |
555 | + locale_result = locale_result + row |
556 | + counter = counter + 1 |
557 | + dest_filepath = output_path + '/glossary_check_' + locale + '.csv' |
558 | + with open(dest_filepath, 'wt') as dest_file: |
559 | + dest_file.write(locale_result) |
560 | + # Uncomment this line to print a statistic of the number of hits for each locale |
561 | + # print("%s\t%d"%(locale, counter)) |
562 | + |
563 | + delete_path(temp_path) |
564 | + return 0 |
565 | + |
566 | + |
567 | +def main(): |
568 | + """Checks whether we are in the correct directory and everything's there, |
569 | + then runs a glossary check over all PO files.""" |
570 | + if len(sys.argv) == 2 or len(sys.argv) == 3: |
571 | + print('Running glossary checks:') |
572 | + else: |
573 | + print( |
574 | + 'Usage: glossary_checks.py <relative-path-to-glossary> [locale]') |
575 | + return 1 |
576 | + |
577 | + try: |
578 | + print('Current time: %s' % time.ctime()) |
579 | + # Prepare the paths |
580 | + glossary_file = os.path.abspath(os.path.join( |
581 | + os.path.dirname(__file__), sys.argv[1])) |
582 | + locale = 'all' |
583 | + if len(sys.argv) == 3: |
584 | + locale = sys.argv[2] |
585 | + |
586 | + if (not (os.path.exists(glossary_file) and os.path.isfile(glossary_file))): |
587 | + print('There is no glossary file at ' + glossary_file) |
588 | + return 1 |
589 | + |
590 | + input_path = os.path.abspath(os.path.join( |
591 | + os.path.dirname(__file__), '../po')) |
592 | + output_path = make_path(os.path.dirname(__file__), '../po_validation') |
593 | + result = check_translations_with_glossary( |
594 | + input_path, output_path, glossary_file, locale) |
595 | + print('Current time: %s' % time.ctime()) |
596 | + return result |
597 | + |
598 | + except Exception: |
599 | + print('Something went wrong:') |
600 | + traceback.print_exc() |
601 | + delete_path(make_path(output_path, 'temp_glossary')) |
602 | + return 1 |
603 | + |
604 | +if __name__ == '__main__': |
605 | + sys.exit(main()) |
Continuous integration builds have changed state:
Travis build 1708. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/181170109.
Appveyor build 1548. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1548.