Merge lp:~widelands-dev/widelands/glossary_checks into lp:widelands

Proposed by GunChleoc
Status: Merged
Merged at revision: 8315
Proposed branch: lp:~widelands-dev/widelands/glossary_checks
Merge into: lp:widelands
Diff against target: 605 lines (+601/-0)
1 file modified
utils/glossary_checks.py (+601/-0)
To merge this branch: bzr merge lp:~widelands-dev/widelands/glossary_checks
Reviewer Review Type Date Requested Status
GunChleoc Needs Resubmitting
Review via email: mp+312430@code.launchpad.net

Commit message

Added a Python script to do automated glossary checks for translations. It enlists the help of Hunspell and 'misuses' the Transifex note field in order to reduce noise. Functionality for translators is documented in the wiki:

https://wl.widelands.org/wiki/TranslatingWidelands/#preparing-your-glossary-for-automated-keyword-checks

Description of the change

After the British English fiasco in Build 19, I decided it would be good to have some glossary checks for translations. We use this kind of check at my workplace, and it helps keep translations consistent on big projects.

Downloading the glossary from Transifex can't be automated, so we have to download it manually each time before we run the checks. That's why I decided against committing it to the code base - we don't want to accidentally check against an outdated glossary.

Translators can hack the glossary's comment fields to provide inflected word forms, so in the long run, the check won't annoy translators with false positive hits. For example, for "worker" = "Arbeiter", "workers" = "Arbeitern" can pass the check if a translator has added the relevant data.
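The comment-field convention works roughly like the sketch below (the function name `translation_variants` is hypothetical; the real parsing lives in utils/glossary_checks.py and uses the same '|' delimiter):

```python
def translation_variants(translation, comment, delimiter='|'):
    """Return the base translation plus any inflected forms a translator
    listed in the Transifex comment field, e.g. 'Arbeiters|Arbeitern'."""
    variants = [translation.strip()]
    if delimiter in comment:
        # Each delimited part is an extra form the check will accept.
        variants.extend(part.strip() for part in comment.split(delimiter)
                        if part.strip())
    return variants

print(translation_variants('Arbeiter', 'Arbeiters|Arbeitern'))
# → ['Arbeiter', 'Arbeiters', 'Arbeitern']
```

A comment without the delimiter is treated as an ordinary note, so only the base translation is matched.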

I am also using the Hunspell stem function to reduce the noise. This is slow, but any entry that a translator doesn't have to look at needlessly is a good entry.
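The stemming step amounts to piping the translation through `hunspell -s` and appending the reported stems before matching. A simplified sketch (the helper name `append_stems` is illustrative; it degrades gracefully when hunspell or the dictionary is unavailable):

```python
from subprocess import Popen, PIPE

def append_stems(hunspell_locale, text):
    """Append hunspell's stem output ('hunspell -s') to the text so a
    glossary term can also match an inflected form in the translation.
    Returns the text unchanged if hunspell is missing or errors out."""
    try:
        process = Popen(['hunspell', '-d', hunspell_locale, '-s'],
                        stdout=PIPE, stdin=PIPE)
        out, _ = process.communicate(text.encode('utf-8'))
        if out:
            # Matching then runs on original text plus stems.
            return ' '.join([text, out.decode('utf-8')])
    except OSError:
        pass
    return text
```

The whole-word check then runs on the combined text, so a glossary stem can match even when only an inflected form appears in the translation.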

These checks will be a service for the translation teams and NOT mandatory - we can't require volunteers to go through them. Some of the translators gladly snapped up my last round of validations though, so some will like using this.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1708. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/181170109.
Appveyor build 1548. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1548.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1728. State: errored. Details: https://travis-ci.org/widelands/widelands/builds/182020101.
Appveyor build 1568. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1568.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1731. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182101596.
Appveyor build 1571. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1571.

Revision history for this message
GunChleoc (gunchleoc) wrote :

I guess I had quite a few new ideas after submitting this merge request.... should be done now. I have already dogfooded this with my own locale and fixed up a number of translations thanks to this check :)

Will create a zip of the results for the translators so they can check it out.

review: Needs Resubmitting
Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

HTTP Error 500: Internal Server Error

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1757. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182873856.
Appveyor build 1597. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1597.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

('The read operation timed out',)

Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

HTTP Error 500: Internal Server Error

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1864. State: failed. Details: https://travis-ci.org/widelands/widelands/builds/194595810.
Appveyor build 1700. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1700.

Revision history for this message
GunChleoc (gunchleoc) wrote :

It's getting a bit annoying to run these from a separate branch. Since none of this affects the Widelands code or translations directly, I'm gonna merge this now.

@bunnybot merge

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 2031. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/207665942.
Appveyor build 1700. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1700.

Preview Diff

=== added file 'utils/glossary_checks.py'
--- utils/glossary_checks.py 1970-01-01 00:00:00 +0000
+++ utils/glossary_checks.py 2017-03-04 12:24:08 +0000
@@ -0,0 +1,601 @@
1#!/usr/bin/env python
2# encoding: utf-8
3
4"""Runs a glossary check on all po files and writes the check results to
5po_validation/glossary.
6
7You will need to have the Translate Toolkit installed in order for the checks to work:
8http://toolkit.translatehouse.org/
9
10This script also uses hunspell to reduce the number of false positive hits, so
11install as many of the needed hunspell dictionaries as you can find. This script
12will inform you about missing hunspell locales.
13
14For Debian-based Linux: sudo apt-get install translate-toolkit hunspell hunspell-ar hunspell-bg hunspell-br hunspell-ca hunspell-cs hunspell-da hunspell-de-de hunspell-el hunspell-en-ca hunspell-en-gb hunspell-en-us hunspell-eu hunspell-fr hunspell-gd hunspell-gl hunspell-he hunspell-hr hunspell-hu hunspell-it hunspell-ko hunspell-lt hunspell-nl hunspell-no hunspell-pl hunspell-pt-br hunspell-pt-pt hunspell-ro hunspell-ru hunspell-si hunspell-sk hunspell-sl hunspell-sr hunspell-sv hunspell-uk hunspell-vi
15
16You will need to provide an export of the Transifex glossary and specify it at
17the command line. Make sure to select "Include glossary notes in file" when
18exporting the csv from Transifex.
19
20Translators can 'misuse' their languages' comment field on Transifex to add
21inflected forms of their glossary translations. We use the delimiter '|' to
22signal that the field has inflected forms in it. Examples:
23
24Source Translation Comment Translation will be matched against
25------ ----------- ---------------- -----------------------------------
26sheep sheep Nice, fluffy! 'sheep'
27ax axe axes| 'axe', 'axes'
28click click clicking|clicked 'click', 'clicking', 'clicked'
29click click clicking | clicked 'click', 'clicking', 'clicked'
30
31"""
32
33from collections import defaultdict
34from subprocess import call, CalledProcessError, Popen, PIPE
35import csv
36import os.path
37import re
38import subprocess
39import sys
40import time
41import traceback
42
43#############################################################################
44# Data Containers #
45#############################################################################
46
47
48class GlossaryEntry:
49 """An entry in our parsed glossaries."""
50
51 def __init__(self):
52 # Base form of the term, followed by any inflected forms
53 self.terms = []
54 # Base form of the translation, followed by any inflected forms
55 self.translations = []
56
57
58class FailedTranslation:
59 """Information about a translation that failed a check."""
60
61 def __init__(self):
62 # The locale where the check failed
63 self.locale = ''
64 # The po file containing the failed translation
65 self.po_file = ''
66 # Source text
67 self.source = ''
68 # Target text
69 self.target = ''
70 # Location in the source code
71 self.location = ''
72 # The glossary term that failed the check
73 self.term = ''
74 # The base form of the translated glossary term
75 self.translation = ''
76
77
78class HunspellLocale:
79 """A specific locale for Hunspell, plus whether its dictionary is
80 installed."""
81
82 def __init__(self, locale):
83 # Specific language/country code for Hunspell, e.g. el_GR
84 self.locale = locale
85 # Whether a dictionary has been found for the locale
86 self.is_available = False
87
88hunspell_locales = defaultdict(list)
89""" Hunspell needs specific locales"""
90
91#############################################################################
92# File System Functions #
93#############################################################################
94
95
96def read_csv_file(filepath):
97 """Parses a CSV file into a 2-dimensional array."""
98 result = []
99 with open(filepath) as csvfile:
100 csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
101 for row in csvreader:
102 result.append(row)
103 return result
104
105
106def make_path(base_path, subdir):
107 """Creates the correct form of the path and makes sure that it exists."""
108 result = os.path.abspath(os.path.join(base_path, subdir))
109 if not os.path.exists(result):
110 os.makedirs(result)
111 return result
112
113
114def delete_path(path):
115 """Deletes the directory specified by 'path' and all its subdirectories and
116 file contents."""
117 if os.path.exists(path) and not os.path.isfile(path):
118 files = sorted(os.listdir(path), key=str.lower)
119 for deletefile in files:
120 deleteme = os.path.abspath(os.path.join(path, deletefile))
121 if os.path.isfile(deleteme):
122 try:
123 os.remove(deleteme)
124 except Exception:
125 print('Failed to delete file ' + deleteme)
126 else:
127 delete_path(deleteme)
128 try:
129 os.rmdir(path)
130 except Exception:
131             print('Failed to delete path ' + path)
132
133#############################################################################
134# Glossary Loading #
135#############################################################################
136
137
138def set_has_hunspell_locale(hunspell_locale):
139 """Tries calling hunspell with the given locale and returns false if it has
140 failed."""
141 try:
142 process = Popen(['hunspell', '-d', hunspell_locale.locale,
143 '-s'], stderr=PIPE, stdout=PIPE, stdin=PIPE)
144 hunspell_result = process.communicate('foo')
145         if not hunspell_result[1]:
146 hunspell_locale.is_available = True
147 return True
148 else:
149 print('Error loading Hunspell dictionary for locale ' +
150 hunspell_locale.locale + ': ' + hunspell_result[1])
151 return False
152
153 except CalledProcessError:
154 print('Failed to run hunspell for locale: ' + hunspell_locale.locale)
155 return False
156
157
158def get_hunspell_locale(locale):
159 """Returns the corresponding Hunspell locale for this locale, or empty
160 string if not available."""
161 if len(hunspell_locales[locale]) == 1 and hunspell_locales[locale][0].is_available:
162 return hunspell_locales[locale][0].locale
163 return ''
164
165
166def load_hunspell_locales(locale):
167 """Registers locales for Hunspell.
168
169 Maps a list of generic locales to specific locales and checks which
170 dictionaries are available. If locale != "all", load only the
171 dictionary for the given locale.
172
173 """
174 hunspell_locales['bg'].append(HunspellLocale('bg_BG'))
175 hunspell_locales['br'].append(HunspellLocale('br_FR'))
176 hunspell_locales['ca'].append(HunspellLocale('ca_ES'))
177 hunspell_locales['da'].append(HunspellLocale('da_DK'))
178 hunspell_locales['cs'].append(HunspellLocale('cs_CZ'))
179 hunspell_locales['de'].append(HunspellLocale('de_DE'))
180 hunspell_locales['el'].append(HunspellLocale('el_GR'))
181 hunspell_locales['en_CA'].append(HunspellLocale('en_CA'))
182 hunspell_locales['en_GB'].append(HunspellLocale('en_GB'))
183 hunspell_locales['en_US'].append(HunspellLocale('en_US'))
184 hunspell_locales['eo'].append(HunspellLocale('eo'))
185 hunspell_locales['es'].append(HunspellLocale('es_ES'))
186 hunspell_locales['et'].append(HunspellLocale('et_EE'))
187 hunspell_locales['eu'].append(HunspellLocale('eu_ES'))
188 hunspell_locales['fa'].append(HunspellLocale('fa_IR'))
189 hunspell_locales['fi'].append(HunspellLocale('fi_FI'))
190 hunspell_locales['fr'].append(HunspellLocale('fr_FR'))
191 hunspell_locales['gd'].append(HunspellLocale('gd_GB'))
192 hunspell_locales['gl'].append(HunspellLocale('gl_ES'))
193 hunspell_locales['he'].append(HunspellLocale('he_IL'))
194 hunspell_locales['hr'].append(HunspellLocale('hr_HR'))
195 hunspell_locales['hu'].append(HunspellLocale('hu_HU'))
196 hunspell_locales['ia'].append(HunspellLocale('ia'))
197 hunspell_locales['id'].append(HunspellLocale('id_ID'))
198 hunspell_locales['it'].append(HunspellLocale('it_IT'))
199 hunspell_locales['ja'].append(HunspellLocale('ja_JP'))
200 hunspell_locales['jv'].append(HunspellLocale('jv_ID'))
201 hunspell_locales['ka'].append(HunspellLocale('ka_GE'))
202 hunspell_locales['ko'].append(HunspellLocale('ko_KR'))
203 hunspell_locales['krl'].append(HunspellLocale('krl_RU'))
204 hunspell_locales['la'].append(HunspellLocale('la'))
205 hunspell_locales['lt'].append(HunspellLocale('lt_LT'))
206 hunspell_locales['mr'].append(HunspellLocale('mr_IN'))
207 hunspell_locales['ms'].append(HunspellLocale('ms_MY'))
208 hunspell_locales['my'].append(HunspellLocale('my_MM'))
209 hunspell_locales['nb'].append(HunspellLocale('nb_NO'))
210 hunspell_locales['nds'].append(HunspellLocale('nds_DE'))
211 hunspell_locales['nl'].append(HunspellLocale('nl_NL'))
212 hunspell_locales['nn'].append(HunspellLocale('nn_NO'))
213 hunspell_locales['oc'].append(HunspellLocale('oc_FR'))
214 hunspell_locales['pl'].append(HunspellLocale('pl_PL'))
215 hunspell_locales['pt'].append(HunspellLocale('pt_PT'))
216 hunspell_locales['ro'].append(HunspellLocale('ro_RO'))
217 hunspell_locales['ru'].append(HunspellLocale('ru_RU'))
218 hunspell_locales['rw'].append(HunspellLocale('rw_RW'))
219 hunspell_locales['si'].append(HunspellLocale('si_LK'))
220 hunspell_locales['sk'].append(HunspellLocale('sk_SK'))
221 hunspell_locales['sl'].append(HunspellLocale('sl_SI'))
222 hunspell_locales['sr'].append(HunspellLocale('sr_RS'))
223 hunspell_locales['sv'].append(HunspellLocale('sv_SE'))
224 hunspell_locales['tr'].append(HunspellLocale('tr_TR'))
225 hunspell_locales['uk'].append(HunspellLocale('uk_UA'))
226 hunspell_locales['vi'].append(HunspellLocale('vi_VN'))
227 hunspell_locales['zh_CN'].append(HunspellLocale('zh_CN'))
228 hunspell_locales['zh_TW'].append(HunspellLocale('zh_TW'))
229 if locale == 'all':
230 print('Looking for Hunspell dictionaries')
231 for locale in hunspell_locales:
232 set_has_hunspell_locale(hunspell_locales[locale][0])
233 else:
234 print('Looking for Hunspell dictionary')
235 set_has_hunspell_locale(hunspell_locales[locale][0])
236
237
238def is_vowel(character):
239 """Helper function for creating inflections of English words."""
240 return character == 'a' or character == 'e' or character == 'i' \
241 or character == 'o' or character == 'u' or character == 'y'
242
243
244def make_english_plural(word):
245 """Create plural forms for nouns.
246
247 This will create a few nonsense entries for irregular plurals, but
248 it's good enough for our purpose. Glossary contains pluralized
249 terms, so we don't add any plural forms for strings ending in 's'.
250
251 """
252 result = ''
253 if not word.endswith('s'):
254 if word.endswith('y') and not is_vowel(word[-2:-1]):
255 result = word[0:-1] + 'ies'
256 elif word.endswith('z') or word.endswith('x') or word.endswith('ch') or word.endswith('sh') or word.endswith('o'):
257 result = word + 'es'
258 else:
259 result = word + 's'
260 return result
261
262
263def make_english_verb_forms(word):
264 """Create inflected forms of an English verb: -ed and -ing forms.
265
266 Will create nonsense for irregular verbs.
267
268 """
269 result = []
270 if word.endswith('e'):
271 result.append(word[0:-1] + 'ing')
272 result.append(word + 'd')
273 elif is_vowel(word[-2:-1]) and not is_vowel(word[-1]):
274 # The consonant is duplicated here if the last syllable is stressed.
275 # We can't detect stress, so we add both variants.
276 result.append(word + word[-1] + 'ing')
277 result.append(word + 'ing')
278 result.append(word + word[-1] + 'ed')
279 result.append(word + 'ed')
280 elif word.endswith('y') and not is_vowel(word[-2:-1]):
281 result.append(word + 'ing')
282 result.append(word[0:-1] + 'ed')
283 else:
284 result.append(word + 'ing')
285 result.append(word + 'ed')
286 # 3rd person s has the same pattern as noun plurals.
287     # We omitted words ending in 's' in the plural, so we add them here.
288 if word.endswith('s'):
289 result.append(word + 'es')
290 else:
291 result.append(make_english_plural(word))
292 return result
293
294
295def load_glossary(glossary_file, locale):
296 """Build a glossary from the given Transifex glossary csv file for the
297 given locale."""
298 result = []
299 counter = 0
300 term_index = 0
301 term_comment_index = 0
302 wordclass_index = 0
303 translation_index = 0
304 comment_index = 0
305 for row in read_csv_file(glossary_file):
306 # Detect the column indices
307 if counter == 0:
308 colum_counter = 0
309 for header in row:
310 if header == 'term':
311 term_index = colum_counter
312 elif header == 'comment':
313 term_comment_index = colum_counter
314 elif header == 'pos':
315 wordclass_index = colum_counter
316 elif header == 'translation_' + locale or header == locale:
317 translation_index = colum_counter
318 elif header == 'comment_' + locale:
319 comment_index = colum_counter
320 colum_counter = colum_counter + 1
321 # If there is a translation, parse the entry
322 # We also have some obsolete terms in the glossary that we want to
323 # filter out.
324 elif len(row[translation_index].strip()) > 0 and not row[term_comment_index].startswith('OBSOLETE'):
325 if translation_index == 0:
326 raise Exception(
327 'Locale %s is missing from glossary file.' % locale)
328 if comment_index == 0:
329 raise Exception(
330 'Comment field for locale %s is missing from glossary file.' % locale)
331 entry = GlossaryEntry()
332 entry.terms.append(row[term_index].strip())
333 if row[wordclass_index] == 'Noun':
334 plural = make_english_plural(entry.terms[0])
335 if len(plural) > 0:
336 entry.terms.append(plural)
337 elif row[wordclass_index] == 'Verb':
338 verb_forms = make_english_verb_forms(entry.terms[0])
339 for verb_form in verb_forms:
340 entry.terms.append(verb_form)
341
342 entry.translations.append(row[translation_index].strip())
343
344 # Misuse the comment field to provide a list of inflected forms.
345 # Otherwise, we would get tons of false positive hits in the checks
346 # later on and the translators would have our heads on a platter.
347 delimiter = '|'
348 if len(row[comment_index].strip()) > 1 and delimiter in row[comment_index]:
349 inflections = row[comment_index].split(delimiter)
350 for inflection in inflections:
351 entry.translations.append(inflection.strip())
352
353 result.append(entry)
354 counter = counter + 1
355 return result
356
357
358#############################################################################
359# Term Checking #
360#############################################################################
361
362
363def contains_term(string, term):
364 """Checks whether 'string' contains 'term' as a whole word.
365
366     This check is case-insensitive.
367
368 """
369 result = False
370 # Regex is slow, so we do this preliminary check
371 if term.lower() in string.lower():
372 # Now make sure that it's whole words!
373 # We won't want to match "AI" against "again" etc.
374         regex = re.compile('(^|.*\W)' + term + '(\W.*|$)', re.IGNORECASE)
375 result = regex.match(string)
376 return result
377
378
379def source_contains_term(source_to_check, entry, glossary):
380 """Checks if the source string contains the glossary entry while filtering
381 out superstrings from the glossary, e.g. we don't want to check 'arena'
382 against 'battle arena'."""
383 source_to_check = source_to_check.lower()
384 for term in entry.terms:
385 term = term.lower()
386 if term in source_to_check:
387             source_regex = re.compile('(^|.*[\s,.])' + term + '([\s,.].*|$)')
388 if source_regex.match(source_to_check):
389 for entry2 in glossary:
390 if entry.terms[0] != entry2.terms[0]:
391 for term2 in entry2.terms:
392 term2 = term2.lower()
393 if term2 != term and term in term2 and term2 in source_to_check:
394 source_to_check = source_to_check.replace(
395 term2, '')
396 # Check if the source still contains the term to check
397 return contains_term(source_to_check, term)
398 return False
399
400
401def append_hunspell_stems(hunspell_locale, translation):
402     """Use hunspell to append the stems for the terms found; less work for glossary editors.
403     The effectiveness of this check depends on how good the hunspell data is."""
404 try:
405 process = Popen(['hunspell', '-d', hunspell_locale,
406 '-s'], stdout=PIPE, stdin=PIPE)
407 hunspell_result = process.communicate(translation)
408 if hunspell_result[0] != '':
409 translation = ' '.join([translation, hunspell_result[0]])
410 except CalledProcessError:
411 print('Failed to run hunspell for locale: ' + hunspell_locale)
412 return translation
413
414
415def translation_has_term(entry, target):
416 """Verify the target translation against all translation variations from
417 the glossary."""
418 result = False
419 for translation in entry.translations:
420 if contains_term(target, translation):
421 result = True
422 break
423 return result
424
425
426def check_file(csv_file, glossaries, locale, po_file):
427 """Run the actual check."""
428 translations = read_csv_file(csv_file)
429 source_index = 0
430 target_index = 0
431 location_index = 0
432 hits = []
433 counter = 0
434 has_hunspell = True
435 hunspell_locale = get_hunspell_locale(locale)
436 for row in translations:
437 # Detect the column indices
438 if counter == 0:
439 colum_counter = 0
440 for header in row:
441 if header == 'source':
442 source_index = colum_counter
443 elif header == 'target':
444 target_index = colum_counter
445 elif header == 'location':
446 location_index = colum_counter
447 colum_counter = colum_counter + 1
448 else:
449 for entry in glossaries[locale][0]:
450 # Check if the source text contains the glossary term.
451 # Filter out superstrings, e.g. we don't want to check
452 # "arena" against "battle arena"
453 if source_contains_term(row[source_index], entry, glossaries[locale][0]):
454 # Now verify the translation against all translation
455 # variations from the glossary
456 term_found = translation_has_term(entry, row[target_index])
457 # Add Hunspell stems for better matches and try again
458 # We do it here because the Hunspell manipulation is slow.
459 if not term_found and hunspell_locale != '':
460 target_to_check = append_hunspell_stems(
461 hunspell_locale, row[target_index])
462 term_found = translation_has_term(
463 entry, target_to_check)
464 if not term_found:
465 hit = FailedTranslation()
466 hit.source = row[source_index]
467 hit.target = row[target_index]
468 hit.location = row[location_index]
469 hit.term = entry.terms[0]
470 hit.translation = entry.translations[0]
471 hit.locale = locale
472 hit.po_file = po_file
473 hits.append(hit)
474 counter = counter + 1
475 return hits
476
477
478#############################################################################
479# Main Loop #
480#############################################################################
481
482
483def check_translations_with_glossary(input_path, output_path, glossary_file, only_locale):
484 """Main loop.
485
486 Loads the Transifex and Hunspell glossaries, converts all po files
487 for languages that have glossary entries to temporary csv files,
488 runs the check and then reports any hits to csv files.
489
490 """
491 print('Locale: ' + only_locale)
492 temp_path = make_path(output_path, 'temp_glossary')
493 hits = []
494 locale_list = defaultdict(list)
495
496 glossaries = defaultdict(list)
497 load_hunspell_locales(only_locale)
498
499 source_directories = sorted(os.listdir(input_path), key=str.lower)
500 for dirname in source_directories:
501 dirpath = os.path.join(input_path, dirname)
502 if os.path.isdir(dirpath):
503 source_files = sorted(os.listdir(dirpath), key=str.lower)
504 sys.stdout.write("\nChecking text domain '" + dirname + "': ")
505 sys.stdout.flush()
506 failed = 0
507 for source_filename in source_files:
508 po_file = dirpath + '/' + source_filename
509 if source_filename.endswith('.po'):
510 locale = source_filename[0:-3]
511 if only_locale == 'all' or locale == only_locale:
512 # Load the glossary if we haven't seen this locale
513 # before
514 if len(glossaries[locale]) < 1:
515 sys.stdout.write(
516 '\nLoading glossary for ' + locale)
517 glossaries[locale].append(
518 load_glossary(glossary_file, locale))
519 sys.stdout.write(' - %d entries ' %
520 len(glossaries[locale][0]))
521 sys.stdout.flush()
522 # Only bother with locales that have glossary entries
523 if len(glossaries[locale][0]) > 0:
524 sys.stdout.write(locale + ' ')
525 sys.stdout.flush()
526 if len(locale_list[locale]) < 1:
527 locale_list[locale].append(locale)
528 csv_file = os.path.abspath(os.path.join(
529 temp_path, dirname + '_' + locale + '.csv'))
530 # Convert to csv for easy parsing
531 call(['po2csv', '--progress=none', po_file, csv_file])
532
533 # Now run the actual check
534 current_hits = check_file(
535 csv_file, glossaries, locale, dirname)
536 for hit in current_hits:
537 hits.append(hit)
538
539 # The csv file is no longer needed, delete it.
540 os.remove(csv_file)
541
542 hits = sorted(hits, key=lambda FailedTranslation: [
543 FailedTranslation.locale, FailedTranslation.translation])
544 for locale in locale_list:
545 locale_result = '"glossary_term","glossary_translation","source","target","file","location"\n'
546 counter = 0
547 for hit in hits:
548 if hit.locale == locale:
549 row = '"%s","%s","%s","%s","%s","%s"\n' % (
550 hit.term, hit.translation, hit.source, hit.target, hit.po_file, hit.location)
551 locale_result = locale_result + row
552 counter = counter + 1
553 dest_filepath = output_path + '/glossary_check_' + locale + '.csv'
554 with open(dest_filepath, 'wt') as dest_file:
555 dest_file.write(locale_result)
556 # Uncomment this line to print a statistic of the number of hits for each locale
557 # print("%s\t%d"%(locale, counter))
558
559 delete_path(temp_path)
560 return 0
561
562
563def main():
564 """Checks whether we are in the correct directory and everything's there,
565 then runs a glossary check over all PO files."""
566 if len(sys.argv) == 2 or len(sys.argv) == 3:
567 print('Running glossary checks:')
568 else:
569 print(
570 'Usage: glossary_checks.py <relative-path-to-glossary> [locale]')
571 return 1
572
573 try:
574 print('Current time: %s' % time.ctime())
575 # Prepare the paths
576 glossary_file = os.path.abspath(os.path.join(
577 os.path.dirname(__file__), sys.argv[1]))
578 locale = 'all'
579 if len(sys.argv) == 3:
580 locale = sys.argv[2]
581
582 if (not (os.path.exists(glossary_file) and os.path.isfile(glossary_file))):
583 print('There is no glossary file at ' + glossary_file)
584 return 1
585
586 input_path = os.path.abspath(os.path.join(
587 os.path.dirname(__file__), '../po'))
588 output_path = make_path(os.path.dirname(__file__), '../po_validation')
589 result = check_translations_with_glossary(
590 input_path, output_path, glossary_file, locale)
591 print('Current time: %s' % time.ctime())
592 return result
593
594 except Exception:
595 print('Something went wrong:')
596 traceback.print_exc()
597 delete_path(make_path(output_path, 'temp_glossary'))
598 return 1
599
600if __name__ == '__main__':
601 sys.exit(main())
