Merge lp:~widelands-dev/widelands/glossary_checks into lp:widelands

Proposed by GunChleoc
Status: Merged
Merged at revision: 8315
Proposed branch: lp:~widelands-dev/widelands/glossary_checks
Merge into: lp:widelands
Diff against target: 605 lines (+601/-0)
1 file modified
utils/glossary_checks.py (+601/-0)
To merge this branch: bzr merge lp:~widelands-dev/widelands/glossary_checks
Reviewer        Review Type     Date Requested    Status
GunChleoc       -               -                 Needs Resubmitting
Review via email: mp+312430@code.launchpad.net

Commit message

Added a Python script to do automated glossary checks for translations. It enlists the help of Hunspell and 'misuses' the Transifex note field in order to reduce noise. Functionality for translators is documented in the wiki:

https://wl.widelands.org/wiki/TranslatingWidelands/#preparing-your-glossary-for-automated-keyword-checks

Description of the change

After the British English fiasco in Build 19, I decided it would be good to have some glossary checks for translations. We use this kind of check at my workplace, and it helps keep terminology consistent on big projects.

Downloading the glossary from Transifex can't be automated, so we have to download it manually each time before running the checks. I therefore decided against committing it to the code base - we don't want to accidentally check against an outdated glossary.
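
For reference: the script is invoked as 'glossary_checks.py <relative-path-to-glossary> [locale]', and the glossary path is resolved relative to the utils/ directory. So, assuming you saved the export in the source root under the made-up name widelands_glossary.csv, checking just the German translations would be 'python utils/glossary_checks.py ../widelands_glossary.csv de'; leave out the locale to check every language that has glossary entries. The results are written as csv files under po_validation/.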

Translators can hack the glossary's comment fields to provide inflected word forms, so in the long run the check won't annoy them with false positive hits. For example, with "worker" = "Arbeiter" in the glossary, a translation of "workers" as "Arbeitern" can still pass the check if a translator has added that inflected form.
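
To illustrate the mechanism, here is a minimal sketch of how a glossary row is expanded into the word forms the check will accept (the function name is made up for this example; the '|' delimiter is the one the script actually looks for):

    def translation_variants(translation, comment, delimiter='|'):
        # The base translation always counts as an acceptable form.
        variants = [translation.strip()]
        # A '|' in the comment field signals that it lists inflected forms,
        # e.g. 'Arbeitern|' or 'clicking|clicked'.
        if delimiter in comment:
            variants.extend(part.strip()
                            for part in comment.split(delimiter) if part.strip())
        return variants

    # translation_variants('Arbeiter', 'Arbeitern|')
    # -> ['Arbeiter', 'Arbeitern']

A translation then passes the check as soon as it contains any of these forms as a whole word.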

I am also using the Hunspell stem function to reduce the noise. This is slow, but any entry that a translator doesn't have to look at needlessly is a good entry.
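
Roughly, the stemming step looks like the sketch below (the function name is mine and error handling is left out; how well it works depends on the installed hunspell dictionary):

    from subprocess import Popen, PIPE

    def with_hunspell_stems(translation, dictionary):
        # 'hunspell -s' echoes each word of the input together with the
        # stem(s) it knows, so appending that output lets a plain
        # substring check also match inflected forms.
        process = Popen(['hunspell', '-d', dictionary, '-s'],
                        stdin=PIPE, stdout=PIPE, universal_newlines=True)
        stems, _ = process.communicate(translation)
        return translation + ' ' + stems if stems else translation

With a good dictionary, a German translation containing 'Arbeitern' would then also match the glossary entry via its stem 'Arbeiter', without the translator having to list that form in the comment field.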

These checks will be a service for the translation teams and NOT mandatory - we can't require volunteers to go through them. Some of the translators gladly snapped up my last round of validations though, so some will like using this.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1708. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/181170109.
Appveyor build 1548. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1548.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1728. State: errored. Details: https://travis-ci.org/widelands/widelands/builds/182020101.
Appveyor build 1568. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1568.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1731. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182101596.
Appveyor build 1571. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1571.

Revision history for this message
GunChleoc (gunchleoc) wrote :

I guess I had quite a few new ideas after submitting this merge request.... should be done now. I have already dogfooded this with my own locale and fixed up a number of translations thanks to this check :)

Will create a zip of the results for the translators so they can check it out.

review: Needs Resubmitting
Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

HTTP Error 500: Internal Server Error

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1757. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182873856.
Appveyor build 1597. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1597.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

('The read operation timed out',)

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1757. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182873856.
Appveyor build 1597. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1597.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Bunnybot encountered an error while working on this merge proposal:

HTTP Error 500: Internal Server Error

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1757. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/182873856.
Appveyor build 1597. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1597.

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 1864. State: failed. Details: https://travis-ci.org/widelands/widelands/builds/194595810.
Appveyor build 1700. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1700.

Revision history for this message
GunChleoc (gunchleoc) wrote :

It's getting a bit annoying to run these from a separate branch. Since none of this affects the Widelands code or translations directly, I'm gonna merge this now.

@bunnybot merge

Revision history for this message
bunnybot (widelandsofficial) wrote :

Continuous integration builds have changed state:

Travis build 2031. State: passed. Details: https://travis-ci.org/widelands/widelands/builds/207665942.
Appveyor build 1700. State: success. Details: https://ci.appveyor.com/project/widelands-dev/widelands/build/_widelands_dev_widelands_glossary_checks-1700.

Preview Diff

1=== added file 'utils/glossary_checks.py'
2--- utils/glossary_checks.py 1970-01-01 00:00:00 +0000
3+++ utils/glossary_checks.py 2017-03-04 12:24:08 +0000
4@@ -0,0 +1,601 @@
5+#!/usr/bin/env python
6+# encoding: utf-8
7+
8+"""Runs a glossary check on all po files and writes the check results to
9+po_validation/glossary_check_<locale>.csv.
10+
11+You will need to have the Translate Toolkit installed in order for the checks to work:
12+http://toolkit.translatehouse.org/
13+
14+This script also uses hunspell to reduce the number of false positive hits, so
15+install as many of the needed hunspell dictionaries as you can find. This script
16+will inform you about missing hunspell locales.
17+
18+For Debian-based Linux: sudo apt-get install translate-toolkit hunspell hunspell-ar hunspell-bg hunspell-br hunspell-ca hunspell-cs hunspell-da hunspell-de-de hunspell-el hunspell-en-ca hunspell-en-gb hunspell-en-us hunspell-eu hunspell-fr hunspell-gd hunspell-gl hunspell-he hunspell-hr hunspell-hu hunspell-it hunspell-ko hunspell-lt hunspell-nl hunspell-no hunspell-pl hunspell-pt-br hunspell-pt-pt hunspell-ro hunspell-ru hunspell-si hunspell-sk hunspell-sl hunspell-sr hunspell-sv hunspell-uk hunspell-vi
19+
20+You will need to provide an export of the Transifex glossary and specify it at
21+the command line. Make sure to select "Include glossary notes in file" when
22+exporting the csv from Transifex.
23+
24+Translators can 'misuse' their language's comment field on Transifex to add
25+inflected forms of their glossary translations. We use the delimiter '|' to
26+signal that the field has inflected forms in it. Examples:
27+
28+Source   Translation   Comment             Translation will be matched against
29+------   -----------   ----------------    -----------------------------------
30+sheep    sheep         Nice, fluffy!       'sheep'
31+ax       axe           axes|               'axe', 'axes'
32+click    click         clicking|clicked    'click', 'clicking', 'clicked'
33+click    click         clicking | clicked  'click', 'clicking', 'clicked'
34+
35+"""
36+
37+from collections import defaultdict
38+from subprocess import call, CalledProcessError, Popen, PIPE
39+import csv
40+import os.path
41+import re
42+import subprocess
43+import sys
44+import time
45+import traceback
46+
47+#############################################################################
48+# Data Containers #
49+#############################################################################
50+
51+
52+class GlossaryEntry:
53+ """An entry in our parsed glossaries."""
54+
55+ def __init__(self):
56+ # Base form of the term, followed by any inflected forms
57+ self.terms = []
58+ # Base form of the translation, followed by any inflected forms
59+ self.translations = []
60+
61+
62+class FailedTranslation:
63+ """Information about a translation that failed a check."""
64+
65+ def __init__(self):
66+ # The locale where the check failed
67+ self.locale = ''
68+ # The po file containing the failed translation
69+ self.po_file = ''
70+ # Source text
71+ self.source = ''
72+ # Target text
73+ self.target = ''
74+ # Location in the source code
75+ self.location = ''
76+ # The glossary term that failed the check
77+ self.term = ''
78+ # The base form of the translated glossary term
79+ self.translation = ''
80+
81+
82+class HunspellLocale:
83+ """A specific locale for Hunspell, plus whether its dictionary is
84+ installed."""
85+
86+ def __init__(self, locale):
87+ # Specific language/country code for Hunspell, e.g. el_GR
88+ self.locale = locale
89+ # Whether a dictionary has been found for the locale
90+ self.is_available = False
91+
92+hunspell_locales = defaultdict(list)
93+""" Hunspell needs specific locales"""
94+
95+#############################################################################
96+# File System Functions #
97+#############################################################################
98+
99+
100+def read_csv_file(filepath):
101+ """Parses a CSV file into a 2-dimensional array."""
102+ result = []
103+ with open(filepath) as csvfile:
104+ csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
105+ for row in csvreader:
106+ result.append(row)
107+ return result
108+
109+
110+def make_path(base_path, subdir):
111+ """Creates the correct form of the path and makes sure that it exists."""
112+ result = os.path.abspath(os.path.join(base_path, subdir))
113+ if not os.path.exists(result):
114+ os.makedirs(result)
115+ return result
116+
117+
118+def delete_path(path):
119+ """Deletes the directory specified by 'path' and all its subdirectories and
120+ file contents."""
121+ if os.path.exists(path) and not os.path.isfile(path):
122+ files = sorted(os.listdir(path), key=str.lower)
123+ for deletefile in files:
124+ deleteme = os.path.abspath(os.path.join(path, deletefile))
125+ if os.path.isfile(deleteme):
126+ try:
127+ os.remove(deleteme)
128+ except Exception:
129+ print('Failed to delete file ' + deleteme)
130+ else:
131+ delete_path(deleteme)
132+ try:
133+ os.rmdir(path)
134+ except Exception:
135+ print('Failed to delete path ' + deleteme)
136+
137+#############################################################################
138+# Glossary Loading #
139+#############################################################################
140+
141+
142+def set_has_hunspell_locale(hunspell_locale):
143+ """Tries calling hunspell with the given locale and returns false if it has
144+ failed."""
145+ try:
146+ process = Popen(['hunspell', '-d', hunspell_locale.locale,
147+ '-s'], stderr=PIPE, stdout=PIPE, stdin=PIPE)
148+ hunspell_result = process.communicate('foo')
149+        if not hunspell_result[1]:
150+ hunspell_locale.is_available = True
151+ return True
152+ else:
153+ print('Error loading Hunspell dictionary for locale ' +
154+ hunspell_locale.locale + ': ' + hunspell_result[1])
155+ return False
156+
157+ except CalledProcessError:
158+ print('Failed to run hunspell for locale: ' + hunspell_locale.locale)
159+ return False
160+
161+
162+def get_hunspell_locale(locale):
163+ """Returns the corresponding Hunspell locale for this locale, or empty
164+ string if not available."""
165+ if len(hunspell_locales[locale]) == 1 and hunspell_locales[locale][0].is_available:
166+ return hunspell_locales[locale][0].locale
167+ return ''
168+
169+
170+def load_hunspell_locales(locale):
171+ """Registers locales for Hunspell.
172+
173+ Maps a list of generic locales to specific locales and checks which
174+ dictionaries are available. If locale != "all", load only the
175+ dictionary for the given locale.
176+
177+ """
178+ hunspell_locales['bg'].append(HunspellLocale('bg_BG'))
179+ hunspell_locales['br'].append(HunspellLocale('br_FR'))
180+ hunspell_locales['ca'].append(HunspellLocale('ca_ES'))
181+ hunspell_locales['da'].append(HunspellLocale('da_DK'))
182+ hunspell_locales['cs'].append(HunspellLocale('cs_CZ'))
183+ hunspell_locales['de'].append(HunspellLocale('de_DE'))
184+ hunspell_locales['el'].append(HunspellLocale('el_GR'))
185+ hunspell_locales['en_CA'].append(HunspellLocale('en_CA'))
186+ hunspell_locales['en_GB'].append(HunspellLocale('en_GB'))
187+ hunspell_locales['en_US'].append(HunspellLocale('en_US'))
188+ hunspell_locales['eo'].append(HunspellLocale('eo'))
189+ hunspell_locales['es'].append(HunspellLocale('es_ES'))
190+ hunspell_locales['et'].append(HunspellLocale('et_EE'))
191+ hunspell_locales['eu'].append(HunspellLocale('eu_ES'))
192+ hunspell_locales['fa'].append(HunspellLocale('fa_IR'))
193+ hunspell_locales['fi'].append(HunspellLocale('fi_FI'))
194+ hunspell_locales['fr'].append(HunspellLocale('fr_FR'))
195+ hunspell_locales['gd'].append(HunspellLocale('gd_GB'))
196+ hunspell_locales['gl'].append(HunspellLocale('gl_ES'))
197+ hunspell_locales['he'].append(HunspellLocale('he_IL'))
198+ hunspell_locales['hr'].append(HunspellLocale('hr_HR'))
199+ hunspell_locales['hu'].append(HunspellLocale('hu_HU'))
200+ hunspell_locales['ia'].append(HunspellLocale('ia'))
201+ hunspell_locales['id'].append(HunspellLocale('id_ID'))
202+ hunspell_locales['it'].append(HunspellLocale('it_IT'))
203+ hunspell_locales['ja'].append(HunspellLocale('ja_JP'))
204+ hunspell_locales['jv'].append(HunspellLocale('jv_ID'))
205+ hunspell_locales['ka'].append(HunspellLocale('ka_GE'))
206+ hunspell_locales['ko'].append(HunspellLocale('ko_KR'))
207+ hunspell_locales['krl'].append(HunspellLocale('krl_RU'))
208+ hunspell_locales['la'].append(HunspellLocale('la'))
209+ hunspell_locales['lt'].append(HunspellLocale('lt_LT'))
210+ hunspell_locales['mr'].append(HunspellLocale('mr_IN'))
211+ hunspell_locales['ms'].append(HunspellLocale('ms_MY'))
212+ hunspell_locales['my'].append(HunspellLocale('my_MM'))
213+ hunspell_locales['nb'].append(HunspellLocale('nb_NO'))
214+ hunspell_locales['nds'].append(HunspellLocale('nds_DE'))
215+ hunspell_locales['nl'].append(HunspellLocale('nl_NL'))
216+ hunspell_locales['nn'].append(HunspellLocale('nn_NO'))
217+ hunspell_locales['oc'].append(HunspellLocale('oc_FR'))
218+ hunspell_locales['pl'].append(HunspellLocale('pl_PL'))
219+ hunspell_locales['pt'].append(HunspellLocale('pt_PT'))
220+ hunspell_locales['ro'].append(HunspellLocale('ro_RO'))
221+ hunspell_locales['ru'].append(HunspellLocale('ru_RU'))
222+ hunspell_locales['rw'].append(HunspellLocale('rw_RW'))
223+ hunspell_locales['si'].append(HunspellLocale('si_LK'))
224+ hunspell_locales['sk'].append(HunspellLocale('sk_SK'))
225+ hunspell_locales['sl'].append(HunspellLocale('sl_SI'))
226+ hunspell_locales['sr'].append(HunspellLocale('sr_RS'))
227+ hunspell_locales['sv'].append(HunspellLocale('sv_SE'))
228+ hunspell_locales['tr'].append(HunspellLocale('tr_TR'))
229+ hunspell_locales['uk'].append(HunspellLocale('uk_UA'))
230+ hunspell_locales['vi'].append(HunspellLocale('vi_VN'))
231+ hunspell_locales['zh_CN'].append(HunspellLocale('zh_CN'))
232+ hunspell_locales['zh_TW'].append(HunspellLocale('zh_TW'))
233+ if locale == 'all':
234+ print('Looking for Hunspell dictionaries')
235+ for locale in hunspell_locales:
236+ set_has_hunspell_locale(hunspell_locales[locale][0])
237+ else:
238+ print('Looking for Hunspell dictionary')
239+ set_has_hunspell_locale(hunspell_locales[locale][0])
240+
241+
242+def is_vowel(character):
243+ """Helper function for creating inflections of English words."""
244+ return character == 'a' or character == 'e' or character == 'i' \
245+ or character == 'o' or character == 'u' or character == 'y'
246+
247+
248+def make_english_plural(word):
249+ """Create plural forms for nouns.
250+
251+ This will create a few nonsense entries for irregular plurals, but
252+    it's good enough for our purpose. The glossary contains pluralized
253+    terms, so we don't add any plural forms for strings ending in 's'.
254+
255+ """
256+ result = ''
257+ if not word.endswith('s'):
258+ if word.endswith('y') and not is_vowel(word[-2:-1]):
259+ result = word[0:-1] + 'ies'
260+ elif word.endswith('z') or word.endswith('x') or word.endswith('ch') or word.endswith('sh') or word.endswith('o'):
261+ result = word + 'es'
262+ else:
263+ result = word + 's'
264+ return result
265+
266+
267+def make_english_verb_forms(word):
268+ """Create inflected forms of an English verb: -ed and -ing forms.
269+
270+ Will create nonsense for irregular verbs.
271+
272+ """
273+ result = []
274+ if word.endswith('e'):
275+ result.append(word[0:-1] + 'ing')
276+ result.append(word + 'd')
277+ elif is_vowel(word[-2:-1]) and not is_vowel(word[-1]):
278+ # The consonant is duplicated here if the last syllable is stressed.
279+ # We can't detect stress, so we add both variants.
280+ result.append(word + word[-1] + 'ing')
281+ result.append(word + 'ing')
282+ result.append(word + word[-1] + 'ed')
283+ result.append(word + 'ed')
284+ elif word.endswith('y') and not is_vowel(word[-2:-1]):
285+ result.append(word + 'ing')
286+ result.append(word[0:-1] + 'ed')
287+ else:
288+ result.append(word + 'ing')
289+ result.append(word + 'ed')
290+ # 3rd person s has the same pattern as noun plurals.
291+    # We omitted words ending in 's' from the plural forms, so we add them here.
292+ if word.endswith('s'):
293+ result.append(word + 'es')
294+ else:
295+ result.append(make_english_plural(word))
296+ return result
297+
298+
299+def load_glossary(glossary_file, locale):
300+ """Build a glossary from the given Transifex glossary csv file for the
301+ given locale."""
302+ result = []
303+ counter = 0
304+ term_index = 0
305+ term_comment_index = 0
306+ wordclass_index = 0
307+ translation_index = 0
308+ comment_index = 0
309+ for row in read_csv_file(glossary_file):
310+ # Detect the column indices
311+ if counter == 0:
312+ colum_counter = 0
313+ for header in row:
314+ if header == 'term':
315+ term_index = colum_counter
316+ elif header == 'comment':
317+ term_comment_index = colum_counter
318+ elif header == 'pos':
319+ wordclass_index = colum_counter
320+ elif header == 'translation_' + locale or header == locale:
321+ translation_index = colum_counter
322+ elif header == 'comment_' + locale:
323+ comment_index = colum_counter
324+ colum_counter = colum_counter + 1
325+ # If there is a translation, parse the entry
326+ # We also have some obsolete terms in the glossary that we want to
327+ # filter out.
328+ elif len(row[translation_index].strip()) > 0 and not row[term_comment_index].startswith('OBSOLETE'):
329+ if translation_index == 0:
330+ raise Exception(
331+ 'Locale %s is missing from glossary file.' % locale)
332+ if comment_index == 0:
333+ raise Exception(
334+ 'Comment field for locale %s is missing from glossary file.' % locale)
335+ entry = GlossaryEntry()
336+ entry.terms.append(row[term_index].strip())
337+ if row[wordclass_index] == 'Noun':
338+ plural = make_english_plural(entry.terms[0])
339+ if len(plural) > 0:
340+ entry.terms.append(plural)
341+ elif row[wordclass_index] == 'Verb':
342+ verb_forms = make_english_verb_forms(entry.terms[0])
343+ for verb_form in verb_forms:
344+ entry.terms.append(verb_form)
345+
346+ entry.translations.append(row[translation_index].strip())
347+
348+ # Misuse the comment field to provide a list of inflected forms.
349+ # Otherwise, we would get tons of false positive hits in the checks
350+ # later on and the translators would have our heads on a platter.
351+ delimiter = '|'
352+ if len(row[comment_index].strip()) > 1 and delimiter in row[comment_index]:
353+ inflections = row[comment_index].split(delimiter)
354+ for inflection in inflections:
355+ entry.translations.append(inflection.strip())
356+
357+ result.append(entry)
358+ counter = counter + 1
359+ return result
360+
361+
362+#############################################################################
363+# Term Checking #
364+#############################################################################
365+
366+
367+def contains_term(string, term):
368+ """Checks whether 'string' contains 'term' as a whole word.
369+
370+    This check is case-insensitive.
371+
372+ """
373+ result = False
374+ # Regex is slow, so we do this preliminary check
375+ if term.lower() in string.lower():
376+ # Now make sure that it's whole words!
377+        # We don't want to match "AI" against "again" etc.
378+        regex = re.compile('(^|.*\W)' + term + '(\W.*|$)', re.IGNORECASE)
379+ result = regex.match(string)
380+ return result
381+
382+
383+def source_contains_term(source_to_check, entry, glossary):
384+ """Checks if the source string contains the glossary entry while filtering
385+ out superstrings from the glossary, e.g. we don't want to check 'arena'
386+ against 'battle arena'."""
387+ source_to_check = source_to_check.lower()
388+ for term in entry.terms:
389+ term = term.lower()
390+ if term in source_to_check:
391+ source_regex = re.compile('.+[\s,.]' + term + '[\s,.].+')
392+ if source_regex.match(source_to_check):
393+ for entry2 in glossary:
394+ if entry.terms[0] != entry2.terms[0]:
395+ for term2 in entry2.terms:
396+ term2 = term2.lower()
397+ if term2 != term and term in term2 and term2 in source_to_check:
398+ source_to_check = source_to_check.replace(
399+ term2, '')
400+ # Check if the source still contains the term to check
401+ return contains_term(source_to_check, term)
402+ return False
403+
404+
405+def append_hunspell_stems(hunspell_locale, translation):
406+    """Use hunspell to append the stems of the translation's words, which means less work for glossary editors.
407+    The effectiveness of this check depends on the quality of the hunspell data."""
408+ try:
409+ process = Popen(['hunspell', '-d', hunspell_locale,
410+ '-s'], stdout=PIPE, stdin=PIPE)
411+ hunspell_result = process.communicate(translation)
412+ if hunspell_result[0] != '':
413+ translation = ' '.join([translation, hunspell_result[0]])
414+ except CalledProcessError:
415+ print('Failed to run hunspell for locale: ' + hunspell_locale)
416+ return translation
417+
418+
419+def translation_has_term(entry, target):
420+ """Verify the target translation against all translation variations from
421+ the glossary."""
422+ result = False
423+ for translation in entry.translations:
424+ if contains_term(target, translation):
425+ result = True
426+ break
427+ return result
428+
429+
430+def check_file(csv_file, glossaries, locale, po_file):
431+ """Run the actual check."""
432+ translations = read_csv_file(csv_file)
433+ source_index = 0
434+ target_index = 0
435+ location_index = 0
436+ hits = []
437+ counter = 0
438+ has_hunspell = True
439+ hunspell_locale = get_hunspell_locale(locale)
440+ for row in translations:
441+ # Detect the column indices
442+ if counter == 0:
443+ colum_counter = 0
444+ for header in row:
445+ if header == 'source':
446+ source_index = colum_counter
447+ elif header == 'target':
448+ target_index = colum_counter
449+ elif header == 'location':
450+ location_index = colum_counter
451+ colum_counter = colum_counter + 1
452+ else:
453+ for entry in glossaries[locale][0]:
454+ # Check if the source text contains the glossary term.
455+ # Filter out superstrings, e.g. we don't want to check
456+ # "arena" against "battle arena"
457+ if source_contains_term(row[source_index], entry, glossaries[locale][0]):
458+ # Now verify the translation against all translation
459+ # variations from the glossary
460+ term_found = translation_has_term(entry, row[target_index])
461+ # Add Hunspell stems for better matches and try again
462+ # We do it here because the Hunspell manipulation is slow.
463+ if not term_found and hunspell_locale != '':
464+ target_to_check = append_hunspell_stems(
465+ hunspell_locale, row[target_index])
466+ term_found = translation_has_term(
467+ entry, target_to_check)
468+ if not term_found:
469+ hit = FailedTranslation()
470+ hit.source = row[source_index]
471+ hit.target = row[target_index]
472+ hit.location = row[location_index]
473+ hit.term = entry.terms[0]
474+ hit.translation = entry.translations[0]
475+ hit.locale = locale
476+ hit.po_file = po_file
477+ hits.append(hit)
478+ counter = counter + 1
479+ return hits
480+
481+
482+#############################################################################
483+# Main Loop #
484+#############################################################################
485+
486+
487+def check_translations_with_glossary(input_path, output_path, glossary_file, only_locale):
488+ """Main loop.
489+
490+ Loads the Transifex and Hunspell glossaries, converts all po files
491+ for languages that have glossary entries to temporary csv files,
492+ runs the check and then reports any hits to csv files.
493+
494+ """
495+ print('Locale: ' + only_locale)
496+ temp_path = make_path(output_path, 'temp_glossary')
497+ hits = []
498+ locale_list = defaultdict(list)
499+
500+ glossaries = defaultdict(list)
501+ load_hunspell_locales(only_locale)
502+
503+ source_directories = sorted(os.listdir(input_path), key=str.lower)
504+ for dirname in source_directories:
505+ dirpath = os.path.join(input_path, dirname)
506+ if os.path.isdir(dirpath):
507+ source_files = sorted(os.listdir(dirpath), key=str.lower)
508+ sys.stdout.write("\nChecking text domain '" + dirname + "': ")
509+ sys.stdout.flush()
510+ failed = 0
511+ for source_filename in source_files:
512+ po_file = dirpath + '/' + source_filename
513+ if source_filename.endswith('.po'):
514+ locale = source_filename[0:-3]
515+ if only_locale == 'all' or locale == only_locale:
516+ # Load the glossary if we haven't seen this locale
517+ # before
518+ if len(glossaries[locale]) < 1:
519+ sys.stdout.write(
520+ '\nLoading glossary for ' + locale)
521+ glossaries[locale].append(
522+ load_glossary(glossary_file, locale))
523+ sys.stdout.write(' - %d entries ' %
524+ len(glossaries[locale][0]))
525+ sys.stdout.flush()
526+ # Only bother with locales that have glossary entries
527+ if len(glossaries[locale][0]) > 0:
528+ sys.stdout.write(locale + ' ')
529+ sys.stdout.flush()
530+ if len(locale_list[locale]) < 1:
531+ locale_list[locale].append(locale)
532+ csv_file = os.path.abspath(os.path.join(
533+ temp_path, dirname + '_' + locale + '.csv'))
534+ # Convert to csv for easy parsing
535+ call(['po2csv', '--progress=none', po_file, csv_file])
536+
537+ # Now run the actual check
538+ current_hits = check_file(
539+ csv_file, glossaries, locale, dirname)
540+ for hit in current_hits:
541+ hits.append(hit)
542+
543+ # The csv file is no longer needed, delete it.
544+ os.remove(csv_file)
545+
546+ hits = sorted(hits, key=lambda FailedTranslation: [
547+ FailedTranslation.locale, FailedTranslation.translation])
548+ for locale in locale_list:
549+ locale_result = '"glossary_term","glossary_translation","source","target","file","location"\n'
550+ counter = 0
551+ for hit in hits:
552+ if hit.locale == locale:
553+ row = '"%s","%s","%s","%s","%s","%s"\n' % (
554+ hit.term, hit.translation, hit.source, hit.target, hit.po_file, hit.location)
555+ locale_result = locale_result + row
556+ counter = counter + 1
557+ dest_filepath = output_path + '/glossary_check_' + locale + '.csv'
558+ with open(dest_filepath, 'wt') as dest_file:
559+ dest_file.write(locale_result)
560+ # Uncomment this line to print a statistic of the number of hits for each locale
561+ # print("%s\t%d"%(locale, counter))
562+
563+ delete_path(temp_path)
564+ return 0
565+
566+
567+def main():
568+ """Checks whether we are in the correct directory and everything's there,
569+ then runs a glossary check over all PO files."""
570+ if len(sys.argv) == 2 or len(sys.argv) == 3:
571+ print('Running glossary checks:')
572+ else:
573+ print(
574+ 'Usage: glossary_checks.py <relative-path-to-glossary> [locale]')
575+ return 1
576+
577+ try:
578+ print('Current time: %s' % time.ctime())
579+ # Prepare the paths
580+ glossary_file = os.path.abspath(os.path.join(
581+ os.path.dirname(__file__), sys.argv[1]))
582+ locale = 'all'
583+ if len(sys.argv) == 3:
584+ locale = sys.argv[2]
585+
586+ if (not (os.path.exists(glossary_file) and os.path.isfile(glossary_file))):
587+ print('There is no glossary file at ' + glossary_file)
588+ return 1
589+
590+ input_path = os.path.abspath(os.path.join(
591+ os.path.dirname(__file__), '../po'))
592+ output_path = make_path(os.path.dirname(__file__), '../po_validation')
593+ result = check_translations_with_glossary(
594+ input_path, output_path, glossary_file, locale)
595+ print('Current time: %s' % time.ctime())
596+ return result
597+
598+ except Exception:
599+ print('Something went wrong:')
600+ traceback.print_exc()
601+ delete_path(make_path(output_path, 'temp_glossary'))
602+ return 1
603+
604+if __name__ == '__main__':
605+ sys.exit(main())
