Merge lp:~max-rabkin/ibid/google-translate into lp:~ibid-core/ibid/old-trunk-1.6

Proposed by Max Rabkin
Status: Merged
Approved by: Jonathan Hitchcock
Approved revision: not available
Merged at revision: 821
Proposed branch: lp:~max-rabkin/ibid/google-translate
Merge into: lp:~ibid-core/ibid/old-trunk-1.6
Diff against target: 182 lines (+59/-73)
1 file modified
ibid/plugins/google.py (+59/-73)
To merge this branch: bzr merge lp:~max-rabkin/ibid/google-translate
Reviewer             Review Type   Date Requested   Status
Jonathan Hitchcock                                  Approve
Michael Gorven                                      Approve
Stefano Rivera                                      Approve
Review via email: mp+16691@code.launchpad.net
Revision history for this message
Max Rabkin (max-rabkin) wrote :

Using a hard-coded language list means we can tell users which languages we support, and we can use a normal @match to extract the arguments. It also means we use Google's language codes, which are not the same as ISO 639-1: they use "iw" for Hebrew instead of "he" (but accept both), and other functions in the Language API use the ISO 639-2 code for Cherokee (there is no 639-1 code). Also, we can read the languages right there in the code, so no more surprises like Greek.

Of course, there are the usual disadvantages of hard-coded values: basically, when Google adds languages, we'll have to update the list.
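
A minimal sketch of the idea described above: a hard-coded table of names and codes is joined into a regex alternation, so one pattern can pull out the text and both languages. It assumes a small hypothetical subset of the table and is written for Python 3 rather than the plugin's Python 2. The names `LANG_NAMES`, `TRANSLATE` and `parse_request`, the `re.escape` call and the longest-first ordering are illustrative assumptions, not part of the branch.

```python
import re

# Hypothetical subset of a name -> code table (the branch's list is longer).
LANG_NAMES = {
    'english': 'en',
    'french': 'fr',
    'hebrew': 'iw',
    'chinese': 'zh',
    'chinese simplified': 'zh-cn',
}

# Every accepted spelling: full names plus the codes themselves.
# Longest first, so 'chinese simplified' is tried before 'chinese'.
_alternatives = sorted(set(LANG_NAMES) | set(LANG_NAMES.values()),
                       key=len, reverse=True)
LANG_REGEX = '|'.join(re.escape(alt) for alt in _alternatives)

# Lazy text group, then an optional "from X" and an optional "to/into Y".
TRANSLATE = re.compile(
    r'^translate\s+(.*?)'
    r'(?:\s+from\s+(' + LANG_REGEX + r'))?'
    r'(?:\s+(?:in)?to\s+(' + LANG_REGEX + r'))?$',
    re.IGNORECASE,
)


def parse_request(message):
    """Return (text, src_lang, dest_lang), or None if the message does not match.

    Languages that were not given come back as None.
    """
    m = TRANSLATE.match(message)
    if m is None:
        return None
    return m.groups()
```

Because the language groups only match known names, a phrase such as "translate welcome to the jungle" keeps "to the jungle" as part of the text instead of treating "the jungle" as a language.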

Revision history for this message
Stefano Rivera (stefanor) wrote :

I see a conflict in the diff below.

lp:~max-rabkin/ibid/google-translate updated
827. By Max Rabkin

merge

Revision history for this message
Max Rabkin (max-rabkin) wrote :

On Thu, Dec 31, 2009 at 7:58 PM, Stefano Rivera <email address hidden> wrote:
> I see a conflict in the diff below.

Thanks, fixed.

Revision history for this message
Stefano Rivera (stefanor) wrote :

Your merge request also seems to modify ibid.ini

lp:~max-rabkin/ibid/google-translate updated
828. By Max Rabkin

undo changes to ibid.ini

Revision history for this message
Max Rabkin (max-rabkin) wrote :

> Your merge request also seems to modify ibid.ini

Gah, fixed.

Revision history for this message
Stefano Rivera (stefanor) wrote :

ibid/plugins/google.py:1: 'codecs' imported but unused
ibid/plugins/google.py:11: 'cacheable_download' imported but unused
ibid/plugins/google.py:13: 'get_html_parse_tree' imported but unused

Besides those I approve

review: Approve
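
The three warnings above have the shape of a static "imported but unused" check. A rough sketch of how such a check can work, using only the standard `ast` module, is below. It is an assumption-laden simplification written for Python 3 (it ignores `__all__`, scoping and star imports), and `unused_imports` is a hypothetical helper, not the tool the reviewer ran.

```python
import ast


def unused_imports(source):
    """Return sorted (line, name) pairs for imports that are never referenced."""
    tree = ast.parse(source)
    imported = {}   # bound name -> line number of the import statement
    used = set()    # every bare name that appears anywhere in the module

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                name = alias.asname or alias.name.split('.')[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            # Attribute access such as codecs.open still contains Name('codecs').
            used.add(node.id)

    return sorted((line, name) for name, line in imported.items()
                  if name not in used)
```

Applied to a module that imports `codecs` but never mentions it again, the helper reports that import together with its line number.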
Revision history for this message
Michael Gorven (mgorven) wrote :

Looks fine.
 review approve

review: Approve
Revision history for this message
Jonathan Hitchcock (vhata) :
review: Approve

Preview Diff

=== modified file 'ibid/plugins/google.py'
--- ibid/plugins/google.py 2009-12-30 21:21:20 +0000
+++ ibid/plugins/google.py 2010-01-02 11:17:13 +0000
@@ -9,6 +9,8 @@
 from ibid.plugins import Processor, match
 from ibid.config import Option, IntOption
 from ibid.utils import decode_htmlentities, json_webservice, cacheable_download
+from ibid.utils import human_join
+from ibid.utils.html import get_html_parse_tree
 
 help = {'google': u'Retrieves results from Google and Google Calculator.'}
 
@@ -135,24 +137,66 @@
 
     api_key = Option('api_key', 'Your Google API Key (optional)', None)
     referer = Option('referer', 'The referer string to use (API searches)', default_referer)
-    dest_lang = Option('dest_lang', 'Destination language when none is specified', 'en')
+    dest_lang = Option('dest_lang', 'Destination language when none is specified', 'english')
 
     chain_length = IntOption('chain_length', 'Maximum length of translation chains', 10)
 
-    @match(r'^translate\s+(.*)$')
-    def translate (self, event, data):
+    lang_names = {'afrikaans':'af', 'albanian':'sq', 'arabic':'ar',
+                  'belarusian':'be', 'bulgarian':'bg', 'catalan':'ca',
+                  'chinese':'zh', 'chinese simplified':'zh-cn',
+                  'chinese traditional':'zh-tw', 'croatian':'hr', 'czech':'cs',
+                  'danish':'da', 'dutch':'nl', 'english':'en', 'estonian':'et',
+                  'filipino':'tl', 'finnish':'fi', 'french':'fr',
+                  'galacian':'gl', 'german':'de', 'greek':'el', 'hebrew':'iw',
+                  'hindi':'hi', 'hungarian':'hu', 'icelandic':'is',
+                  'indonesian':'id', 'irish':'ga', 'italian':'it',
+                  'japanese':'ja', 'korean': 'ko', 'latvian':'lv',
+                  'lithuanian':'lt', 'macedonian':'mk', 'malay':'ms',
+                  'maltese':'mt', 'norwegian':'no', 'persian':'fa',
+                  'polish':'pl', 'portuguese':'pt', 'romanian':'ro',
+                  'russian': 'ru', 'serbian':'sr', 'slovak':'sk',
+                  'slovenian':'sl', 'spanish':'es', 'swahili':'sw',
+                  'swedish':'sv', 'thai':'th', 'turkish':'tr', 'ukrainian':'uk',
+                  'uzbek': 'uz', 'vietnamese':'vi', 'welsh':'cy',
+                  'yiddish':'yi'}
+
+    alt_lang_names = {'simplified':'zh-CN', 'simplified chinese':'zh-CN',
+                      'traditional':'zh-TW', 'traditional chinese':'zh-TW',
+                      'bokmal':'no', 'norwegian bokmal':'no',
+                      u'bokm\N{LATIN SMALL LETTER A WITH RING ABOVE}l':'no',
+                      u'norwegian bokm\N{LATIN SMALL LETTER A WITH RING ABOVE}l':
+                          'no',
+                      'farsi':'fa'}
+
+    LANG_REGEX = '|'.join(lang_names.keys() + lang_names.values() +
+                          alt_lang_names.keys())
+
+    @match(r'^(?:translation\s*)?languages$')
+    def languages (self, event):
+        event.addresponse(human_join(sorted(self.lang_names.keys())))
+
+    @match(r'^translate\s+(.*?)(?:\s+from\s+(' + LANG_REGEX + r'))?'
+           r'(?:\s+(?:in)?to\s+(' + LANG_REGEX + r'))?$')
+    def translate (self, event, text, src_lang, dest_lang):
+        dest_lang = self.language_code(dest_lang or self.dest_lang)
+        src_lang = self.language_code(src_lang or '')
+
         try:
-            translated = self._translate(event, *self._parse_request(data))[0]
+            translated = self._translate(event, text, src_lang, dest_lang)[0]
             event.addresponse(translated)
         except TranslationException, e:
             event.addresponse(u"I couldn't translate that: %s.", unicode(e))
 
-    @match(r'^translation[-\s]*(?:chain|party)\s+(.*)$')
-    def translation_chain (self, event, data):
+    @match(r'^translation[-\s]*(?:chain|party)\s+(.*?)'
+           r'(?:\s+from\s+(' + LANG_REGEX + r'))?'
+           r'(?:\s+(?:in)?to\s+(' + LANG_REGEX + r'))?$')
+    def translation_chain (self, event, phrase, src_lang, dest_lang):
         if self.chain_length < 1:
             event.addresponse(u"I'm not allowed to play translation games.")
         try:
-            phrase, src_lang, dest_lang = self._parse_request(data)
+            dest_lang = self.language_code(dest_lang or self.dest_lang)
+            src_lang = self.language_code(src_lang or '')
+
             chain = [phrase]
             for i in range(self.chain_length):
                 phrase, src_lang = self._translate(event, phrase,
@@ -167,38 +211,6 @@
         except TranslationException, e:
             event.addresponse(u"I couldn't translate that: %s.", unicode(e))
 
-    def _parse_request (self, data):
-        if not hasattr(self, 'lang_names'):
-            self._make_language_dict()
-
-        from_re = r'\s+from\s+(?P<from>[-()\s\w]+?)'
-        to_re = r'\s+(?:in)?to\s+(?P<to>[-()\s\w]+?)'
-
-        res = [(from_re, to_re), (to_re, from_re), (to_re,), (from_re,), ()]
-
-        # Try all possible specifications of source and target language until we
-        # find a valid one.
-        for pat in res:
-            pat = '(?P<text>.*)' + ''.join(pat) + '\s*$'
-            m = re.match(pat, data, re.IGNORECASE | re.UNICODE | re.DOTALL)
-            if m:
-                dest_lang = m.groupdict().get('to')
-                src_lang = m.groupdict().get('from')
-                try:
-                    if dest_lang:
-                        dest_lang = self.language_code(dest_lang)
-                    else:
-                        dest_lang = self.dest_lang
-
-                    if src_lang:
-                        src_lang = self.language_code(src_lang)
-                    else:
-                        src_lang = ''
-
-                    return (m.group('text'), src_lang, dest_lang)
-                except UnknownLanguageException:
-                    continue
-
     def _translate (self, event, phrase, src_lang, dest_lang):
         params = {
             'v': '1.0',
@@ -234,48 +246,22 @@
 
         raise TranslationException(msg)
 
-    def _make_language_dict (self):
-        self.lang_names = d = {}
-
-        filename = cacheable_download('http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt',
-                                      'google/ISO-639-2_utf-8.txt')
-        f = codecs.open(filename, 'rU', 'utf-8')
-        for line in f:
-            code2B, code2T, code1, englishNames, frenchNames = line.split('|')
-
-            # Identify languages by ISO 639-1 code if it exists; otherwise use
-            # ISO 639-2 (B). Google currently only translates languages with -1
-            # codes, but will may use -2 (B) codes in the future.
-            ident = code1 or code2B
-
-            d[code2B] = d[code2T] = d[code1] = ident
-            for name in englishNames.lower().split(';'):
-                d[name] = ident
-
-        del d['']
-
     def language_code (self, name):
-        """Convert a name to a language code.
-
-        Caller must call _make_language_dict first."""
+        """Convert a name to a language code."""
 
         name = name.lower()
 
-        m = re.match('^([a-z]{2})(?:-[a-z]{2})?$', name)
-        if m and m.group(1) in self.lang_names:
+        if name == '':
             return name
-        if 'simplified' in name:
-            return 'zh-CN'
-        if 'traditional' in name:
-            return 'zh-TW'
-        if re.search(u'bokm[a\N{LATIN SMALL LETTER A WITH RING ABOVE}]l', name):
-            # what Google calls Norwegian seems to be Bokmal
-            return 'no'
 
         try:
-            return self.lang_names[name]
+            return self.lang_names.get(name) or self.alt_lang_names[name]
         except KeyError:
-            raise UnknownLanguageException
+            m = re.match('^([a-z]{2,3})(?:-[a-z]{2})?$', name)
+            if m and m.group(1) in self.lang_names.values():
+                return name
+            else:
+                raise UnknownLanguageException
 
 # This Plugin uses code from youtube-dl
 # Copyright (c) 2006-2008 Ricardo Garcia Gonzalez
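
The lookup order in the new `language_code` (primary names, then alternative names, then bare codes) can be tried outside the plugin with a standalone sketch like the one below. It assumes a small hypothetical subset of both tables and uses Python 3 syntax. The module-level names and the locally defined `UnknownLanguageException` are stand-ins for illustration, not the plugin's own objects.

```python
import re


class UnknownLanguageException(Exception):
    """Stand-in for the plugin's exception of the same name."""


# Hypothetical subsets of the two tables in the diff.
LANG_NAMES = {'english': 'en', 'hebrew': 'iw', 'norwegian': 'no', 'persian': 'fa'}
ALT_LANG_NAMES = {'farsi': 'fa', 'bokmal': 'no', 'simplified': 'zh-CN'}

# A two- or three-letter code with an optional two-letter region suffix.
CODE_RE = re.compile(r'^([a-z]{2,3})(?:-[a-z]{2})?$')


def language_code(name):
    """Convert a language name or code to a code, or raise UnknownLanguageException."""
    name = name.lower()

    # An empty name is passed through unchanged, as in the diff.
    if name == '':
        return name

    # 1. Primary names, 2. alternative names.
    code = LANG_NAMES.get(name) or ALT_LANG_NAMES.get(name)
    if code:
        return code

    # 3. A bare code is accepted only if its prefix is a code the table knows.
    m = CODE_RE.match(name)
    if m and m.group(1) in LANG_NAMES.values():
        return name

    raise UnknownLanguageException(name)
```

With this subset, "Farsi" resolves to "fa" and "iw" is returned as-is, while "he" raises the exception because only "iw" appears among the table's values.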
