Merge lp:~jtv/launchpad/scale-message-sharing-migration into lp:launchpad
- scale-message-sharing-migration
- Merge into devel
Proposed by
Jeroen T. Vermeulen
Status: | Merged |
---|---|
Approved by: | Brad Crittenden |
Approved revision: | no longer in the source branch. |
Merged at revision: | not available |
Proposed branch: | lp:~jtv/launchpad/scale-message-sharing-migration |
Merge into: | lp:launchpad |
Diff against target: |
689 lines 4 files modified
lib/lp/translations/model/potmsgset.py (+6/-5) lib/lp/translations/model/translationmessage.py (+3/-3) lib/lp/translations/scripts/message_sharing_migration.py (+164/-70) lib/lp/translations/scripts/tests/test_message_sharing_migration.py (+95/-89) |
To merge this branch: | bzr merge lp:~jtv/launchpad/scale-message-sharing-migration |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Brad Crittenden (community) | code | Approve | |
Review via email: mp+13961@code.launchpad.net |
Commit message
Memory-footprint cuts for message sharing migration script.
Description of the change
To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote : | # |
Revision history for this message
Brad Crittenden (bac) wrote : | # |
Hi Jeroen,
Thanks for this branch. Other than the following issues this branch looks great.
* As discussed on IRC, the parser option around 67 needs an action and a better help description to remove the ambiguous 'some'.
* Remove the assignment at line 158.
* The comment at line 291 makes no sense as written.
review:
Approve
(code)
Preview Diff
[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1 | === modified file 'lib/lp/translations/model/potmsgset.py' |
2 | --- lib/lp/translations/model/potmsgset.py 2009-10-22 15:51:58 +0000 |
3 | +++ lib/lp/translations/model/potmsgset.py 2009-10-28 14:58:15 +0000 |
4 | @@ -15,7 +15,7 @@ |
5 | from zope.component import getUtility |
6 | |
7 | from sqlobject import ForeignKey, IntCol, StringCol, SQLObjectNotFound |
8 | -from storm.store import EmptyResultSet |
9 | +from storm.store import EmptyResultSet, Store |
10 | |
11 | from canonical.config import config |
12 | from canonical.database.constants import DEFAULT, UTC_NOW |
13 | @@ -58,7 +58,7 @@ |
14 | u'translator_credits': (None, TranslationCreditsType.GNOME), |
15 | |
16 | # KDE credits messages. |
17 | - u'Your emails': |
18 | + u'Your emails': |
19 | (u'EMAIL OF TRANSLATORS', TranslationCreditsType.KDE_EMAILS), |
20 | u'Your names': |
21 | (u'NAME OF TRANSLATORS', TranslationCreditsType.KDE_NAMES), |
22 | @@ -66,7 +66,7 @@ |
23 | # Old KDE credits messages. |
24 | u'_: EMAIL OF TRANSLATORS\nYour emails': |
25 | (None, TranslationCreditsType.KDE_EMAILS), |
26 | - u'_: NAME OF TRANSLATORS\nYour names': |
27 | + u'_: NAME OF TRANSLATORS\nYour names': |
28 | (None, TranslationCreditsType.KDE_NAMES), |
29 | } |
30 | |
31 | @@ -1056,7 +1056,7 @@ |
32 | if self.is_translation_credit: |
33 | translation = self.getSharedTranslationMessage(pofile.language) |
34 | if translation is None: |
35 | - message = self.updateTranslation(pofile, pofile.owner, |
36 | + message = self.updateTranslation(pofile, pofile.owner, |
37 | [credits_message_str], False, |
38 | datetime.datetime.now(pytz.UTC), |
39 | allow_credits=True) |
40 | @@ -1104,7 +1104,8 @@ |
41 | |
42 | def getAllTranslationMessages(self): |
43 | """See `IPOTMsgSet`.""" |
44 | - return TranslationMessage.selectBy(potmsgset=self, orderBy=['id']) |
45 | + return Store.of(self).find( |
46 | + TranslationMessage, TranslationMessage.potmsgset == self) |
47 | |
48 | def getAllTranslationTemplateItems(self): |
49 | """See `IPOTMsgSet`.""" |
50 | |
51 | === modified file 'lib/lp/translations/model/translationmessage.py' |
52 | --- lib/lp/translations/model/translationmessage.py 2009-07-19 04:41:14 +0000 |
53 | +++ lib/lp/translations/model/translationmessage.py 2009-10-28 14:58:15 +0000 |
54 | @@ -386,7 +386,7 @@ |
55 | |
56 | for form in range(TranslationConstants.MAX_PLURAL_FORMS): |
57 | msgstr_name = 'msgstr%d' % form |
58 | - msgstr = getattr(self, msgstr_name) |
59 | + msgstr = getattr(self, 'msgstr%dID' % form) |
60 | if msgstr is None: |
61 | form_clause = "%s IS NULL" % msgstr_name |
62 | else: |
63 | @@ -454,10 +454,10 @@ |
64 | """See `ITranslationMessage`.""" |
65 | store = Store.of(self) |
66 | |
67 | - forms_match = (TranslationMessage.msgstr0 == self.msgstr0) |
68 | + forms_match = (TranslationMessage.msgstr0ID == self.msgstr0ID) |
69 | for form in xrange(1, TranslationConstants.MAX_PLURAL_FORMS): |
70 | form_name = 'msgstr%d' % form |
71 | - form_value = getattr(self, form_name) |
72 | + form_value = getattr(self, 'msgstr%dID' % form) |
73 | forms_match = And( |
74 | forms_match, |
75 | getattr(TranslationMessage, form_name) == form_value) |
76 | |
77 | === modified file 'lib/lp/translations/scripts/message_sharing_migration.py' |
78 | --- lib/lp/translations/scripts/message_sharing_migration.py 2009-10-22 10:17:44 +0000 |
79 | +++ lib/lp/translations/scripts/message_sharing_migration.py 2009-10-28 14:58:15 +0000 |
80 | @@ -7,12 +7,16 @@ |
81 | ] |
82 | |
83 | |
84 | +import gc |
85 | +import os |
86 | + |
87 | from zope.component import getUtility |
88 | from zope.security.proxy import removeSecurityProxy |
89 | |
90 | from canonical.launchpad.utilities.orderingcheck import OrderingCheck |
91 | from lp.translations.interfaces.potemplate import IPOTemplateSet |
92 | from lp.translations.interfaces.translations import TranslationConstants |
93 | +from lp.translations.model.translationmessage import TranslationMessage |
94 | from lp.registry.interfaces.product import IProductSet |
95 | from lp.registry.interfaces.sourcepackagename import ISourcePackageNameSet |
96 | from lp.services.scripts.base import ( |
97 | @@ -153,14 +157,19 @@ |
98 | |
99 | template_set = None |
100 | |
101 | + commit_count = 0 |
102 | + |
103 | def add_my_options(self): |
104 | self.parser.add_option('-d', '--distribution', dest='distribution', |
105 | help="Distribution to merge messages for.") |
106 | + self.parser.add_option('-D', '--remove-duplicates', |
107 | + dest='remove_duplicates', action='store_true', |
108 | + help="Phase 1: Remove problematic duplicate TranslationMessages.") |
109 | self.parser.add_option('-p', '--product', dest='product', |
110 | help="Product to merge messages for.") |
111 | self.parser.add_option('-P', '--merge-potmsgsets', |
112 | action='store_true', dest='merge_potmsgsets', |
113 | - help="Merge POTMsgSets.") |
114 | + help="Phase 2: Merge POTMsgSets.") |
115 | self.parser.add_option('-s', '--source-package', dest='sourcepackage', |
116 | help="Source package name within a distribution.") |
117 | self.parser.add_option('-t', '--template-names', |
118 | @@ -168,7 +177,7 @@ |
119 | help="Merge for templates with name matching this regex pattern.") |
120 | self.parser.add_option('-T', '--merge-translationmessages', |
121 | action='store_true', dest='merge_translationmessages', |
122 | - help="Merge TranslationMessages.") |
123 | + help="Phase 3: Merge TranslationMessages.") |
124 | self.parser.add_option('-x', '--dry-run', dest='dry_run', |
125 | action='store_true', |
126 | help="Dry run, don't really make any changes.") |
127 | @@ -185,13 +194,14 @@ |
128 | |
129 | def main(self): |
130 | actions = ( |
131 | + self.options.remove_duplicates or |
132 | self.options.merge_potmsgsets or |
133 | self.options.merge_translationmessages) |
134 | |
135 | if not actions: |
136 | raise LaunchpadScriptFailure( |
137 | - "Select at least one action: merge POTMsgSets, " |
138 | - "TranslationMessages, or both.") |
139 | + "Select at least one action: remove duplicates, merge " |
140 | + "POTMsgSets, and/or merge TranslationMessages.") |
141 | |
142 | if self.options.product and self.options.distribution: |
143 | raise LaunchpadScriptFailure( |
144 | @@ -250,27 +260,85 @@ |
145 | name, len(templates), number + 1, class_count)) |
146 | self.logger.debug("Templates: %s" % str(templates)) |
147 | |
148 | + if self.options.remove_duplicates: |
149 | + self.logger.info("Removing duplicate messages.") |
150 | + self._removeDuplicateMessages(templates) |
151 | + self._endTransaction(intermediate=True) |
152 | + |
153 | if self.options.merge_potmsgsets: |
154 | + self.logger.info("Merging POTMsgSets.") |
155 | self._mergePOTMsgSets(templates) |
156 | + self._endTransaction(intermediate=True) |
157 | |
158 | if self.options.merge_translationmessages: |
159 | + self.logger.info("Merging TranslationMessages.") |
160 | self._mergeTranslationMessages(templates) |
161 | |
162 | self._endTransaction() |
163 | |
164 | self.logger.info("Done.") |
165 | |
166 | - def _endTransaction(self): |
167 | + def _endTransaction(self, intermediate=False): |
168 | + """End this transaction and start a new one. |
169 | + |
170 | + :param intermediate: Whether this is an intermediate commit. |
171 | + Dry-run mode aborts transactions rather than committing |
172 | + them; where doing that may break dependencies between steps |
173 | + of the algorithm, pass `True` so that the abort can be |
174 | + skipped. |
175 | + """ |
176 | + if self.txn is None: |
177 | + return |
178 | + |
179 | + if self.commit_count % 100 == 0 or not intermediate: |
180 | + freed = gc.collect() |
181 | + objcount = len(gc.get_objects()) |
182 | + garbage = len(gc.garbage) |
183 | + memsize = open("/proc/%s/statm" % os.getpid()).read().split()[5] |
184 | + self.logger.debug( |
185 | + "Freed: %d. Object count: %d. Garbage: %d. Memory size: %s" |
186 | + % (freed, objcount, garbage, memsize)) |
187 | + |
188 | + self.commit_count += 1 |
189 | + |
190 | + if intermediate and self.commit_count % 10 != 0: |
191 | + return |
192 | + |
193 | if self.options.dry_run: |
194 | - self.txn.abort() |
195 | + if not intermediate: |
196 | + self.txn.abort() |
197 | else: |
198 | self.txn.commit() |
199 | - self.txn.begin() |
200 | |
201 | - def _mergePOTMsgSets(self, potemplates): |
202 | - """Merge POTMsgSets for given sequence of sharing templates.""" |
203 | + def _removeDuplicateMessages(self, potemplates): |
204 | + """Get rid of duplicate `TranslationMessages` where needed.""" |
205 | self._setUpUtilities() |
206 | |
207 | + representatives = {} |
208 | + order_check = OrderingCheck(cmp=self.compare_template_precedence) |
209 | + |
210 | + for template in potemplates: |
211 | + order_check.check(template) |
212 | + for potmsgset in template.getPOTMsgSets(False, prefetch=False): |
213 | + key = get_potmsgset_key(potmsgset) |
214 | + if key not in representatives: |
215 | + representatives[key] = potmsgset |
216 | + |
217 | + for representative in representatives.itervalues(): |
218 | + self._scrubPOTMsgSetTranslations(representative) |
219 | + self._endTransaction(intermediate=True) |
220 | + |
221 | + def _mapRepresentatives(self, potemplates): |
222 | + """Map out POTMsgSets' subordinates and templates. |
223 | + |
224 | + :param potemplates: An equivalence class of `POTemplate`s to |
225 | + sort out. |
226 | + :return: A tuple of dicts. The first maps each `POTMsgSet`'s |
227 | + key (as returned by `get_potmsgset_key`) to a list of its |
228 | + subordinate `POTMsgSet`s. The second maps each |
229 | + representative `POTMsgSet` to its representative |
230 | + `POTemplate`. |
231 | + """ |
232 | # Map each POTMsgSet key (context, msgid, plural) to its |
233 | # representative POTMsgSet. |
234 | representatives = {} |
235 | @@ -302,21 +370,34 @@ |
236 | else: |
237 | subordinates[representative] = [] |
238 | |
239 | + return subordinates, representative_templates |
240 | + |
241 | + def _mergePOTMsgSets(self, potemplates): |
242 | + """Merge POTMsgSets for given sequence of sharing templates.""" |
243 | + self._setUpUtilities() |
244 | + |
245 | + subordinates, representative_templates = self._mapRepresentatives( |
246 | + potemplates) |
247 | + |
248 | + num_representatives = len(subordinates) |
249 | + representative_num = 0 |
250 | + |
251 | for representative, potmsgsets in subordinates.iteritems(): |
252 | - # Scrub the representative POTMsgSet of any duplicate |
253 | - # translation messages. We don't need do this for subordinates |
254 | - # because the algorithm will refuse to add new duplicates to the |
255 | - # representative POTMsgSet anyway. |
256 | - self._scrubPOTMsgSetTranslations(representative) |
257 | - |
258 | - seen_potmsgsets = set([representative]) |
259 | + representative_num += 1 |
260 | + self.logger.info("Message %d/%d: %d subordinate(s)." % ( |
261 | + representative_num, num_representatives, len(potmsgsets))) |
262 | + |
263 | + seen_potmsgsets = set([representative.id]) |
264 | + |
265 | + potmsgset_deletions = 0 |
266 | + tm_deletions = 0 |
267 | |
268 | # Merge each subordinate POTMsgSet into its representative. |
269 | for subordinate in potmsgsets: |
270 | - if subordinate in seen_potmsgsets: |
271 | + if subordinate.id in seen_potmsgsets: |
272 | continue |
273 | |
274 | - seen_potmsgsets.add(subordinate) |
275 | + seen_potmsgsets.add(subordinate.id) |
276 | |
277 | for message in subordinate.getAllTranslationMessages(): |
278 | message = removeSecurityProxy(message) |
279 | @@ -350,22 +431,40 @@ |
280 | bequeathe_flags( |
281 | message, twin, |
282 | (clashing_current, clashing_imported)) |
283 | + tm_deletions += 1 |
284 | |
285 | merge_translationtemplateitems( |
286 | subordinate, representative, |
287 | representative_templates[representative]) |
288 | removeSecurityProxy(subordinate).destroySelf() |
289 | + potmsgset_deletions += 1 |
290 | + |
291 | + self._endTransaction(intermediate=True) |
292 | + |
293 | + self.logger.info( |
294 | + "Deleted POTMsgSets: %d. TranslationMessages: %d." % ( |
295 | + potmsgset_deletions, tm_deletions)) |
296 | |
297 | def _mergeTranslationMessages(self, potemplates): |
298 | """Share `TranslationMessage`s between templates where possible.""" |
299 | self._setUpUtilities() |
300 | order_check = OrderingCheck(cmp=self.compare_template_precedence) |
301 | for template in potemplates: |
302 | + deletions = 0 |
303 | order_check.check(template) |
304 | for potmsgset in template.getPOTMsgSets(False, prefetch=False): |
305 | + before = potmsgset.getAllTranslationMessages().count() |
306 | + |
307 | for message in potmsgset.getAllTranslationMessages(): |
308 | removeSecurityProxy(message).shareIfPossible() |
309 | |
310 | + after = potmsgset.getAllTranslationMessages().count() |
311 | + self._endTransaction(intermediate=True) |
312 | + |
313 | + deletions += max(0, before - after) |
314 | + |
315 | + self.logger.info("Removed TranslationMessages: %d" % deletions) |
316 | + |
317 | def _getPOTMsgSetTranslationMessageKey(self, tm): |
318 | """Return tuple that identifies a TranslationMessage in a POTMsgSet. |
319 | |
320 | @@ -375,36 +474,12 @@ |
321 | that's sorted out in the nested dicts). |
322 | """ |
323 | tm = removeSecurityProxy(tm) |
324 | - msgstr_ids = [ |
325 | + msgstr_ids = tuple([ |
326 | getattr(tm, 'msgstr%dID' % form) |
327 | for form in xrange(TranslationConstants.MAX_PLURAL_FORMS) |
328 | - ] |
329 | + ]) |
330 | |
331 | - return (tm.language, tm.variant) + tuple(msgstr_ids) |
332 | - |
333 | - def _mapExistingMessages(self, potmsgset): |
334 | - """Map out the existing TranslationMessages for `potmsgset`. |
335 | - |
336 | - :return: a dict mapping message keys (as returned by |
337 | - `_getPOTMsgSetTranslationMessageKey`) to further dicts, which |
338 | - map templates to TranslationMessages for that key and template. |
339 | - Each list should normally have only one message in it, but in |
340 | - the transition period this isn't always the case. |
341 | - """ |
342 | - existing_tms = {} |
343 | - |
344 | - for tm in potmsgset.getAllTranslationMessages(): |
345 | - key = self._getPOTMsgSetTranslationMessageKey(tm) |
346 | - if key not in existing_tms: |
347 | - existing_tms[key] = {} |
348 | - per_template = existing_tms[key] |
349 | - |
350 | - if tm.potemplate not in per_template: |
351 | - per_template[tm.potemplate] = [] |
352 | - |
353 | - per_template[tm.potemplate].append(tm) |
354 | - |
355 | - return existing_tms |
356 | + return (tm.potemplateID, tm.languageID, tm.variant) + msgstr_ids |
357 | |
358 | def _scrubPOTMsgSetTranslations(self, potmsgset): |
359 | """Map out translations for `potmsgset`, and eliminate duplicates. |
360 | @@ -416,34 +491,53 @@ |
361 | # XXX JeroenVermeulen 2009-06-15 |
362 | # spec=message-sharing-prevent-duplicates: We're going to have a |
363 | # unique index again at some point that will prevent this. When |
364 | - # it becomes impossible to test this function, it can be |
365 | - # removed. |
366 | - translations = self._mapExistingMessages(potmsgset) |
367 | - |
368 | - for key, per_template in translations.iteritems(): |
369 | - for template, tms in per_template.iteritems(): |
370 | - assert len(tms) > 0, "Empty-list entry in translations dict." |
371 | - if len(tms) == 1: |
372 | - continue |
373 | - self.logger.info( |
374 | - "Cleaning up %s identical '%s' messages for: \"%s\"" % ( |
375 | - len(tms), tms[0].language.getFullCode(), |
376 | - potmsgset.singular_text)) |
377 | - for tm in tms[1:]: |
378 | - assert tm != tms[0], "Duplicate is listed twice." |
379 | - assert tm.potmsgset == tms[0].potmsgset, ( |
380 | + # it becomes impossible to test this function, this whole |
381 | + # migration phase can be scrapped. |
382 | + tms = potmsgset.getAllTranslationMessages().order_by( |
383 | + TranslationMessage.languageID, TranslationMessage.variant, |
384 | + TranslationMessage.id) |
385 | + |
386 | + ids_per_language = {} |
387 | + for tm in tms: |
388 | + language = (tm.language, tm.variant) |
389 | + if language not in ids_per_language: |
390 | + ids_per_language[language] = [] |
391 | + ids_per_language[language].append(tm.id) |
392 | + |
393 | + self._endTransaction(intermediate=True) |
394 | + |
395 | + deletions = 0 |
396 | + |
397 | + for ids in ids_per_language.itervalues(): |
398 | + translations = {} |
399 | + |
400 | + for tm_id in ids: |
401 | + tm = TranslationMessage.get(tm_id) |
402 | + key = self._getPOTMsgSetTranslationMessageKey(tm) |
403 | + |
404 | + if key in translations: |
405 | + self.logger.info( |
406 | + "Cleaning up identical '%s' message for: \"%s\"" % ( |
407 | + tm.language.getFullCode(), potmsgset.singular_text)) |
408 | + |
409 | + existing_tm = translations[key] |
410 | + assert tm != existing_tm, ( |
411 | + "Message is duplicate of itself.") |
412 | + assert tm.potmsgset == existing_tm.potmsgset, ( |
413 | "Different potmsgsets considered identical.") |
414 | - assert tm.potemplate == tms[0].potemplate, ( |
415 | + assert tm.potemplate == existing_tm.potemplate, ( |
416 | "Different potemplates considered identical.") |
417 | |
418 | - # Transfer any current/imported flags to the first of |
419 | - # the identical messages, and delete the duplicates. |
420 | - bequeathe_flags(tm, tms[0]) |
421 | - removeSecurityProxy(tm).sync() |
422 | - |
423 | - per_template[template] = [tms[0]] |
424 | - |
425 | - return translations |
426 | + # Transfer any current/imported flags to the existing |
427 | + # message, and delete the duplicate. |
428 | + bequeathe_flags(tm, existing_tm) |
429 | + deletions += 1 |
430 | + else: |
431 | + translations[key] = tm |
432 | + |
433 | + self._endTransaction(intermediate=True) |
434 | + |
435 | + self.logger.info("Deleted TranslationMessages: %d" % deletions) |
436 | |
437 | def _findClashes(self, message, target_potmsgset, target_potemplate): |
438 | """What would clash if we moved `message` to the target environment? |
439 | |
440 | === modified file 'lib/lp/translations/scripts/tests/test_message_sharing_migration.py' |
441 | --- lib/lp/translations/scripts/tests/test_message_sharing_migration.py 2009-10-22 11:25:01 +0000 |
442 | +++ lib/lp/translations/scripts/tests/test_message_sharing_migration.py 2009-10-28 14:58:15 +0000 |
443 | @@ -18,6 +18,7 @@ |
444 | IPOFileTranslatorSet) |
445 | from lp.translations.model.pomsgid import POMsgID |
446 | from lp.translations.model.potemplate import POTemplate |
447 | +from lp.translations.model.potranslation import POTranslation |
448 | from lp.translations.scripts.message_sharing_migration import ( |
449 | MessageSharingMerge) |
450 | from canonical.testing import LaunchpadZopelessLayer |
451 | @@ -361,40 +362,6 @@ |
452 | self.assertEqual(tms, expected_tms) |
453 | self.assertEqual(len(tms), 3) |
454 | |
455 | - def test_duplicatesAreCleanedUp(self): |
456 | - # The POTMsgSet merging function cleans up any duplicate |
457 | - # TranslationMessages that might get in the way. |
458 | - trunk_message, stable_message = self._makeTranslationMessages( |
459 | - 'snaggle', 'snaggle') |
460 | - trunk_message.is_current = False |
461 | - trunk_message.sync() |
462 | - |
463 | - potmsgset = trunk_message.potmsgset |
464 | - |
465 | - stable_message.is_imported = True |
466 | - stable_message.potemplate = trunk_message.potemplate |
467 | - stable_message.potmsgset = potmsgset |
468 | - stable_message.sync() |
469 | - |
470 | - # We've set up a situation where trunk has two identical |
471 | - # messages (one of which is current, the other imported) and |
472 | - # stable has none. |
473 | - self.assertEqual(self._getTranslations(), ('snaggle', None)) |
474 | - tms = set(potmsgset.getAllTranslationMessages()) |
475 | - self.assertEqual(tms, set([trunk_message, stable_message])) |
476 | - |
477 | - self.script._mergePOTMsgSets(self.templates) |
478 | - |
479 | - # The duplicates have been cleaned up. |
480 | - self.assertEqual(potmsgset.getAllTranslationMessages().count(), 1) |
481 | - |
482 | - # The is_current and is_imported flags from the duplicate |
483 | - # messages have been merged into a single, current, imported |
484 | - # message. |
485 | - message = self._getMessage(potmsgset, self.trunk_template) |
486 | - self.assertTrue(message.is_current) |
487 | - self.assertTrue(message.is_imported) |
488 | - |
489 | |
490 | class TestTranslationMessageNonMerging(TestCaseWithFactory, |
491 | TranslatedProductMixin): |
492 | @@ -586,51 +553,61 @@ |
493 | self.assertEqual(poft.latest_message, stable_message) |
494 | |
495 | |
496 | -class TestMapMessages(TestCaseWithFactory, TranslatedProductMixin): |
497 | - """Test _mapExistingMessages and friends.""" |
498 | +class TestRemoveDuplicates(TestCaseWithFactory, TranslatedProductMixin): |
499 | + """Test _scrubPOTMsgSetTranslations and friends.""" |
500 | layer = LaunchpadZopelessLayer |
501 | |
502 | def setUp(self): |
503 | self.layer.switchDbUser('postgres') |
504 | - super(TestMapMessages, self).setUp(user='mark@example.com') |
505 | - super(TestMapMessages, self).setUpProduct() |
506 | - |
507 | - def test_NoMessagesToMap(self): |
508 | - # Mapping an untranslated POTMsgSet produces an empty dict. |
509 | - empty = self.script._mapExistingMessages(self.trunk_potmsgset) |
510 | - self.assertEqual(empty, {}) |
511 | - |
512 | - def test_MapSharedMessage(self): |
513 | - # Map existing, shared translation for a POTMsgSet. |
514 | - message = self._makeTranslationMessage( |
515 | - pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset, |
516 | - text='winslow', diverged=False) |
517 | - |
518 | - map = self.script._mapExistingMessages(message.potmsgset) |
519 | - key = self.script._getPOTMsgSetTranslationMessageKey(message) |
520 | - expected = {key: { self.trunk_template: [message]}} |
521 | - |
522 | - def test_MapDivergedMessage(self): |
523 | - # Map existing, diverged translation for a POTMsgSet. |
524 | - message = self._makeTranslationMessage( |
525 | - pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset, |
526 | - text='winslow', diverged=True) |
527 | - |
528 | - map = self.script._mapExistingMessages(message.potmsgset) |
529 | - key = self.script._getPOTMsgSetTranslationMessageKey(message) |
530 | - expected = {key: {self.trunk_template: [message]}} |
531 | - |
532 | - self.assertEqual(map, expected) |
533 | + super(TestRemoveDuplicates, self).setUp(user='mark@example.com') |
534 | + super(TestRemoveDuplicates, self).setUpProduct() |
535 | + |
536 | + def test_duplicatesAreCleanedUp(self): |
537 | + # The duplicates removal function cleans up any duplicate |
538 | + # TranslationMessages that might get in the way of merging. |
539 | + trunk_message, stable_message = self._makeTranslationMessages( |
540 | + 'snaggle', 'snaggle') |
541 | + trunk_message.is_current = False |
542 | + trunk_message.sync() |
543 | + |
544 | + potmsgset = trunk_message.potmsgset |
545 | + |
546 | + stable_message.is_imported = True |
547 | + stable_message.potemplate = trunk_message.potemplate |
548 | + stable_message.potmsgset = potmsgset |
549 | + stable_message.sync() |
550 | + |
551 | + # We've set up a situation where trunk has two identical |
552 | + # messages (one of which is current, the other imported) and |
553 | + # stable has none. |
554 | + self.assertEqual(self._getTranslations(), ('snaggle', None)) |
555 | + tms = set(potmsgset.getAllTranslationMessages()) |
556 | + self.assertEqual(tms, set([trunk_message, stable_message])) |
557 | + |
558 | + self.script._removeDuplicateMessages(self.templates) |
559 | + |
560 | + # The duplicates have been cleaned up. |
561 | + self.assertEqual(potmsgset.getAllTranslationMessages().count(), 1) |
562 | + |
563 | + # The is_current and is_imported flags from the duplicate |
564 | + # messages have been merged into a single, current, imported |
565 | + # message. |
566 | + message = self._getMessage(potmsgset, self.trunk_template) |
567 | + self.assertTrue(message.is_current) |
568 | + self.assertTrue(message.is_imported) |
569 | |
570 | def test_ScrubPOTMsgSetTranslationsWithoutDuplication(self): |
571 | # _scrubPOTMsgSetTranslations eliminates duplicated |
572 | # TranslationMessages. If it doesn't find any, nothing happens. |
573 | - message = self._makeTranslationMessage( |
574 | + self._makeTranslationMessage( |
575 | pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset, |
576 | text='gbzidh', diverged=False) |
577 | - map = self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset) |
578 | - key = self.script._getPOTMsgSetTranslationMessageKey(message) |
579 | - self.assertEqual(map, {key: {None: [message]}}) |
580 | + |
581 | + self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset) |
582 | + |
583 | + message1, message2 = self._getMessages() |
584 | + self.assertIsNot(None, message1) |
585 | + self.assertIs(None, message2) |
586 | |
587 | def test_ScrubPOTMsgSetTranslationsWithDuplication(self): |
588 | # If there are duplicate TranslationMessages, one inherits all |
589 | @@ -646,14 +623,14 @@ |
590 | message2.is_imported = True |
591 | message2.potmsgset = self.trunk_potmsgset |
592 | message2.potemplate = self.trunk_template |
593 | + ids = (message1.id, message2.id) |
594 | |
595 | - map = self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset) |
596 | + self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset) |
597 | |
598 | message, no_message = self._getMessages() |
599 | |
600 | - # The resulting map has only one of the identical messages. |
601 | - key = self.script._getPOTMsgSetTranslationMessageKey(message) |
602 | - self.assertEqual(map, {key: {self.trunk_template: [message]}}) |
603 | + # One of the two messages is now gone. |
604 | + self.assertIs(None, no_message) |
605 | |
606 | # The remaining message combines the flags from both its |
607 | # predecessors. |
608 | @@ -763,16 +740,17 @@ |
609 | transaction.commit() |
610 | gc.collect() |
611 | |
612 | - def _listLoadedPOMsgIDs(self, ignore_list=None): |
613 | - """Return the set of all `POMsgID` objects that are in memory. |
614 | + def _listLoadedObjects(self, of_class, ignore_list=None): |
615 | + """Return the set of objects of a given type that are in memory. |
616 | |
617 | + :param of_class: A class to filter for. |
618 | :param ignore_list: A previous return value. Any POMsgIDs that |
619 | were already in that list are ignored here. |
620 | """ |
621 | pomsgids = set([ |
622 | whatever |
623 | for whatever in gc.get_objects() |
624 | - if isinstance(whatever, POMsgID) |
625 | + if isinstance(whatever, of_class) |
626 | ]) |
627 | if ignore_list is not None: |
628 | pomsgids -= ignore_list |
629 | @@ -805,19 +783,47 @@ |
630 | def test_merging_loads_no_msgids(self): |
631 | # Migration does not load actual msgids into memory. |
632 | self._flushDbObjects() |
633 | - msgids_before = self._listLoadedPOMsgIDs() |
634 | - |
635 | - self._makeTranslationMessages('x', 'y', trunk_diverged=True) |
636 | - self._makeTranslationMessages('1', '2', stable_diverged=True) |
637 | - |
638 | - self._resetReferences() |
639 | - self.assertNotEqual([], self.templates) |
640 | - self.assertEqual(set(), self._listLoadedPOMsgIDs(msgids_before)) |
641 | - |
642 | - self.script._mergePOTMsgSets(self.templates) |
643 | - self.script._mergeTranslationMessages(self.templates) |
644 | - |
645 | - self.assertEqual(set(), self._listLoadedPOMsgIDs(msgids_before)) |
646 | + msgids_before = self._listLoadedObjects(POMsgID) |
647 | + |
648 | + self._makeTranslationMessages('x', 'y', trunk_diverged=True) |
649 | + self._makeTranslationMessages('1', '2', stable_diverged=True) |
650 | + |
651 | + self._resetReferences() |
652 | + self.assertNotEqual([], self.templates) |
653 | + self.assertEqual( |
654 | + set(), self._listLoadedObjects(POMsgID, msgids_before)) |
655 | + |
656 | + self.script._mergePOTMsgSets(self.templates) |
657 | + self.script._mergeTranslationMessages(self.templates) |
658 | + |
659 | + self.assertEqual( |
660 | + set(), self._listLoadedObjects(POMsgID, msgids_before)) |
661 | + |
662 | + def test_merging_loads_no_potranslations(self): |
663 | + # Migration does not load actual POTranslations into memory. |
664 | + self._flushDbObjects() |
665 | + potranslations_before = self._listLoadedObjects(POTranslation) |
666 | + |
667 | + self._makeTranslationMessages('x', 'y', trunk_diverged=True) |
668 | + self._makeTranslationMessages('1', '2', stable_diverged=True) |
669 | + |
670 | + self._resetReferences() |
671 | + self.assertNotEqual([], self.templates) |
672 | + self.assertEqual( |
673 | + set(), |
674 | + self._listLoadedObjects(POTranslation, potranslations_before)) |
675 | + |
676 | + self.script._mergePOTMsgSets(self.templates) |
677 | + |
678 | + self.assertEqual( |
679 | + set(), |
680 | + self._listLoadedObjects(POTranslation, potranslations_before)) |
681 | + |
682 | + self.script._mergeTranslationMessages(self.templates) |
683 | + |
684 | + self.assertEqual( |
685 | + set(), |
686 | + self._listLoadedObjects(POTranslation, potranslations_before)) |
687 | |
688 | |
689 | def test_suite(): |
= Bug 400544 =
The message-sharing migration script was using far too much memory for large projects and packages. In previous branches (the latest still on approach for landing as I write this) we stopped the script from fetching msgid texts while merging. This branch should further reduce memory footprint.
It doesn't do this by putting less stuff in memory per se. It probably allocates more overall. But it breaks things up so that less needs to be in memory at the same time.
== Details ==
The diff begins with some whitespace cleanups in potmsgset.py. Pay no heed to them. In the same file, getAllTranslati onMessages now returns a Storm result set. The caller can now append sorting clauses to it, which was more relevant in an earlier incarnation of this branch, but it's an improvement anyway so I left it in. There is no more default ordering, but I verified that no code relied on the ordering there was anyway.
In the script, moved elimination of certain troublesome duplications among TranslationMessages into a separate phase of execution. Since this brings the number of phases to 3, I also started numbering them clearly in the options list.
You'll note some new intermediate commits between phases. The commit function has evolved to skip intermediate aborts on dry runs; skip commits or aborts when no txn has been set; and to run the garbage collector after every commit in hopes of relieving memory load somewhat.
Next in the diff comes the script's new execution phase. It figures out which potmsgsets are "representative," and cleans up only those. They are the only ones that are a problem for the rest of the algorithm. There are probably few enough other cases that it's not worth going through a full cleanup.
Right below that you'll see how the _scrubPOTMsgSet Translations invocation was removed from the previous first execution phase (which is now the second).
The extra intermediate commits I've added inside phases are a delicate matter. Underneath Storm and above the database connection, psycopg batches query result sets. If you commit transactions while iterating over a result set (especially if you're also removing rows that occur in it!) there's a risk of confusing the iteration. This is why there are no _endTransaction calls while iterating over result sets. Once the full result set has been loaded, this is no longer an issue.
_mapExistingMes sages is gone. It was a helper for _scrubPOTMsgSet Translations, but its approach involved loading all of a potmsgset's translations right up front. A serious memory footprint risk that was still in there because once upon a time, the resulting dict was reused elsewhere in the algorithm.
There still is a two-stage approach, but with a different division of labour: one loop retrieves the TranslationMessages a potmsgset has for a given language/variant pair, but keeps only its ids so as not to pin all those objects in the ORM cache. A second loop sorts out only the TranslationMessages for a single language, deleting duplicates as it finds them. Duplicates must have the same language, and a problematic potmsgset is one that has TranslationMessages in many languages, so this ...