Merge lp:~jtv/launchpad/scale-message-sharing-migration into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Brad Crittenden
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/scale-message-sharing-migration
Merge into: lp:launchpad
Diff against target: 689 lines
4 files modified
lib/lp/translations/model/potmsgset.py (+6/-5)
lib/lp/translations/model/translationmessage.py (+3/-3)
lib/lp/translations/scripts/message_sharing_migration.py (+164/-70)
lib/lp/translations/scripts/tests/test_message_sharing_migration.py (+95/-89)
To merge this branch: bzr merge lp:~jtv/launchpad/scale-message-sharing-migration
Reviewer Review Type Date Requested Status
Brad Crittenden (community) code Approve
Review via email: mp+13961@code.launchpad.net

Commit message

Memory-footprint cuts for message sharing migration script.

To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (4.4 KiB)

= Bug 400544 =

The message-sharing migration script was using far too much memory for large projects and packages. In previous branches (the latest still on approach for landing as I write this) we stopped the script from fetching msgid texts while merging. This branch should further reduce memory footprint.

It doesn't do this by putting less stuff in memory per se. It probably allocates more overall. But it breaks things up so that less needs to be in memory at the same time.

== Details ==

The diff begins with some whitespace cleanups in potmsgset.py. Pay no heed to them. In the same file, getAllTranslationMessages now returns a Storm result set. The caller can now append sorting clauses to it, which was more relevant in an earlier incarnation of this branch, but it's an improvement anyway so I left it in. There is no more default ordering, but I verified that no code relied on the ordering there was anyway.

In the script, moved elimination of certain troublesome duplications among TranslationMessages into a separate phase of execution. Since this brings the number of phases to 3, I also started numbering them clearly in the options list.

You'll note some new intermediate commits between phases. The commit function has evolved to skip intermediate aborts on dry runs; skip commits or aborts when no txn has been set; and to run the garbage collector after every commit in hopes of relieving memory load somewhat.

Next in the diff comes the script's new execution phase. It figures out which potmsgsets are "representative," and cleans up only those. They are the only ones that are a problem for the rest of the algorithm. There are probably few enough other cases that it's not worth going through a full cleanup.

Right below that you'll see how the _scrubPOTMsgSetTranslations invocation was removed from the previous first execution phase (which is now the second).

The extra intermediate commits I've added inside phases are a delicate matter. Underneath Storm and above the database connection, psycopg batches query result sets. If you commit transactions while iterating over a result set (especially if you're also removing rows that occur in it!) there's a risk of confusing the iteration. This is why there are no _endTransaction calls while iterating over result sets. Once the full result set has been loaded, this is no longer an issue.

_mapExistingMessages is gone. It was a helper for _scrubPOTMsgSetTranslations, but its approach involved loading all of a potmsgset's translations right up front. A serious memory footprint risk that was still in there because once upon a time, the resulting dict was reused elsewhere in the algorithm.

There still is a two-stage approach, but with a different division of labour: one loop retrieves the TranslationMessages a potmsgset has for a given language/variant pair, but keeps only its ids so as not to pin all those objects in the ORM cache. A second loop sorts out only the TranslationMessages for a single language, deleting duplicates as it finds them. Duplicates must have the same language, and a problematic potmsgset is one that has TranslationMessages in many languages, so this ...

Read more...

Revision history for this message
Brad Crittenden (bac) wrote :

Hi Jeroen,

Thanks for this branch. Other than the following issues this branch looks great.

* As discussed on IRC, the parser option around 67 needs an action and a better help description to remove the ambiguous 'some'.

* Remove the assignment at line 158.

* The comment at line 291 makes no sense as written.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/translations/model/potmsgset.py'
2--- lib/lp/translations/model/potmsgset.py 2009-10-22 15:51:58 +0000
3+++ lib/lp/translations/model/potmsgset.py 2009-10-28 14:58:15 +0000
4@@ -15,7 +15,7 @@
5 from zope.component import getUtility
6
7 from sqlobject import ForeignKey, IntCol, StringCol, SQLObjectNotFound
8-from storm.store import EmptyResultSet
9+from storm.store import EmptyResultSet, Store
10
11 from canonical.config import config
12 from canonical.database.constants import DEFAULT, UTC_NOW
13@@ -58,7 +58,7 @@
14 u'translator_credits': (None, TranslationCreditsType.GNOME),
15
16 # KDE credits messages.
17- u'Your emails':
18+ u'Your emails':
19 (u'EMAIL OF TRANSLATORS', TranslationCreditsType.KDE_EMAILS),
20 u'Your names':
21 (u'NAME OF TRANSLATORS', TranslationCreditsType.KDE_NAMES),
22@@ -66,7 +66,7 @@
23 # Old KDE credits messages.
24 u'_: EMAIL OF TRANSLATORS\nYour emails':
25 (None, TranslationCreditsType.KDE_EMAILS),
26- u'_: NAME OF TRANSLATORS\nYour names':
27+ u'_: NAME OF TRANSLATORS\nYour names':
28 (None, TranslationCreditsType.KDE_NAMES),
29 }
30
31@@ -1056,7 +1056,7 @@
32 if self.is_translation_credit:
33 translation = self.getSharedTranslationMessage(pofile.language)
34 if translation is None:
35- message = self.updateTranslation(pofile, pofile.owner,
36+ message = self.updateTranslation(pofile, pofile.owner,
37 [credits_message_str], False,
38 datetime.datetime.now(pytz.UTC),
39 allow_credits=True)
40@@ -1104,7 +1104,8 @@
41
42 def getAllTranslationMessages(self):
43 """See `IPOTMsgSet`."""
44- return TranslationMessage.selectBy(potmsgset=self, orderBy=['id'])
45+ return Store.of(self).find(
46+ TranslationMessage, TranslationMessage.potmsgset == self)
47
48 def getAllTranslationTemplateItems(self):
49 """See `IPOTMsgSet`."""
50
51=== modified file 'lib/lp/translations/model/translationmessage.py'
52--- lib/lp/translations/model/translationmessage.py 2009-07-19 04:41:14 +0000
53+++ lib/lp/translations/model/translationmessage.py 2009-10-28 14:58:15 +0000
54@@ -386,7 +386,7 @@
55
56 for form in range(TranslationConstants.MAX_PLURAL_FORMS):
57 msgstr_name = 'msgstr%d' % form
58- msgstr = getattr(self, msgstr_name)
59+ msgstr = getattr(self, 'msgstr%dID' % form)
60 if msgstr is None:
61 form_clause = "%s IS NULL" % msgstr_name
62 else:
63@@ -454,10 +454,10 @@
64 """See `ITranslationMessage`."""
65 store = Store.of(self)
66
67- forms_match = (TranslationMessage.msgstr0 == self.msgstr0)
68+ forms_match = (TranslationMessage.msgstr0ID == self.msgstr0ID)
69 for form in xrange(1, TranslationConstants.MAX_PLURAL_FORMS):
70 form_name = 'msgstr%d' % form
71- form_value = getattr(self, form_name)
72+ form_value = getattr(self, 'msgstr%dID' % form)
73 forms_match = And(
74 forms_match,
75 getattr(TranslationMessage, form_name) == form_value)
76
77=== modified file 'lib/lp/translations/scripts/message_sharing_migration.py'
78--- lib/lp/translations/scripts/message_sharing_migration.py 2009-10-22 10:17:44 +0000
79+++ lib/lp/translations/scripts/message_sharing_migration.py 2009-10-28 14:58:15 +0000
80@@ -7,12 +7,16 @@
81 ]
82
83
84+import gc
85+import os
86+
87 from zope.component import getUtility
88 from zope.security.proxy import removeSecurityProxy
89
90 from canonical.launchpad.utilities.orderingcheck import OrderingCheck
91 from lp.translations.interfaces.potemplate import IPOTemplateSet
92 from lp.translations.interfaces.translations import TranslationConstants
93+from lp.translations.model.translationmessage import TranslationMessage
94 from lp.registry.interfaces.product import IProductSet
95 from lp.registry.interfaces.sourcepackagename import ISourcePackageNameSet
96 from lp.services.scripts.base import (
97@@ -153,14 +157,19 @@
98
99 template_set = None
100
101+ commit_count = 0
102+
103 def add_my_options(self):
104 self.parser.add_option('-d', '--distribution', dest='distribution',
105 help="Distribution to merge messages for.")
106+ self.parser.add_option('-D', '--remove-duplicates',
107+ dest='remove_duplicates', action='store_true',
108+ help="Phase 1: Remove problematic duplicate TranslationMessages.")
109 self.parser.add_option('-p', '--product', dest='product',
110 help="Product to merge messages for.")
111 self.parser.add_option('-P', '--merge-potmsgsets',
112 action='store_true', dest='merge_potmsgsets',
113- help="Merge POTMsgSets.")
114+ help="Phase 2: Merge POTMsgSets.")
115 self.parser.add_option('-s', '--source-package', dest='sourcepackage',
116 help="Source package name within a distribution.")
117 self.parser.add_option('-t', '--template-names',
118@@ -168,7 +177,7 @@
119 help="Merge for templates with name matching this regex pattern.")
120 self.parser.add_option('-T', '--merge-translationmessages',
121 action='store_true', dest='merge_translationmessages',
122- help="Merge TranslationMessages.")
123+ help="Phase 3: Merge TranslationMessages.")
124 self.parser.add_option('-x', '--dry-run', dest='dry_run',
125 action='store_true',
126 help="Dry run, don't really make any changes.")
127@@ -185,13 +194,14 @@
128
129 def main(self):
130 actions = (
131+ self.options.remove_duplicates or
132 self.options.merge_potmsgsets or
133 self.options.merge_translationmessages)
134
135 if not actions:
136 raise LaunchpadScriptFailure(
137- "Select at least one action: merge POTMsgSets, "
138- "TranslationMessages, or both.")
139+ "Select at least one action: remove duplicates, merge "
140+ "POTMsgSets, and/or merge TranslationMessages.")
141
142 if self.options.product and self.options.distribution:
143 raise LaunchpadScriptFailure(
144@@ -250,27 +260,85 @@
145 name, len(templates), number + 1, class_count))
146 self.logger.debug("Templates: %s" % str(templates))
147
148+ if self.options.remove_duplicates:
149+ self.logger.info("Removing duplicate messages.")
150+ self._removeDuplicateMessages(templates)
151+ self._endTransaction(intermediate=True)
152+
153 if self.options.merge_potmsgsets:
154+ self.logger.info("Merging POTMsgSets.")
155 self._mergePOTMsgSets(templates)
156+ self._endTransaction(intermediate=True)
157
158 if self.options.merge_translationmessages:
159+ self.logger.info("Merging TranslationMessages.")
160 self._mergeTranslationMessages(templates)
161
162 self._endTransaction()
163
164 self.logger.info("Done.")
165
166- def _endTransaction(self):
167+ def _endTransaction(self, intermediate=False):
168+ """End this transaction and start a new one.
169+
170+ :param intermediate: Whether this is an intermediate commit.
171+ Dry-run mode aborts transactions rather than committing
172+ them; where doing that may break dependencies between steps
173+ of the algorithm, pass `True` so that the abort can be
174+ skipped.
175+ """
176+ if self.txn is None:
177+ return
178+
179+ if self.commit_count % 100 == 0 or not intermediate:
180+ freed = gc.collect()
181+ objcount = len(gc.get_objects())
182+ garbage = len(gc.garbage)
183+ memsize = open("/proc/%s/statm" % os.getpid()).read().split()[5]
184+ self.logger.debug(
185+ "Freed: %d. Object count: %d. Garbage: %d. Memory size: %s"
186+ % (freed, objcount, garbage, memsize))
187+
188+ self.commit_count += 1
189+
190+ if intermediate and self.commit_count % 10 != 0:
191+ return
192+
193 if self.options.dry_run:
194- self.txn.abort()
195+ if not intermediate:
196+ self.txn.abort()
197 else:
198 self.txn.commit()
199- self.txn.begin()
200
201- def _mergePOTMsgSets(self, potemplates):
202- """Merge POTMsgSets for given sequence of sharing templates."""
203+ def _removeDuplicateMessages(self, potemplates):
204+ """Get rid of duplicate `TranslationMessages` where needed."""
205 self._setUpUtilities()
206
207+ representatives = {}
208+ order_check = OrderingCheck(cmp=self.compare_template_precedence)
209+
210+ for template in potemplates:
211+ order_check.check(template)
212+ for potmsgset in template.getPOTMsgSets(False, prefetch=False):
213+ key = get_potmsgset_key(potmsgset)
214+ if key not in representatives:
215+ representatives[key] = potmsgset
216+
217+ for representative in representatives.itervalues():
218+ self._scrubPOTMsgSetTranslations(representative)
219+ self._endTransaction(intermediate=True)
220+
221+ def _mapRepresentatives(self, potemplates):
222+ """Map out POTMsgSets' subordinates and templates.
223+
224+ :param potemplates: An equivalence class of `POTemplate`s to
225+ sort out.
226+ :return: A tuple of dicts. The first maps each `POTMsgSet`'s
227+ key (as returned by `get_potmsgset_key`) to a list of its
228+ subordinate `POTMsgSet`s. The second maps each
229+ representative `POTMsgSet` to its representative
230+ `POTemplate`.
231+ """
232 # Map each POTMsgSet key (context, msgid, plural) to its
233 # representative POTMsgSet.
234 representatives = {}
235@@ -302,21 +370,34 @@
236 else:
237 subordinates[representative] = []
238
239+ return subordinates, representative_templates
240+
241+ def _mergePOTMsgSets(self, potemplates):
242+ """Merge POTMsgSets for given sequence of sharing templates."""
243+ self._setUpUtilities()
244+
245+ subordinates, representative_templates = self._mapRepresentatives(
246+ potemplates)
247+
248+ num_representatives = len(subordinates)
249+ representative_num = 0
250+
251 for representative, potmsgsets in subordinates.iteritems():
252- # Scrub the representative POTMsgSet of any duplicate
253- # translation messages. We don't need do this for subordinates
254- # because the algorithm will refuse to add new duplicates to the
255- # representative POTMsgSet anyway.
256- self._scrubPOTMsgSetTranslations(representative)
257-
258- seen_potmsgsets = set([representative])
259+ representative_num += 1
260+ self.logger.info("Message %d/%d: %d subordinate(s)." % (
261+ representative_num, num_representatives, len(potmsgsets)))
262+
263+ seen_potmsgsets = set([representative.id])
264+
265+ potmsgset_deletions = 0
266+ tm_deletions = 0
267
268 # Merge each subordinate POTMsgSet into its representative.
269 for subordinate in potmsgsets:
270- if subordinate in seen_potmsgsets:
271+ if subordinate.id in seen_potmsgsets:
272 continue
273
274- seen_potmsgsets.add(subordinate)
275+ seen_potmsgsets.add(subordinate.id)
276
277 for message in subordinate.getAllTranslationMessages():
278 message = removeSecurityProxy(message)
279@@ -350,22 +431,40 @@
280 bequeathe_flags(
281 message, twin,
282 (clashing_current, clashing_imported))
283+ tm_deletions += 1
284
285 merge_translationtemplateitems(
286 subordinate, representative,
287 representative_templates[representative])
288 removeSecurityProxy(subordinate).destroySelf()
289+ potmsgset_deletions += 1
290+
291+ self._endTransaction(intermediate=True)
292+
293+ self.logger.info(
294+ "Deleted POTMsgSets: %d. TranslationMessages: %d." % (
295+ potmsgset_deletions, tm_deletions))
296
297 def _mergeTranslationMessages(self, potemplates):
298 """Share `TranslationMessage`s between templates where possible."""
299 self._setUpUtilities()
300 order_check = OrderingCheck(cmp=self.compare_template_precedence)
301 for template in potemplates:
302+ deletions = 0
303 order_check.check(template)
304 for potmsgset in template.getPOTMsgSets(False, prefetch=False):
305+ before = potmsgset.getAllTranslationMessages().count()
306+
307 for message in potmsgset.getAllTranslationMessages():
308 removeSecurityProxy(message).shareIfPossible()
309
310+ after = potmsgset.getAllTranslationMessages().count()
311+ self._endTransaction(intermediate=True)
312+
313+ deletions += max(0, before - after)
314+
315+ self.logger.info("Removed TranslationMessages: %d" % deletions)
316+
317 def _getPOTMsgSetTranslationMessageKey(self, tm):
318 """Return tuple that identifies a TranslationMessage in a POTMsgSet.
319
320@@ -375,36 +474,12 @@
321 that's sorted out in the nested dicts).
322 """
323 tm = removeSecurityProxy(tm)
324- msgstr_ids = [
325+ msgstr_ids = tuple([
326 getattr(tm, 'msgstr%dID' % form)
327 for form in xrange(TranslationConstants.MAX_PLURAL_FORMS)
328- ]
329+ ])
330
331- return (tm.language, tm.variant) + tuple(msgstr_ids)
332-
333- def _mapExistingMessages(self, potmsgset):
334- """Map out the existing TranslationMessages for `potmsgset`.
335-
336- :return: a dict mapping message keys (as returned by
337- `_getPOTMsgSetTranslationMessageKey`) to further dicts, which
338- map templates to TranslationMessages for that key and template.
339- Each list should normally have only one message in it, but in
340- the transition period this isn't always the case.
341- """
342- existing_tms = {}
343-
344- for tm in potmsgset.getAllTranslationMessages():
345- key = self._getPOTMsgSetTranslationMessageKey(tm)
346- if key not in existing_tms:
347- existing_tms[key] = {}
348- per_template = existing_tms[key]
349-
350- if tm.potemplate not in per_template:
351- per_template[tm.potemplate] = []
352-
353- per_template[tm.potemplate].append(tm)
354-
355- return existing_tms
356+ return (tm.potemplateID, tm.languageID, tm.variant) + msgstr_ids
357
358 def _scrubPOTMsgSetTranslations(self, potmsgset):
359 """Map out translations for `potmsgset`, and eliminate duplicates.
360@@ -416,34 +491,53 @@
361 # XXX JeroenVermeulen 2009-06-15
362 # spec=message-sharing-prevent-duplicates: We're going to have a
363 # unique index again at some point that will prevent this. When
364- # it becomes impossible to test this function, it can be
365- # removed.
366- translations = self._mapExistingMessages(potmsgset)
367-
368- for key, per_template in translations.iteritems():
369- for template, tms in per_template.iteritems():
370- assert len(tms) > 0, "Empty-list entry in translations dict."
371- if len(tms) == 1:
372- continue
373- self.logger.info(
374- "Cleaning up %s identical '%s' messages for: \"%s\"" % (
375- len(tms), tms[0].language.getFullCode(),
376- potmsgset.singular_text))
377- for tm in tms[1:]:
378- assert tm != tms[0], "Duplicate is listed twice."
379- assert tm.potmsgset == tms[0].potmsgset, (
380+ # it becomes impossible to test this function, this whole
381+ # migration phase can be scrapped.
382+ tms = potmsgset.getAllTranslationMessages().order_by(
383+ TranslationMessage.languageID, TranslationMessage.variant,
384+ TranslationMessage.id)
385+
386+ ids_per_language = {}
387+ for tm in tms:
388+ language = (tm.language, tm.variant)
389+ if language not in ids_per_language:
390+ ids_per_language[language] = []
391+ ids_per_language[language].append(tm.id)
392+
393+ self._endTransaction(intermediate=True)
394+
395+ deletions = 0
396+
397+ for ids in ids_per_language.itervalues():
398+ translations = {}
399+
400+ for tm_id in ids:
401+ tm = TranslationMessage.get(tm_id)
402+ key = self._getPOTMsgSetTranslationMessageKey(tm)
403+
404+ if key in translations:
405+ self.logger.info(
406+ "Cleaning up identical '%s' message for: \"%s\"" % (
407+ tm.language.getFullCode(), potmsgset.singular_text))
408+
409+ existing_tm = translations[key]
410+ assert tm != existing_tm, (
411+ "Message is duplicate of itself.")
412+ assert tm.potmsgset == existing_tm.potmsgset, (
413 "Different potmsgsets considered identical.")
414- assert tm.potemplate == tms[0].potemplate, (
415+ assert tm.potemplate == existing_tm.potemplate, (
416 "Different potemplates considered identical.")
417
418- # Transfer any current/imported flags to the first of
419- # the identical messages, and delete the duplicates.
420- bequeathe_flags(tm, tms[0])
421- removeSecurityProxy(tm).sync()
422-
423- per_template[template] = [tms[0]]
424-
425- return translations
426+ # Transfer any current/imported flags to the existing
427+ # message, and delete the duplicate.
428+ bequeathe_flags(tm, existing_tm)
429+ deletions += 1
430+ else:
431+ translations[key] = tm
432+
433+ self._endTransaction(intermediate=True)
434+
435+ self.logger.info("Deleted TranslationMessages: %d" % deletions)
436
437 def _findClashes(self, message, target_potmsgset, target_potemplate):
438 """What would clash if we moved `message` to the target environment?
439
440=== modified file 'lib/lp/translations/scripts/tests/test_message_sharing_migration.py'
441--- lib/lp/translations/scripts/tests/test_message_sharing_migration.py 2009-10-22 11:25:01 +0000
442+++ lib/lp/translations/scripts/tests/test_message_sharing_migration.py 2009-10-28 14:58:15 +0000
443@@ -18,6 +18,7 @@
444 IPOFileTranslatorSet)
445 from lp.translations.model.pomsgid import POMsgID
446 from lp.translations.model.potemplate import POTemplate
447+from lp.translations.model.potranslation import POTranslation
448 from lp.translations.scripts.message_sharing_migration import (
449 MessageSharingMerge)
450 from canonical.testing import LaunchpadZopelessLayer
451@@ -361,40 +362,6 @@
452 self.assertEqual(tms, expected_tms)
453 self.assertEqual(len(tms), 3)
454
455- def test_duplicatesAreCleanedUp(self):
456- # The POTMsgSet merging function cleans up any duplicate
457- # TranslationMessages that might get in the way.
458- trunk_message, stable_message = self._makeTranslationMessages(
459- 'snaggle', 'snaggle')
460- trunk_message.is_current = False
461- trunk_message.sync()
462-
463- potmsgset = trunk_message.potmsgset
464-
465- stable_message.is_imported = True
466- stable_message.potemplate = trunk_message.potemplate
467- stable_message.potmsgset = potmsgset
468- stable_message.sync()
469-
470- # We've set up a situation where trunk has two identical
471- # messages (one of which is current, the other imported) and
472- # stable has none.
473- self.assertEqual(self._getTranslations(), ('snaggle', None))
474- tms = set(potmsgset.getAllTranslationMessages())
475- self.assertEqual(tms, set([trunk_message, stable_message]))
476-
477- self.script._mergePOTMsgSets(self.templates)
478-
479- # The duplicates have been cleaned up.
480- self.assertEqual(potmsgset.getAllTranslationMessages().count(), 1)
481-
482- # The is_current and is_imported flags from the duplicate
483- # messages have been merged into a single, current, imported
484- # message.
485- message = self._getMessage(potmsgset, self.trunk_template)
486- self.assertTrue(message.is_current)
487- self.assertTrue(message.is_imported)
488-
489
490 class TestTranslationMessageNonMerging(TestCaseWithFactory,
491 TranslatedProductMixin):
492@@ -586,51 +553,61 @@
493 self.assertEqual(poft.latest_message, stable_message)
494
495
496-class TestMapMessages(TestCaseWithFactory, TranslatedProductMixin):
497- """Test _mapExistingMessages and friends."""
498+class TestRemoveDuplicates(TestCaseWithFactory, TranslatedProductMixin):
499+ """Test _scrubPOTMsgSetTranslations and friends."""
500 layer = LaunchpadZopelessLayer
501
502 def setUp(self):
503 self.layer.switchDbUser('postgres')
504- super(TestMapMessages, self).setUp(user='mark@example.com')
505- super(TestMapMessages, self).setUpProduct()
506-
507- def test_NoMessagesToMap(self):
508- # Mapping an untranslated POTMsgSet produces an empty dict.
509- empty = self.script._mapExistingMessages(self.trunk_potmsgset)
510- self.assertEqual(empty, {})
511-
512- def test_MapSharedMessage(self):
513- # Map existing, shared translation for a POTMsgSet.
514- message = self._makeTranslationMessage(
515- pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset,
516- text='winslow', diverged=False)
517-
518- map = self.script._mapExistingMessages(message.potmsgset)
519- key = self.script._getPOTMsgSetTranslationMessageKey(message)
520- expected = {key: { self.trunk_template: [message]}}
521-
522- def test_MapDivergedMessage(self):
523- # Map existing, diverged translation for a POTMsgSet.
524- message = self._makeTranslationMessage(
525- pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset,
526- text='winslow', diverged=True)
527-
528- map = self.script._mapExistingMessages(message.potmsgset)
529- key = self.script._getPOTMsgSetTranslationMessageKey(message)
530- expected = {key: {self.trunk_template: [message]}}
531-
532- self.assertEqual(map, expected)
533+ super(TestRemoveDuplicates, self).setUp(user='mark@example.com')
534+ super(TestRemoveDuplicates, self).setUpProduct()
535+
536+ def test_duplicatesAreCleanedUp(self):
537+ # The duplicates removal function cleans up any duplicate
538+ # TranslationMessages that might get in the way of merging.
539+ trunk_message, stable_message = self._makeTranslationMessages(
540+ 'snaggle', 'snaggle')
541+ trunk_message.is_current = False
542+ trunk_message.sync()
543+
544+ potmsgset = trunk_message.potmsgset
545+
546+ stable_message.is_imported = True
547+ stable_message.potemplate = trunk_message.potemplate
548+ stable_message.potmsgset = potmsgset
549+ stable_message.sync()
550+
551+ # We've set up a situation where trunk has two identical
552+ # messages (one of which is current, the other imported) and
553+ # stable has none.
554+ self.assertEqual(self._getTranslations(), ('snaggle', None))
555+ tms = set(potmsgset.getAllTranslationMessages())
556+ self.assertEqual(tms, set([trunk_message, stable_message]))
557+
558+ self.script._removeDuplicateMessages(self.templates)
559+
560+ # The duplicates have been cleaned up.
561+ self.assertEqual(potmsgset.getAllTranslationMessages().count(), 1)
562+
563+ # The is_current and is_imported flags from the duplicate
564+ # messages have been merged into a single, current, imported
565+ # message.
566+ message = self._getMessage(potmsgset, self.trunk_template)
567+ self.assertTrue(message.is_current)
568+ self.assertTrue(message.is_imported)
569
570 def test_ScrubPOTMsgSetTranslationsWithoutDuplication(self):
571 # _scrubPOTMsgSetTranslations eliminates duplicated
572 # TranslationMessages. If it doesn't find any, nothing happens.
573- message = self._makeTranslationMessage(
574+ self._makeTranslationMessage(
575 pofile=self.trunk_pofile, potmsgset=self.trunk_potmsgset,
576 text='gbzidh', diverged=False)
577- map = self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset)
578- key = self.script._getPOTMsgSetTranslationMessageKey(message)
579- self.assertEqual(map, {key: {None: [message]}})
580+
581+ self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset)
582+
583+ message1, message2 = self._getMessages()
584+ self.assertIsNot(None, message1)
585+ self.assertIs(None, message2)
586
587 def test_ScrubPOTMsgSetTranslationsWithDuplication(self):
588 # If there are duplicate TranslationMessages, one inherits all
589@@ -646,14 +623,14 @@
590 message2.is_imported = True
591 message2.potmsgset = self.trunk_potmsgset
592 message2.potemplate = self.trunk_template
593+ ids = (message1.id, message2.id)
594
595- map = self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset)
596+ self.script._scrubPOTMsgSetTranslations(self.trunk_potmsgset)
597
598 message, no_message = self._getMessages()
599
600- # The resulting map has only one of the identical messages.
601- key = self.script._getPOTMsgSetTranslationMessageKey(message)
602- self.assertEqual(map, {key: {self.trunk_template: [message]}})
603+ # One of the two messages is now gone.
604+ self.assertIs(None, no_message)
605
606 # The remaining message combines the flags from both its
607 # predecessors.
608@@ -763,16 +740,17 @@
609 transaction.commit()
610 gc.collect()
611
612- def _listLoadedPOMsgIDs(self, ignore_list=None):
613- """Return the set of all `POMsgID` objects that are in memory.
614+ def _listLoadedObjects(self, of_class, ignore_list=None):
615+ """Return the set of objects of a given type that are in memory.
616
617+ :param of_class: A class to filter for.
618 :param ignore_list: A previous return value. Any POMsgIDs that
619 were already in that list are ignored here.
620 """
621 pomsgids = set([
622 whatever
623 for whatever in gc.get_objects()
624- if isinstance(whatever, POMsgID)
625+ if isinstance(whatever, of_class)
626 ])
627 if ignore_list is not None:
628 pomsgids -= ignore_list
629@@ -805,19 +783,47 @@
630 def test_merging_loads_no_msgids(self):
631 # Migration does not load actual msgids into memory.
632 self._flushDbObjects()
633- msgids_before = self._listLoadedPOMsgIDs()
634-
635- self._makeTranslationMessages('x', 'y', trunk_diverged=True)
636- self._makeTranslationMessages('1', '2', stable_diverged=True)
637-
638- self._resetReferences()
639- self.assertNotEqual([], self.templates)
640- self.assertEqual(set(), self._listLoadedPOMsgIDs(msgids_before))
641-
642- self.script._mergePOTMsgSets(self.templates)
643- self.script._mergeTranslationMessages(self.templates)
644-
645- self.assertEqual(set(), self._listLoadedPOMsgIDs(msgids_before))
646+ msgids_before = self._listLoadedObjects(POMsgID)
647+
648+ self._makeTranslationMessages('x', 'y', trunk_diverged=True)
649+ self._makeTranslationMessages('1', '2', stable_diverged=True)
650+
651+ self._resetReferences()
652+ self.assertNotEqual([], self.templates)
653+ self.assertEqual(
654+ set(), self._listLoadedObjects(POMsgID, msgids_before))
655+
656+ self.script._mergePOTMsgSets(self.templates)
657+ self.script._mergeTranslationMessages(self.templates)
658+
659+ self.assertEqual(
660+ set(), self._listLoadedObjects(POMsgID, msgids_before))
661+
662+ def test_merging_loads_no_potranslations(self):
663+ # Migration does not load actual POTranslations into memory.
664+ self._flushDbObjects()
665+ potranslations_before = self._listLoadedObjects(POTranslation)
666+
667+ self._makeTranslationMessages('x', 'y', trunk_diverged=True)
668+ self._makeTranslationMessages('1', '2', stable_diverged=True)
669+
670+ self._resetReferences()
671+ self.assertNotEqual([], self.templates)
672+ self.assertEqual(
673+ set(),
674+ self._listLoadedObjects(POTranslation, potranslations_before))
675+
676+ self.script._mergePOTMsgSets(self.templates)
677+
678+ self.assertEqual(
679+ set(),
680+ self._listLoadedObjects(POTranslation, potranslations_before))
681+
682+ self.script._mergeTranslationMessages(self.templates)
683+
684+ self.assertEqual(
685+ set(),
686+ self._listLoadedObjects(POTranslation, potranslations_before))
687
688
689 def test_suite():