Merge lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merge reported by: Jeroen T. Vermeulen
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator
Merge into: lp:launchpad
Diff against target: 423 lines (+385/-0)
4 files modified
database/schema/security.cfg (+1/-0)
lib/lp/scripts/garbo.py (+4/-0)
lib/lp/translations/scripts/scrub_pofiletranslator.py (+161/-0)
lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py (+219/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+104866@code.launchpad.net

Commit message

POFileTranslator scrubber script.

Description of the change

This is as discussed on the internal mailing list. In a series of branches that simplifies and streamlines POFileTranslator management, this is the only one that introduces new code. It won't run while we still have POFileTranslator.latest_message, which I'm dropping in a separate pair of branches (devel/db-devel).

POFileTranslator is described as an “eager materialized view.” It's a cache of who contributed TranslationMessages to which POFiles, and when they last did.

What you see in this branch is a scrubbing script for POFileTranslator. Instead of trying (and in many ways failing) to maintain that table synchronously using a long and complex database trigger, Launchpad will keep it “eventually consistent” by running this script. The script doesn't care how data got where it is, so some painfully complex script code that updates the table can disappear. Instead, the script just finds out which entries are missing and which ones need adding, and does that.

The script may still need optimizing. I deliberately kept it simple. In the current code it would have been easy to check for and correct timestamps on the contributions as well, but I think that's of secondary importantance. Where we have a justified POFileTranslator record, its timestamp is likely to be correct; the known problems involve records being omitted altogether. And simplifications I'm introducing elsewhere may also cause records to be omitted, or left in place when they should be moved to different POFiles, or deleted.

To post a comment you must log in.
Revision history for this message
Benji York (benji) wrote :

This branch looks good. Here are a couple of things I spotted
while reading over it.

I believe that this bit is a syntax error in Python 2.6 as dict
comprehensions were added in 2.7. (line 132 of the diff, from
scrub_pofiletranslator.py):

    return {poft.personID: poft for poft in pofts}

Converting it to something like this should work (untested):

    return dict((poft.personID, poft) for poft in pofts)

Since you wrap all calls to measure_distance in abs, you could just put
the abs in measure_distance.

review: Approve (code)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

It's a good thing you spot these things. A few months on 2.7 and I'm forgetting that we still need 2.6.

The hard part of making measure_distance return an absolute value is coming up with a good name. I arrived at the verb “to size”: size_distance. Which also happens to be shorter.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Late-breaking change: after a chat with StevenK, I turned this into a Garbo loop. See revisions 15215 and beyond.

Note that the “launchpad” database user does not have the privilege of deleting POFileTranslator records. I could just add that to facilitate testing, but it's going a bit far. Much as I hate the finicky SELECT restrictions of security.cfg, I do like the restricted write privileges for specialized tables.

Revision history for this message
Benji York (benji) wrote :

The changes look good to me. I can get behind fewer methods and more functions.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'database/schema/security.cfg'
2--- database/schema/security.cfg 2012-05-09 13:50:03 +0000
3+++ database/schema/security.cfg 2012-05-24 05:03:19 +0000
4@@ -2181,6 +2181,7 @@
5 public.openidconsumerassociation = SELECT, DELETE
6 public.openidconsumernonce = SELECT, DELETE
7 public.person = SELECT, DELETE
8+public.pofiletranslator = SELECT, INSERT, UPDATE, DELETE
9 public.potranslation = SELECT, DELETE
10 public.potmsgset = SELECT, DELETE
11 public.product = SELECT
12
13=== modified file 'lib/lp/scripts/garbo.py'
14--- lib/lp/scripts/garbo.py 2012-05-09 01:35:41 +0000
15+++ lib/lp/scripts/garbo.py 2012-05-24 05:03:19 +0000
16@@ -105,6 +105,9 @@
17 from lp.translations.model.translationtemplateitem import (
18 TranslationTemplateItem,
19 )
20+from lp.translations.scripts.scrub_pofiletranslator import (
21+ ScrubPOFileTranslator,
22+ )
23
24
25 ONE_DAY_IN_SECONDS = 24 * 60 * 60
26@@ -1418,6 +1421,7 @@
27 ObsoleteBugAttachmentPruner,
28 OldTimeLimitedTokenDeleter,
29 RevisionAuthorEmailLinker,
30+ ScrubPOFileTranslator,
31 SuggestiveTemplatesCacheUpdater,
32 POTranslationPruner,
33 UnusedPOTMsgSetPruner,
34
35=== added file 'lib/lp/translations/scripts/scrub_pofiletranslator.py'
36--- lib/lp/translations/scripts/scrub_pofiletranslator.py 1970-01-01 00:00:00 +0000
37+++ lib/lp/translations/scripts/scrub_pofiletranslator.py 2012-05-24 05:03:19 +0000
38@@ -0,0 +1,161 @@
39+# Copyright 2012 Canonical Ltd. This software is licensed under the
40+# GNU Affero General Public License version 3 (see the file LICENSE).
41+
42+"""Keep `POFileTranslator` more or less consistent with the real data."""
43+
44+__metaclass__ = type
45+__all__ = [
46+ 'ScrubPOFileTranslator',
47+ ]
48+
49+from storm.expr import (
50+ Coalesce,
51+ Desc,
52+ )
53+import transaction
54+
55+from lp.services.database.lpstorm import IStore
56+from lp.services.looptuner import TunableLoop
57+from lp.translations.model.pofile import POFile
58+from lp.translations.model.pofiletranslator import POFileTranslator
59+from lp.translations.model.potemplate import POTemplate
60+from lp.translations.model.translationmessage import TranslationMessage
61+from lp.translations.model.translationtemplateitem import (
62+ TranslationTemplateItem,
63+ )
64+
65+
66+def get_pofiles():
67+ """Retrieve POFiles to scrub.
68+
69+ The result's ordering is aimed at maximizing cache effectiveness:
70+ by POTemplate name for locality of shared POTMsgSets, and by language
71+ for locality of shared TranslationMessages.
72+ """
73+ store = IStore(POFile)
74+ query = store.find(
75+ POFile,
76+ POFile.potemplateID == POTemplate.id,
77+ POTemplate.iscurrent == True)
78+ return query.order_by(POTemplate.name, POFile.languageID)
79+
80+
81+def get_contributions(pofile):
82+ """Map all users' most recent contributions to `pofile`.
83+
84+ Returns a dict mapping `Person` id to the creation time of their most
85+ recent `TranslationMessage` in `POFile`.
86+
87+ This leaves some small room for error: a contribution that is masked by
88+ a diverged entry in this POFile will nevertheless produce a
89+ POFileTranslator record. Fixing that would complicate the work more than
90+ it is probably worth.
91+ """
92+ store = IStore(pofile)
93+ potmsgset_ids = store.find(
94+ TranslationTemplateItem.potmsgsetID,
95+ TranslationTemplateItem.potemplateID == pofile.potemplate.id,
96+ TranslationTemplateItem.sequence > 0)
97+ contribs = store.find(
98+ (TranslationMessage.submitterID, TranslationMessage.date_created),
99+ TranslationMessage.potmsgsetID.is_in(potmsgset_ids),
100+ TranslationMessage.languageID == pofile.language.id,
101+ TranslationMessage.msgstr0 != None,
102+ Coalesce(
103+ TranslationMessage.potemplateID,
104+ pofile.potemplate.id) == pofile.potemplate.id)
105+ contribs = contribs.config(distinct=(TranslationMessage.submitterID,))
106+ contribs = contribs.order_by(
107+ TranslationMessage.submitterID, Desc(TranslationMessage.date_created))
108+ return dict(contribs)
109+
110+
111+def get_pofiletranslators(pofile):
112+ """Get `POFileTranslator` entries for `pofile`.
113+
114+ Returns a dict mapping each contributor's person id to their
115+ `POFileTranslator` record.
116+ """
117+ store = IStore(pofile)
118+ pofts = store.find(
119+ POFileTranslator, POFileTranslator.pofileID == pofile.id)
120+ return dict((poft.personID, poft) for poft in pofts)
121+
122+
123+def remove_pofiletranslators(logger, pofile, person_ids):
124+ """Delete `POFileTranslator` records."""
125+ logger.debug(
126+ "Removing %d POFileTranslator(s) for %s.",
127+ len(person_ids), pofile.title)
128+ store = IStore(pofile)
129+ pofts = store.find(
130+ POFileTranslator,
131+ POFileTranslator.pofileID == pofile.id,
132+ POFileTranslator.personID.is_in(person_ids))
133+ pofts.remove()
134+
135+
136+def remove_unwarranted_pofiletranslators(logger, pofile, pofts, contribs):
137+ """Delete `POFileTranslator` records that shouldn't be there."""
138+ excess = set(pofts) - set(contribs)
139+ if len(excess) > 0:
140+ remove_pofiletranslators(logger, pofile, excess)
141+
142+
143+def create_missing_pofiletranslators(logger, pofile, pofts, contribs):
144+ """Create `POFileTranslator` records that were missing."""
145+ shortage = set(contribs) - set(pofts)
146+ if len(shortage) == 0:
147+ return
148+ logger.debug(
149+ "Adding %d POFileTranslator(s) for %s.",
150+ len(shortage), pofile.title)
151+ store = IStore(pofile)
152+ for missing_contributor in shortage:
153+ store.add(POFileTranslator(
154+ pofile=pofile, personID=missing_contributor,
155+ date_last_touched=contribs[missing_contributor]))
156+
157+
158+def scrub_pofile(logger, pofile):
159+ """Scrub `POFileTranslator` entries for one `POFile`.
160+
161+ Removes inappropriate entries and adds missing ones.
162+ """
163+ contribs = get_contributions(pofile)
164+ pofiletranslators = get_pofiletranslators(pofile)
165+ remove_unwarranted_pofiletranslators(
166+ logger, pofile, pofiletranslators, contribs)
167+ create_missing_pofiletranslators(
168+ logger, pofile, pofiletranslators, contribs)
169+
170+
171+class ScrubPOFileTranslator(TunableLoop):
172+ """Tunable loop, meant for running from inside Garbo."""
173+
174+ maximum_chunk_size = 500
175+
176+ def __init__(self, *args, **kwargs):
177+ super(ScrubPOFileTranslator, self).__init__(*args, **kwargs)
178+ # This does not listify the POFiles; they are batch-fetched on
179+ # demand. So iteration may not be entirely exact, but it
180+ # doesn't really need to be. It avoids loading all those
181+ # POFiles into memory when we only need a few per iteration.
182+ self.pofiles = get_pofiles()
183+ self.next_offset = 0
184+
185+ def __call__(self, chunk_size):
186+ """See `ITunableLoop`."""
187+ start_offset = self.next_offset
188+ self.next_offset = start_offset + int(chunk_size)
189+ batch = list(self.pofiles[start_offset:self.next_offset])
190+ if len(batch) == 0:
191+ self.next_offset = None
192+ else:
193+ for pofile in batch:
194+ scrub_pofile(self.log, pofile)
195+ transaction.commit()
196+
197+ def isDone(self):
198+ """See `ITunableLoop`."""
199+ return self.next_offset is None
200
201=== added file 'lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py'
202--- lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py 1970-01-01 00:00:00 +0000
203+++ lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py 2012-05-24 05:03:19 +0000
204@@ -0,0 +1,219 @@
205+# Copyright 2012 Canonical Ltd. This software is licensed under the
206+# GNU Affero General Public License version 3 (see the file LICENSE).
207+
208+"""Test scrubbing of `POFileTranslator`."""
209+
210+__metaclass__ = type
211+
212+from datetime import (
213+ datetime,
214+ timedelta,
215+ )
216+
217+import pytz
218+import transaction
219+
220+from lp.services.database.constants import UTC_NOW
221+from lp.services.database.lpstorm import IStore
222+from lp.services.log.logger import DevNullLogger
223+from lp.testing import TestCaseWithFactory
224+from lp.testing.layers import ZopelessDatabaseLayer
225+from lp.translations.model.pofiletranslator import POFileTranslator
226+from lp.translations.scripts.scrub_pofiletranslator import (
227+ get_contributions,
228+ get_pofiles,
229+ get_pofiletranslators,
230+ scrub_pofile,
231+ ScrubPOFileTranslator,
232+ )
233+
234+
235+fake_logger = DevNullLogger()
236+
237+
238+def size_distance(sequence, item1, item2):
239+ """Return the absolute distance between items in a sequence."""
240+ container = list(sequence)
241+ return abs(container.index(item2) - container.index(item1))
242+
243+
244+class TestScrubPOFileTranslator(TestCaseWithFactory):
245+
246+ layer = ZopelessDatabaseLayer
247+
248+ def query_pofiletranslator(self, pofile, person):
249+ """Query `POFileTranslator` for a specific record.
250+
251+ :return: Storm result set.
252+ """
253+ store = IStore(pofile)
254+ return store.find(POFileTranslator, pofile=pofile, person=person)
255+
256+ def make_message_with_pofiletranslator(self, pofile=None):
257+ """Create a normal `TranslationMessage` with `POFileTranslator`."""
258+ if pofile is None:
259+ pofile = self.factory.makePOFile()
260+ potmsgset = self.factory.makePOTMsgSet(
261+ potemplate=pofile.potemplate, sequence=1)
262+ # A database trigger on TranslationMessage automatically creates
263+ # a POFileTranslator record for each new TranslationMessage.
264+ return self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
265+
266+ def make_message_without_pofiletranslator(self, pofile=None):
267+ """Create a `TranslationMessage` without `POFileTranslator`."""
268+ tm = self.make_message_with_pofiletranslator(pofile)
269+ IStore(pofile).flush()
270+ self.becomeDbUser('postgres')
271+ self.query_pofiletranslator(pofile, tm.submitter).remove()
272+ return tm
273+
274+ def make_pofiletranslator_without_message(self, pofile=None):
275+ """Create a `POFileTranslator` without `TranslationMessage`."""
276+ if pofile is None:
277+ pofile = self.factory.makePOFile()
278+ poft = POFileTranslator(
279+ pofile=pofile, person=self.factory.makePerson(),
280+ date_last_touched=UTC_NOW)
281+ IStore(poft.pofile).add(poft)
282+ return poft
283+
284+ def test_get_pofiles_gets_pofiles_for_active_templates(self):
285+ pofile = self.factory.makePOFile()
286+ self.assertIn(pofile, get_pofiles())
287+
288+ def test_get_pofiles_skips_inactive_templates(self):
289+ pofile = self.factory.makePOFile()
290+ pofile.potemplate.iscurrent = False
291+ self.assertNotIn(pofile, get_pofiles())
292+
293+ def test_get_pofiles_clusters_by_template_name(self):
294+ # POFiles for templates with the same name are bunched together
295+ # in the get_pofiles() output.
296+ templates = [
297+ self.factory.makePOTemplate(name='shared'),
298+ self.factory.makePOTemplate(name='other'),
299+ self.factory.makePOTemplate(name='andanother'),
300+ self.factory.makePOTemplate(
301+ name='shared', distroseries=self.factory.makeDistroSeries()),
302+ ]
303+ pofiles = [
304+ self.factory.makePOFile(potemplate=template)
305+ for template in templates]
306+ ordering = get_pofiles()
307+ self.assertEqual(1, size_distance(ordering, pofiles[0], pofiles[-1]))
308+
309+ def test_get_pofiles_clusters_by_language(self):
310+ # POFiles for sharing templates and the same language are
311+ # bunched together in the get_pofiles() output.
312+ templates = [
313+ self.factory.makePOTemplate(
314+ name='shared', distroseries=self.factory.makeDistroSeries())
315+ for counter in range(2)]
316+ # POFiles per language & template. We create these in a strange
317+ # way to avoid the risk of mistaking accidental orderings such
318+ # as per-id from being mistaken for the proper order.
319+ languages = ['nl', 'fr']
320+ pofiles_per_language = dict((language, []) for language in languages)
321+ for language, pofiles in pofiles_per_language.items():
322+ for template in templates:
323+ pofiles.append(
324+ self.factory.makePOFile(language, potemplate=template))
325+
326+ ordering = get_pofiles()
327+ for pofiles in pofiles_per_language.values():
328+ self.assertEqual(
329+ 1, size_distance(ordering, pofiles[0], pofiles[1]))
330+
331+ def test_get_contributions_gets_contributions(self):
332+ pofile = self.factory.makePOFile()
333+ tm = self.factory.makeSuggestion(pofile=pofile)
334+ self.assertEqual(
335+ {tm.submitter.id: tm.date_created}, get_contributions(pofile))
336+
337+ def test_get_contributions_uses_latest_contribution(self):
338+ pofile = self.factory.makePOFile()
339+ today = datetime.now(pytz.UTC)
340+ yesterday = today - timedelta(1, 1, 1)
341+ old_tm = self.factory.makeSuggestion(
342+ pofile=pofile, date_created=yesterday)
343+ new_tm = self.factory.makeSuggestion(
344+ translator=old_tm.submitter, pofile=pofile, date_created=today)
345+ self.assertNotEqual(old_tm.date_created, new_tm.date_created)
346+ self.assertContentEqual(
347+ [new_tm.date_created], get_contributions(pofile).values())
348+
349+ def test_get_contributions_ignores_inactive_potmsgsets(self):
350+ pofile = self.factory.makePOFile()
351+ potmsgset = self.factory.makePOTMsgSet(
352+ potemplate=pofile.potemplate, sequence=0)
353+ self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
354+ self.assertEqual({}, get_contributions(pofile))
355+
356+ def test_get_contributions_includes_diverged_messages_for_template(self):
357+ pofile = self.factory.makePOFile()
358+ tm = self.factory.makeSuggestion(pofile=pofile)
359+ tm.potemplate = pofile.potemplate
360+ self.assertContentEqual(
361+ [tm.submitter.id], get_contributions(pofile).keys())
362+
363+ def test_get_contributions_excludes_other_diverged_messages(self):
364+ pofile = self.factory.makePOFile()
365+ tm = self.factory.makeSuggestion(pofile=pofile)
366+ tm.potemplate = self.factory.makePOTemplate()
367+ self.assertEqual({}, get_contributions(pofile))
368+
369+ def test_get_pofiletranslators_gets_pofiletranslators_for_pofile(self):
370+ pofile = self.factory.makePOFile()
371+ tm = self.make_message_with_pofiletranslator(pofile)
372+ pofts = get_pofiletranslators(pofile)
373+ self.assertContentEqual([tm.submitter.id], pofts.keys())
374+ poft = pofts[tm.submitter.id]
375+ self.assertEqual(pofile, poft.pofile)
376+
377+ def test_scrub_pofile_leaves_good_pofiletranslator_in_place(self):
378+ pofile = self.factory.makePOFile()
379+ tm = self.make_message_with_pofiletranslator(pofile)
380+ old_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
381+
382+ scrub_pofile(fake_logger, pofile)
383+
384+ new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
385+ self.assertEqual(old_poft, new_poft)
386+
387+ def test_scrub_pofile_deletes_unwarranted_entries(self):
388+ # Deleting POFileTranslator records is not something the app
389+ # server ever does, so it requires special privileges.
390+ self.becomeDbUser('postgres')
391+ poft = self.make_pofiletranslator_without_message()
392+ (pofile, person) = (poft.pofile, poft.person)
393+ scrub_pofile(fake_logger, poft.pofile)
394+ self.assertIsNone(self.query_pofiletranslator(pofile, person).one())
395+
396+ def test_scrub_pofile_adds_missing_entries(self):
397+ pofile = self.factory.makePOFile()
398+ tm = self.make_message_without_pofiletranslator(pofile)
399+
400+ scrub_pofile(fake_logger, pofile)
401+
402+ new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
403+ self.assertEqual(tm.submitter, new_poft.person)
404+ self.assertEqual(pofile, new_poft.pofile)
405+
406+ def test_tunable_loop(self):
407+ pofile = self.factory.makePOFile()
408+ tm = self.make_message_without_pofiletranslator(pofile)
409+ bad_poft = self.make_pofiletranslator_without_message(pofile)
410+ noncontributor = bad_poft.person
411+ transaction.commit()
412+
413+ self.becomeDbUser('garbo')
414+ ScrubPOFileTranslator(fake_logger).run()
415+ # Try to break the loop if it failed to commit its changes.
416+ transaction.abort()
417+
418+ # The unwarranted POFileTranslator record has been deleted.
419+ self.assertIsNotNone(
420+ self.query_pofiletranslator(pofile, tm.submitter).one())
421+ # The missing POFileTranslator has been created.
422+ self.assertIsNone(
423+ self.query_pofiletranslator(pofile, noncontributor).one())