Merge lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merge reported by: Jeroen T. Vermeulen
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator
Merge into: lp:launchpad
Diff against target: 423 lines (+385/-0)
4 files modified
database/schema/security.cfg (+1/-0)
lib/lp/scripts/garbo.py (+4/-0)
lib/lp/translations/scripts/scrub_pofiletranslator.py (+161/-0)
lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py (+219/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+104866@code.launchpad.net

Commit message

POFileTranslator scrubber script.

Description of the change

This is as discussed on the internal mailing list. In a series of branches that simplifies and streamlines POFileTranslator management, this is the only one that introduces new code. It won't run while we still have POFileTranslator.latest_message, which I'm dropping in a separate pair of branches (devel/db-devel).

POFileTranslator is described as an “eager materialized view.” It's a cache of who contributed TranslationMessages to which POFiles, and when they last did.

What you see in this branch is a scrubbing script for POFileTranslator. Instead of trying (and in many ways failing) to maintain that table synchronously using a long and complex database trigger, Launchpad will keep it “eventually consistent” by running this script. The script doesn't care how data got where it is, so some painfully complex script code that updates the table can disappear. Instead, the script just finds out which entries are missing and which ones need adding, and does that.

The script may still need optimizing. I deliberately kept it simple. In the current code it would have been easy to check for and correct timestamps on the contributions as well, but I think that's of secondary importantance. Where we have a justified POFileTranslator record, its timestamp is likely to be correct; the known problems involve records being omitted altogether. And simplifications I'm introducing elsewhere may also cause records to be omitted, or left in place when they should be moved to different POFiles, or deleted.

To post a comment you must log in.
Revision history for this message
Benji York (benji) wrote :

This branch looks good. Here are a couple of things I spotted
while reading over it.

I believe that this bit is a syntax error in Python 2.6 as dict
comprehensions were added in 2.7. (line 132 of the diff, from
scrub_pofiletranslator.py):

    return {poft.personID: poft for poft in pofts}

Converting it to something like this should work (untested):

    return dict((poft.personID, poft) for poft in pofts)

Since you wrap all calls to measure_distance in abs, you could just put
the abs in measure_distance.

review: Approve (code)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

It's a good thing you spot these things. A few months on 2.7 and I'm forgetting that we still need 2.6.

The hard part of making measure_distance return an absolute value is coming up with a good name. I arrived at the verb “to size”: size_distance. Which also happens to be shorter.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Late-breaking change: after a chat with StevenK, I turned this into a Garbo loop. See revisions 15215 and beyond.

Note that the “launchpad” database user does not have the privilege of deleting POFileTranslator records. I could just add that to facilitate testing, but it's going a bit far. Much as I hate the finicky SELECT restrictions of security.cfg, I do like the restricted write privileges for specialized tables.

Revision history for this message
Benji York (benji) wrote :

The changes look good to me. I can get behind fewer methods and more functions.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg 2012-05-09 13:50:03 +0000
+++ database/schema/security.cfg 2012-05-24 05:03:19 +0000
@@ -2181,6 +2181,7 @@
2181public.openidconsumerassociation = SELECT, DELETE2181public.openidconsumerassociation = SELECT, DELETE
2182public.openidconsumernonce = SELECT, DELETE2182public.openidconsumernonce = SELECT, DELETE
2183public.person = SELECT, DELETE2183public.person = SELECT, DELETE
2184public.pofiletranslator = SELECT, INSERT, UPDATE, DELETE
2184public.potranslation = SELECT, DELETE2185public.potranslation = SELECT, DELETE
2185public.potmsgset = SELECT, DELETE2186public.potmsgset = SELECT, DELETE
2186public.product = SELECT2187public.product = SELECT
21872188
=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py 2012-05-09 01:35:41 +0000
+++ lib/lp/scripts/garbo.py 2012-05-24 05:03:19 +0000
@@ -105,6 +105,9 @@
105from lp.translations.model.translationtemplateitem import (105from lp.translations.model.translationtemplateitem import (
106 TranslationTemplateItem,106 TranslationTemplateItem,
107 )107 )
108from lp.translations.scripts.scrub_pofiletranslator import (
109 ScrubPOFileTranslator,
110 )
108111
109112
110ONE_DAY_IN_SECONDS = 24 * 60 * 60113ONE_DAY_IN_SECONDS = 24 * 60 * 60
@@ -1418,6 +1421,7 @@
1418 ObsoleteBugAttachmentPruner,1421 ObsoleteBugAttachmentPruner,
1419 OldTimeLimitedTokenDeleter,1422 OldTimeLimitedTokenDeleter,
1420 RevisionAuthorEmailLinker,1423 RevisionAuthorEmailLinker,
1424 ScrubPOFileTranslator,
1421 SuggestiveTemplatesCacheUpdater,1425 SuggestiveTemplatesCacheUpdater,
1422 POTranslationPruner,1426 POTranslationPruner,
1423 UnusedPOTMsgSetPruner,1427 UnusedPOTMsgSetPruner,
14241428
=== added file 'lib/lp/translations/scripts/scrub_pofiletranslator.py'
--- lib/lp/translations/scripts/scrub_pofiletranslator.py 1970-01-01 00:00:00 +0000
+++ lib/lp/translations/scripts/scrub_pofiletranslator.py 2012-05-24 05:03:19 +0000
@@ -0,0 +1,161 @@
1# Copyright 2012 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Keep `POFileTranslator` more or less consistent with the real data."""
5
6__metaclass__ = type
7__all__ = [
8 'ScrubPOFileTranslator',
9 ]
10
11from storm.expr import (
12 Coalesce,
13 Desc,
14 )
15import transaction
16
17from lp.services.database.lpstorm import IStore
18from lp.services.looptuner import TunableLoop
19from lp.translations.model.pofile import POFile
20from lp.translations.model.pofiletranslator import POFileTranslator
21from lp.translations.model.potemplate import POTemplate
22from lp.translations.model.translationmessage import TranslationMessage
23from lp.translations.model.translationtemplateitem import (
24 TranslationTemplateItem,
25 )
26
27
28def get_pofiles():
29 """Retrieve POFiles to scrub.
30
31 The result's ordering is aimed at maximizing cache effectiveness:
32 by POTemplate name for locality of shared POTMsgSets, and by language
33 for locality of shared TranslationMessages.
34 """
35 store = IStore(POFile)
36 query = store.find(
37 POFile,
38 POFile.potemplateID == POTemplate.id,
39 POTemplate.iscurrent == True)
40 return query.order_by(POTemplate.name, POFile.languageID)
41
42
43def get_contributions(pofile):
44 """Map all users' most recent contributions to `pofile`.
45
46 Returns a dict mapping `Person` id to the creation time of their most
47 recent `TranslationMessage` in `POFile`.
48
49 This leaves some small room for error: a contribution that is masked by
50 a diverged entry in this POFile will nevertheless produce a
51 POFileTranslator record. Fixing that would complicate the work more than
52 it is probably worth.
53 """
54 store = IStore(pofile)
55 potmsgset_ids = store.find(
56 TranslationTemplateItem.potmsgsetID,
57 TranslationTemplateItem.potemplateID == pofile.potemplate.id,
58 TranslationTemplateItem.sequence > 0)
59 contribs = store.find(
60 (TranslationMessage.submitterID, TranslationMessage.date_created),
61 TranslationMessage.potmsgsetID.is_in(potmsgset_ids),
62 TranslationMessage.languageID == pofile.language.id,
63 TranslationMessage.msgstr0 != None,
64 Coalesce(
65 TranslationMessage.potemplateID,
66 pofile.potemplate.id) == pofile.potemplate.id)
67 contribs = contribs.config(distinct=(TranslationMessage.submitterID,))
68 contribs = contribs.order_by(
69 TranslationMessage.submitterID, Desc(TranslationMessage.date_created))
70 return dict(contribs)
71
72
73def get_pofiletranslators(pofile):
74 """Get `POFileTranslator` entries for `pofile`.
75
76 Returns a dict mapping each contributor's person id to their
77 `POFileTranslator` record.
78 """
79 store = IStore(pofile)
80 pofts = store.find(
81 POFileTranslator, POFileTranslator.pofileID == pofile.id)
82 return dict((poft.personID, poft) for poft in pofts)
83
84
85def remove_pofiletranslators(logger, pofile, person_ids):
86 """Delete `POFileTranslator` records."""
87 logger.debug(
88 "Removing %d POFileTranslator(s) for %s.",
89 len(person_ids), pofile.title)
90 store = IStore(pofile)
91 pofts = store.find(
92 POFileTranslator,
93 POFileTranslator.pofileID == pofile.id,
94 POFileTranslator.personID.is_in(person_ids))
95 pofts.remove()
96
97
98def remove_unwarranted_pofiletranslators(logger, pofile, pofts, contribs):
99 """Delete `POFileTranslator` records that shouldn't be there."""
100 excess = set(pofts) - set(contribs)
101 if len(excess) > 0:
102 remove_pofiletranslators(logger, pofile, excess)
103
104
105def create_missing_pofiletranslators(logger, pofile, pofts, contribs):
106 """Create `POFileTranslator` records that were missing."""
107 shortage = set(contribs) - set(pofts)
108 if len(shortage) == 0:
109 return
110 logger.debug(
111 "Adding %d POFileTranslator(s) for %s.",
112 len(shortage), pofile.title)
113 store = IStore(pofile)
114 for missing_contributor in shortage:
115 store.add(POFileTranslator(
116 pofile=pofile, personID=missing_contributor,
117 date_last_touched=contribs[missing_contributor]))
118
119
120def scrub_pofile(logger, pofile):
121 """Scrub `POFileTranslator` entries for one `POFile`.
122
123 Removes inappropriate entries and adds missing ones.
124 """
125 contribs = get_contributions(pofile)
126 pofiletranslators = get_pofiletranslators(pofile)
127 remove_unwarranted_pofiletranslators(
128 logger, pofile, pofiletranslators, contribs)
129 create_missing_pofiletranslators(
130 logger, pofile, pofiletranslators, contribs)
131
132
133class ScrubPOFileTranslator(TunableLoop):
134 """Tunable loop, meant for running from inside Garbo."""
135
136 maximum_chunk_size = 500
137
138 def __init__(self, *args, **kwargs):
139 super(ScrubPOFileTranslator, self).__init__(*args, **kwargs)
140 # This does not listify the POFiles; they are batch-fetched on
141 # demand. So iteration may not be entirely exact, but it
142 # doesn't really need to be. It avoids loading all those
143 # POFiles into memory when we only need a few per iteration.
144 self.pofiles = get_pofiles()
145 self.next_offset = 0
146
147 def __call__(self, chunk_size):
148 """See `ITunableLoop`."""
149 start_offset = self.next_offset
150 self.next_offset = start_offset + int(chunk_size)
151 batch = list(self.pofiles[start_offset:self.next_offset])
152 if len(batch) == 0:
153 self.next_offset = None
154 else:
155 for pofile in batch:
156 scrub_pofile(self.log, pofile)
157 transaction.commit()
158
159 def isDone(self):
160 """See `ITunableLoop`."""
161 return self.next_offset is None
0162
=== added file 'lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py'
--- lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py 1970-01-01 00:00:00 +0000
+++ lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py 2012-05-24 05:03:19 +0000
@@ -0,0 +1,219 @@
1# Copyright 2012 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Test scrubbing of `POFileTranslator`."""
5
6__metaclass__ = type
7
8from datetime import (
9 datetime,
10 timedelta,
11 )
12
13import pytz
14import transaction
15
16from lp.services.database.constants import UTC_NOW
17from lp.services.database.lpstorm import IStore
18from lp.services.log.logger import DevNullLogger
19from lp.testing import TestCaseWithFactory
20from lp.testing.layers import ZopelessDatabaseLayer
21from lp.translations.model.pofiletranslator import POFileTranslator
22from lp.translations.scripts.scrub_pofiletranslator import (
23 get_contributions,
24 get_pofiles,
25 get_pofiletranslators,
26 scrub_pofile,
27 ScrubPOFileTranslator,
28 )
29
30
31fake_logger = DevNullLogger()
32
33
34def size_distance(sequence, item1, item2):
35 """Return the absolute distance between items in a sequence."""
36 container = list(sequence)
37 return abs(container.index(item2) - container.index(item1))
38
39
40class TestScrubPOFileTranslator(TestCaseWithFactory):
41
42 layer = ZopelessDatabaseLayer
43
44 def query_pofiletranslator(self, pofile, person):
45 """Query `POFileTranslator` for a specific record.
46
47 :return: Storm result set.
48 """
49 store = IStore(pofile)
50 return store.find(POFileTranslator, pofile=pofile, person=person)
51
52 def make_message_with_pofiletranslator(self, pofile=None):
53 """Create a normal `TranslationMessage` with `POFileTranslator`."""
54 if pofile is None:
55 pofile = self.factory.makePOFile()
56 potmsgset = self.factory.makePOTMsgSet(
57 potemplate=pofile.potemplate, sequence=1)
58 # A database trigger on TranslationMessage automatically creates
59 # a POFileTranslator record for each new TranslationMessage.
60 return self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
61
62 def make_message_without_pofiletranslator(self, pofile=None):
63 """Create a `TranslationMessage` without `POFileTranslator`."""
64 tm = self.make_message_with_pofiletranslator(pofile)
65 IStore(pofile).flush()
66 self.becomeDbUser('postgres')
67 self.query_pofiletranslator(pofile, tm.submitter).remove()
68 return tm
69
70 def make_pofiletranslator_without_message(self, pofile=None):
71 """Create a `POFileTranslator` without `TranslationMessage`."""
72 if pofile is None:
73 pofile = self.factory.makePOFile()
74 poft = POFileTranslator(
75 pofile=pofile, person=self.factory.makePerson(),
76 date_last_touched=UTC_NOW)
77 IStore(poft.pofile).add(poft)
78 return poft
79
80 def test_get_pofiles_gets_pofiles_for_active_templates(self):
81 pofile = self.factory.makePOFile()
82 self.assertIn(pofile, get_pofiles())
83
84 def test_get_pofiles_skips_inactive_templates(self):
85 pofile = self.factory.makePOFile()
86 pofile.potemplate.iscurrent = False
87 self.assertNotIn(pofile, get_pofiles())
88
89 def test_get_pofiles_clusters_by_template_name(self):
90 # POFiles for templates with the same name are bunched together
91 # in the get_pofiles() output.
92 templates = [
93 self.factory.makePOTemplate(name='shared'),
94 self.factory.makePOTemplate(name='other'),
95 self.factory.makePOTemplate(name='andanother'),
96 self.factory.makePOTemplate(
97 name='shared', distroseries=self.factory.makeDistroSeries()),
98 ]
99 pofiles = [
100 self.factory.makePOFile(potemplate=template)
101 for template in templates]
102 ordering = get_pofiles()
103 self.assertEqual(1, size_distance(ordering, pofiles[0], pofiles[-1]))
104
105 def test_get_pofiles_clusters_by_language(self):
106 # POFiles for sharing templates and the same language are
107 # bunched together in the get_pofiles() output.
108 templates = [
109 self.factory.makePOTemplate(
110 name='shared', distroseries=self.factory.makeDistroSeries())
111 for counter in range(2)]
112 # POFiles per language & template. We create these in a strange
113 # way to avoid the risk of mistaking accidental orderings such
114 # as per-id from being mistaken for the proper order.
115 languages = ['nl', 'fr']
116 pofiles_per_language = dict((language, []) for language in languages)
117 for language, pofiles in pofiles_per_language.items():
118 for template in templates:
119 pofiles.append(
120 self.factory.makePOFile(language, potemplate=template))
121
122 ordering = get_pofiles()
123 for pofiles in pofiles_per_language.values():
124 self.assertEqual(
125 1, size_distance(ordering, pofiles[0], pofiles[1]))
126
127 def test_get_contributions_gets_contributions(self):
128 pofile = self.factory.makePOFile()
129 tm = self.factory.makeSuggestion(pofile=pofile)
130 self.assertEqual(
131 {tm.submitter.id: tm.date_created}, get_contributions(pofile))
132
133 def test_get_contributions_uses_latest_contribution(self):
134 pofile = self.factory.makePOFile()
135 today = datetime.now(pytz.UTC)
136 yesterday = today - timedelta(1, 1, 1)
137 old_tm = self.factory.makeSuggestion(
138 pofile=pofile, date_created=yesterday)
139 new_tm = self.factory.makeSuggestion(
140 translator=old_tm.submitter, pofile=pofile, date_created=today)
141 self.assertNotEqual(old_tm.date_created, new_tm.date_created)
142 self.assertContentEqual(
143 [new_tm.date_created], get_contributions(pofile).values())
144
145 def test_get_contributions_ignores_inactive_potmsgsets(self):
146 pofile = self.factory.makePOFile()
147 potmsgset = self.factory.makePOTMsgSet(
148 potemplate=pofile.potemplate, sequence=0)
149 self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
150 self.assertEqual({}, get_contributions(pofile))
151
152 def test_get_contributions_includes_diverged_messages_for_template(self):
153 pofile = self.factory.makePOFile()
154 tm = self.factory.makeSuggestion(pofile=pofile)
155 tm.potemplate = pofile.potemplate
156 self.assertContentEqual(
157 [tm.submitter.id], get_contributions(pofile).keys())
158
159 def test_get_contributions_excludes_other_diverged_messages(self):
160 pofile = self.factory.makePOFile()
161 tm = self.factory.makeSuggestion(pofile=pofile)
162 tm.potemplate = self.factory.makePOTemplate()
163 self.assertEqual({}, get_contributions(pofile))
164
165 def test_get_pofiletranslators_gets_pofiletranslators_for_pofile(self):
166 pofile = self.factory.makePOFile()
167 tm = self.make_message_with_pofiletranslator(pofile)
168 pofts = get_pofiletranslators(pofile)
169 self.assertContentEqual([tm.submitter.id], pofts.keys())
170 poft = pofts[tm.submitter.id]
171 self.assertEqual(pofile, poft.pofile)
172
173 def test_scrub_pofile_leaves_good_pofiletranslator_in_place(self):
174 pofile = self.factory.makePOFile()
175 tm = self.make_message_with_pofiletranslator(pofile)
176 old_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
177
178 scrub_pofile(fake_logger, pofile)
179
180 new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
181 self.assertEqual(old_poft, new_poft)
182
183 def test_scrub_pofile_deletes_unwarranted_entries(self):
184 # Deleting POFileTranslator records is not something the app
185 # server ever does, so it requires special privileges.
186 self.becomeDbUser('postgres')
187 poft = self.make_pofiletranslator_without_message()
188 (pofile, person) = (poft.pofile, poft.person)
189 scrub_pofile(fake_logger, poft.pofile)
190 self.assertIsNone(self.query_pofiletranslator(pofile, person).one())
191
192 def test_scrub_pofile_adds_missing_entries(self):
193 pofile = self.factory.makePOFile()
194 tm = self.make_message_without_pofiletranslator(pofile)
195
196 scrub_pofile(fake_logger, pofile)
197
198 new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
199 self.assertEqual(tm.submitter, new_poft.person)
200 self.assertEqual(pofile, new_poft.pofile)
201
202 def test_tunable_loop(self):
203 pofile = self.factory.makePOFile()
204 tm = self.make_message_without_pofiletranslator(pofile)
205 bad_poft = self.make_pofiletranslator_without_message(pofile)
206 noncontributor = bad_poft.person
207 transaction.commit()
208
209 self.becomeDbUser('garbo')
210 ScrubPOFileTranslator(fake_logger).run()
211 # Try to break the loop if it failed to commit its changes.
212 transaction.abort()
213
214 # The unwarranted POFileTranslator record has been deleted.
215 self.assertIsNotNone(
216 self.query_pofiletranslator(pofile, tm.submitter).one())
217 # The missing POFileTranslator has been created.
218 self.assertIsNone(
219 self.query_pofiletranslator(pofile, noncontributor).one())