Merge lp:~jtv/launchpad/bug-407590 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Brad Crittenden
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jtv/launchpad/bug-407590
Merge into: lp:launchpad
Diff against target: None lines
To merge this branch: bzr merge lp:~jtv/launchpad/bug-407590
Reviewer Review Type Date Requested Status
Brad Crittenden (community) Approve
Review via email: mp+11799@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (3.5 KiB)

= Bug 407590 =

384 lines of diff.

We have a cron script that exports snapshots of a ProductSeries'
translations to a bzr branch. Bug 407590 is that currently, this script
exports every POFile again at every run. This branch fixes that.

It's not as easy as it seems to find out whether a POFile has changed
since its last export. There is an attribute POFile.date_changed, but
that marks the beginning of the last change operation rather than its
completion. Worse, the timestamp on the last commit to the branch marks
the completion of the last export. For a conservative estimate of
whether a POFile had changed since the last export, we'd need these
exactly the other way around.

A more accurate solution would be to go through the files stored in the
branch and read the export dates from their headers. But that would
involve accessing the actual branch data and so risk the performance and
scalability that the script currently has.

We could also have extended our database schema to store timing
information, but that adds complexity at a time when we are more focused
on the 3.0 release. It's also easy to make subtle mistakes in how the
information is associated with branches or productseries and when it
should be cleared.

Instead, I chose to get the date of the last translations commit to the
branch and subtract a fixed "fudge factor." As long as the error in the
time window is less than this fudge factor, the worst that can happen is
that a file is exported more often than needed. That's basically what
is already happening anyway, just less of it. Dirty, simple, and
probably effective.

== Performance ==

The performance overhead of this approach is limited to fetching past
revisions to the branch. Typically several revisions will fit in a
single disk block, so fetching these should be quite cheap. To avoid
scanning very long histories in vain, there is a cutoff age: if the
script fails to find a previous translations commit in the past week, it
assumes that there isn't one and just exports everything.

We may want to improve on this algorithm in the feature. For now, it
should cut down on the script's workload enough to avoid its performance
problems with ddtp-ubuntu, which currently takes over 6 hours to export
in full.

Actually, there was another problem with the ddtp-ubuntu exports: their
memory usage. Just in case we're keeping semantic garbage referenced,
I also added some clearing of POTemplate's internal POFile cache. I
have no particular indication that this will solve the problem, but it's
a cheap win if it does.

== Diff details ==

In translations_to_branch.py, _makeDirectBranchCommit: the parameter was
called bzrbranch, but that was simply wrong. It was always a Launchpad
Branch object.

_isTranslationsCommit recognizes the translations commits by their fixed
commit message. We considered tagging the commits, but that turned out
to be harder than expected and require extending the bzrlib API. There
is also the username, but that might change as easily as the commit
message. Plus, in the future we may sometimes find it useful for users
to be able to fake a translations commit e.g. to prevent very large
sudden e...

Read more...

Revision history for this message
Brad Crittenden (bac) wrote :

Jeroen this fix looks good. I think you've made a reasoned choice for your approach and the implementation and testing looks a-ok.

Could I trouble you to change the output message at line 25 to be "Committed %d" and the corresponding tests? It just reads better and is consistent.

Thanks for the fix.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/translations/doc/translations-export-to-branch.txt'
2--- lib/lp/translations/doc/translations-export-to-branch.txt 2009-08-03 11:40:57 +0000
3+++ lib/lp/translations/doc/translations-export-to-branch.txt 2009-09-15 15:30:59 +0000
4@@ -26,8 +26,9 @@
5
6 >>> class MockDirectBranchCommit:
7 ... """Mock DirectBranchCommit. Prints actions."""
8- ...
9 ... simulate_race = False
10+ ... bzrbranch = None
11+ ... written_files = 0
12 ...
13 ... def __init__(self, logger):
14 ... self.logger = logger
15@@ -35,6 +36,7 @@
16 ... def writeFile(self, path, contents):
17 ... self.logger.info("Writing file '%s':" % path)
18 ... self.logger.info(contents)
19+ ... self.written_files += 1
20 ...
21 ... def lockForCommit(self):
22 ... pass
23@@ -45,7 +47,7 @@
24 ...
25 ... def commit(self, message=None):
26 ... self._checkForRace()
27- ... self.logger.info("Commit.")
28+ ... self.logger.info("Commit %d file(s)." % self.written_files)
29 ...
30 ... def unlock(self):
31 ... self.logger.info("Unlock.")
32@@ -53,14 +55,18 @@
33 >>> from canonical.launchpad.scripts.logger import FakeLogger
34
35 >>> class MockExportTranslationsToBranch(ExportTranslationsToBranch):
36- ...
37+ ... """Test version of ExportTranslationsToBranch."""
38 ... simulate_race = False
39+ ... simulated_latest_commit = None
40 ...
41 ... def __init__(self, *args, **kwargs):
42 ... super(MockExportTranslationsToBranch, self).__init__(
43 ... *args, **kwargs)
44 ... self.logger = FakeLogger()
45 ...
46+ ... def _getLatestTranslationsCommit(self, branch):
47+ ... return self.simulated_latest_commit
48+ ...
49 ... def _makeDirectBranchCommit(self, bzrbranch):
50 ... committer = MockDirectBranchCommit(self.logger)
51 ... committer.simulate_race = self.simulate_race
52@@ -106,14 +112,16 @@
53 ...
54 ... if not iscurrent:
55 ... removeSecurityProxy(template).iscurrent = False
56+ ...
57+ ... return pofile
58
59- >>> main_template = setup_template_and_translations(
60+ >>> main_pofile = setup_template_and_translations(
61 ... 'po/main/gazpot.pot', 'maingazpot')
62
63- >>> module_template = setup_template_and_translations(
64+ >>> module_pofile = setup_template_and_translations(
65 ... 'po/module/module.pot', 'gazmod')
66
67- >>> old_template = setup_template_and_translations(
68+ >>> old_pofile = setup_template_and_translations(
69 ... 'po/gazpot.pot', 'oldgazpot', iscurrent=False)
70
71 When the translations-export-to-branch script runs, it feeds the
72@@ -125,6 +133,7 @@
73 >>> script.main()
74 INFO Exporting to translations branches.
75 INFO Exporting Gazblachko trunk series.
76+ DEBUG No previous translations commit found.
77 INFO Writing file 'po/main/nl.po':
78 INFO # ...
79 msgid ""
80@@ -143,7 +152,7 @@
81 <BLANKLINE>
82 msgid "gazmod msgid"
83 msgstr "gazmod msgstr"
84- INFO Commit.
85+ INFO Commit 2 file(s).
86 INFO Unlock.
87 INFO Processed 1 item(s); 0 failure(s).
88
89@@ -156,8 +165,49 @@
90 INFO Exporting to translations branches.
91 INFO Processed 0 item(s); 0 failure(s).
92
93-
94-== Race conditions ==
95+ >>> gazblachko.official_rosetta = True
96+ >>> transaction.commit()
97+
98+
99+Incremental exports
100+-------------------
101+
102+If the script detects that POFiles have not been touched roughly since
103+the time it last exported them, it won't export them again.
104+
105+ >>> from datetime import datetime, timedelta
106+ >>> from pytz import UTC
107+ >>> now = datetime.now(UTC)
108+ >>> script.simulated_latest_commit = now
109+ >>> main_pofile.date_changed = now - timedelta(days=3)
110+ >>> module_pofile.date_changed = now - timedelta(days=4)
111+ >>> old_pofile.date_changed = now - timedelta(days=5)
112+
113+ >>> script.main()
114+ INFO Exporting to translations branches.
115+ INFO Exporting Gazblachko trunk series.
116+ DEBUG Last commit was at ....
117+ INFO Commit 0 file(s).
118+ INFO Unlock.
119+ INFO Processed 1 item(s); 0 failure(s).
120+
121+If one of the files is updated, it is exported again. Unchanged files
122+are not.
123+
124+ >>> main_pofile.date_changed = now
125+ >>> script.main()
126+ INFO Exporting to translations branches.
127+ INFO Exporting Gazblachko trunk series.
128+ DEBUG Last commit was at ...
129+ INFO Writing file 'po/main/nl.po':
130+ INFO ...
131+ INFO Commit 1 file(s).
132+ INFO Unlock.
133+ INFO Processed 1 item(s); 0 failure(s).
134+
135+
136+Race conditions
137+---------------
138
139 The script checks for possible race conditions. Otherwise it might
140 overwrite translations committed to the branch that hadn't been
141@@ -174,13 +224,12 @@
142 If the DirectBranchCommit detects a concurrent update, the script will
143 refuse to commit to the branch.
144
145- >>> gazblachko.official_rosetta = True
146- >>> transaction.commit()
147-
148 >>> script.simulate_race = True
149+ >>> script.simulated_latest_commit = None
150 >>> script.main()
151 INFO Exporting to translations branches.
152 INFO Exporting Gazblachko trunk series.
153+ DEBUG No previous translations commit found.
154 INFO Writing file 'po/main/nl.po':
155 ...
156 msgstr "gazmod msgstr"
157
158=== modified file 'lib/lp/translations/interfaces/potemplate.py'
159--- lib/lp/translations/interfaces/potemplate.py 2009-07-29 18:53:50 +0000
160+++ lib/lp/translations/interfaces/potemplate.py 2009-09-15 15:30:59 +0000
161@@ -319,6 +319,16 @@
162 def __iter__():
163 """Return an iterator over current `IPOTMsgSet` in this template."""
164
165+ def clearPOFileCache():
166+ """Clear `POFile`-related cached data.
167+
168+ As you work with a `POTemplate`, some data about its `POFile`s
169+ gets cached. But if you're iterating over the template's
170+ translations one `POFile` at a time, you can drop any cached
171+ data about a `POFile` as soon as you're done with it. Use this
172+ method to do that.
173+ """
174+
175 def getHeader():
176 """Return an `ITranslationHeaderData` representing its header."""
177
178
179=== modified file 'lib/lp/translations/model/potemplate.py'
180--- lib/lp/translations/model/potemplate.py 2009-08-05 14:54:11 +0000
181+++ lib/lp/translations/model/potemplate.py 2009-09-15 15:30:59 +0000
182@@ -182,8 +182,12 @@
183 _uses_english_msgids = None
184
185 def __storm_invalidated__(self):
186+ self.clearPOFileCache()
187+ self._uses_english_msgids = None
188+
189+ def clearPOFileCache(self):
190+ """See `IPOTemplate`."""
191 self._cached_pofiles_by_language = None
192- self._uses_english_msgids = None
193
194 @property
195 def uses_english_msgids(self):
196
197=== modified file 'lib/lp/translations/scripts/tests/test_translations_to_branch.py'
198--- lib/lp/translations/scripts/tests/test_translations_to_branch.py 2009-07-17 18:47:44 +0000
199+++ lib/lp/translations/scripts/tests/test_translations_to_branch.py 2009-09-15 15:30:59 +0000
200@@ -9,8 +9,6 @@
201
202 from textwrap import dedent
203
204-from bzrlib.branch import Branch
205-
206 from zope.security.proxy import removeSecurityProxy
207
208 from canonical.launchpad.scripts.tests import run_script
209@@ -23,6 +21,12 @@
210
211 layer = ZopelessAppServerLayer
212
213+ def _filterOutput(self, output):
214+ """Remove DEBUG lines from output."""
215+ return '\n'.join([
216+ line for line in output.splitlines()
217+ if not line.startswith('DEBUG')])
218+
219 def test_translations_export_to_branch(self):
220 # End-to-end test of the script doing its work.
221
222@@ -60,15 +64,16 @@
223
224 # Run The Script.
225 retcode, stdout, stderr = run_script(
226- 'cronscripts/translations-export-to-branch.py', [])
227+ 'cronscripts/translations-export-to-branch.py', ['-vvv'])
228
229 self.assertEqual('', stdout)
230 self.assertEqual(
231 'INFO creating lockfile\n'
232 'INFO Exporting to translations branches.\n'
233 'INFO Exporting Committobranch trunk series.\n'
234- 'INFO Processed 1 item(s); 0 failure(s).\n',
235- stderr)
236+ 'INFO Processed 1 item(s); 0 failure(s).',
237+ self._filterOutput(stderr))
238+ self.assertIn('No previous translations commit found.', stderr)
239 self.assertEqual(0, retcode)
240
241 # The branch now contains a snapshot of the translation. (Only
242@@ -102,6 +107,13 @@
243 if not re.match(pattern, contents, re.MULTILINE):
244 self.assertEqual(pattern, contents)
245
246+ # If we run the script again at this point, it won't export
247+ # anything because it sees that the POFile has not been changed
248+ # since the last export.
249+ retcode, stdout, stderr = run_script(
250+ 'cronscripts/translations-export-to-branch.py', ['-vvv'])
251+ self.assertIn('Last commit was at', stderr)
252+
253
254 def test_suite():
255 return unittest.TestLoader().loadTestsFromName(__name__)
256
257=== modified file 'lib/lp/translations/scripts/translations_to_branch.py'
258--- lib/lp/translations/scripts/translations_to_branch.py 2009-08-03 11:34:35 +0000
259+++ lib/lp/translations/scripts/translations_to_branch.py 2009-09-15 15:30:59 +0000
260@@ -8,6 +8,9 @@
261
262
263 import os.path
264+from datetime import datetime, timedelta
265+from pytz import UTC
266+
267 from zope.component import getUtility
268
269 from storm.expr import Join, SQL
270@@ -27,6 +30,19 @@
271 class ExportTranslationsToBranch(LaunchpadCronScript):
272 """Commit translations to translations_branches where requested."""
273
274+ commit_message = "Launchpad automatic translations update."
275+
276+ # Don't bother looking for a previous translations commit if it's
277+ # longer than this ago.
278+ previous_commit_cutoff_age = timedelta(days=7)
279+
280+ # We can find out when the last translations commit to a branch
281+ # completed, and we can find out when the last transaction changing
282+ # a POFile started. This is exactly the wrong way around for
283+ # figuring out which POFiles need a fresh export, so assume a fudge
284+ # factor.
285+ fudge_factor = timedelta(hours=6)
286+
287 def _checkForObjections(self, source):
288 """Check for reasons why we can't commit to this branch.
289
290@@ -46,17 +62,42 @@
291 "Translations branch for %s has pending translations "
292 "changes. Not committing." % source.title)
293
294- def _makeDirectBranchCommit(self, bzrbranch):
295+ def _makeDirectBranchCommit(self, db_branch):
296 """Create a `DirectBranchCommit`.
297
298 This factory is a mock-injection point for tests.
299+
300+ :param db_branch: A `Branch` object as defined in Launchpad.
301+ :return: A `DirectBranchCommit` for `db_branch`.
302 """
303- return DirectBranchCommit(bzrbranch)
304+ return DirectBranchCommit(db_branch)
305
306 def _commit(self, source, committer):
307 """Commit changes to branch. Check for race conditions."""
308 self._checkForObjections(source)
309- committer.commit("Launchpad automatic translations update.")
310+ committer.commit(self.commit_message)
311+
312+ def _isTranslationsCommit(self, revision):
313+ """Is `revision` an automatic translations commit?"""
314+ return revision.message == self.commit_message
315+
316+ def _getLatestTranslationsCommit(self, branch):
317+ """Get date of last translations commit to `branch`, if any."""
318+ cutoff_date = datetime.now(UTC) - self.previous_commit_cutoff_age
319+
320+ revno, current_rev = branch.last_revision_info()
321+ repository = branch.repository
322+ for rev_id in repository.iter_reverse_revision_history(current_rev):
323+ revision = repository.get_revision(rev_id)
324+ revision_date = revision.timestamp
325+ if self._isTranslationsCommit(revision):
326+ return revision_date
327+
328+ if revision_date < cutoff_date:
329+ # Going too far back in history. Give up.
330+ return None
331+
332+ return None
333
334 def _exportToBranch(self, source):
335 """Export translations for source into source.translations_branch.
336@@ -68,6 +109,22 @@
337
338 committer = self._makeDirectBranchCommit(source.translations_branch)
339
340+ bzr_branch = committer.bzrbranch
341+
342+ last_commit_date = self._getLatestTranslationsCommit(bzr_branch)
343+
344+ if last_commit_date is None:
345+ self.logger.debug("No previous translations commit found.")
346+ changed_since = None
347+ else:
348+ # Export files that have been touched since the last export.
349+ # Subtract a fudge factor because the last-export date marks
350+ # the end of the previous export, and the POFiles'
351+ # last-touched timestamp marks the beginning of the last
352+ # transaction that changed them.
353+ self.logger.debug("Last commit was at %s." % last_commit_date)
354+ changed_since = last_commit_date - self.fudge_factor
355+
356 try:
357 subset = getUtility(IPOTemplateSet).getSubset(
358 productseries=source, iscurrent=True)
359@@ -75,6 +132,13 @@
360 base_path = os.path.dirname(template.path)
361
362 for pofile in template.pofiles:
363+
364+ has_changed = (
365+ changed_since is None or
366+ pofile.date_changed > changed_since)
367+ if not has_changed:
368+ continue
369+
370 pofile_path = os.path.join(
371 base_path, pofile.getFullLanguageCode() + '.po')
372 pofile_contents = pofile.export()
373@@ -87,6 +151,10 @@
374 if self.txn:
375 self.txn.commit()
376
377+ # We're done with this POFile. Don't bother caching
378+ # anything about it any longer.
379+ template.clearPOFileCache()
380+
381 self._commit(source, committer)
382 finally:
383 committer.unlock()