Merge lp:~jtv/launchpad/bug-702228 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 12215
Proposed branch: lp:~jtv/launchpad/bug-702228
Merge into: lp:launchpad
Diff against target: 242 lines (+134/-27)
4 files modified
cronscripts/rosetta-export-queue.py (+5/-2)
lib/lp/translations/doc/poexport-queue.txt (+1/-1)
lib/lp/translations/scripts/po_export_queue.py (+61/-24)
lib/lp/translations/tests/test_exportresult.py (+67/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-702228
Reviewer Review Type Date Requested Status
j.c.sackett (community) code* Approve
Robert Collins (community) Approve
Review via email: mp+46165@code.launchpad.net

Commit message

[r=jcsackett,lifeless][ui=none][bug=702228] Avoid long-running master transactions during translations export.

Description of the change

= Bug 702228 =

A particularly difficult translations export request was breaking the export queue. It took over 3 hours to run, at which point our watchdog script that looks for long transactions on the master database killed its master store connection.

This branch addresses that by avoiding use of the master database, so that the connection will not start a transaction until the very end of a request (where a master connection is really needed). An occasional long transaction on the slave is not such a big problem, and certainly won't take any write locks. So the watchdog will leave the slave connection in peace.

There is a nesting of database policies: the SlaveDatabasePolicy makes the slave store the default, so that e.g. some use of cursor() that we still have will use the slave. This is merely a little help in reducing accidental use of the master store. But I also bracketed the time-critical part in a SlaveOnlyDatabasePolicy. This outright disallows use of the master store.

It's not quite obvious, but the method that fetches the next request off the queue accesses the master store. It doesn't make any changes there, but it does verify that the oldest fully-replicated request it gets off the slave store has not already been deleted from the master store. That situation could otherwise occur if the script deleted the previous request from the master store, then again tried to fetch the oldest request off the slave before the deletion had replicated.

I tested thusly:
{{{
./bin/test -vvc lp.translations -t export
}}}

To Q/A, first ensure that regular exports still work (both in Ubuntu and upstream; include XPI) by running the script manually, and seeing that it still produces valid tarballs and individual files. On staging that requires manually clearing the backlog first.

Watch pg_stat_activity on the master database during a large export. The export script's master backend process (as logged by the script) should be "<IDLE>" until the end, not "<IDLE> in transaction."

No lint,

Jeroen

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) :
review: Approve
Revision history for this message
j.c.sackett (jcsackett) wrote :
Download full text (3.9 KiB)

Jeroen--

This looks good. I have couple comments below, but the one on the test case is the only one I feel is important. This is good to land with that change.

> === modified file 'lib/lp/translations/scripts/po_export_queue.py'
> --- lib/lp/translations/scripts/po_export_queue.py 2010-08-20 20:31:18 +0000
> +++ lib/lp/translations/scripts/po_export_queue.py 2011-01-13 18:25:08 +0000
> @@ -250,6 +252,46 @@
> 'request_url': self.request_url,
> }
>
> + def setExportFile(self, exported_file):
> + """Attach an exported file to the result, for upload to the Librarian.
> +
> + After this is set, `upload` will perform the actual upload. The two
> + actions are separated so as to isolate write access to the database.
> +
> + :param exported_file: An `IExportedTranslationFile` containing the
> + exported data.
> + """
> + self.exported_file = exported_file

Given how very little this method is doing, I'm not sure it needs to be a separate method. It's not hurting anything to have it this way, but seems slightly unnecesary.

> === added file 'lib/lp/translations/tests/test_exportresult.py'
> --- lib/lp/translations/tests/test_exportresult.py 1970-01-01 00:00:00 +0000
> +++ lib/lp/translations/tests/test_exportresult.py 2011-01-13 18:25:08 +0000
> @@ -0,0 +1,67 @@
> +# Copyright 2010 Canonical Ltd. This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""Tests for `ExportResult`."""
> +
> +__metaclass__ = type
> +
> +import hashlib
> +import os.path
> +from StringIO import StringIO
> +
> +from canonical.librarian.testing.fake import FakeLibrarian
> +from canonical.testing.layers import ZopelessDatabaseLayer
> +from lp.services.log.logger import DevNullLogger
> +from lp.testing import TestCaseWithFactory
> +from lp.translations.scripts.po_export_queue import ExportResult
> +
> +
> +class FakeExportedTranslationFile:
> + """Fake `IExportedTranslationFile` for testing."""
> +
> + def __init__(self, path, content, content_type):
> + self.path = path
> + base, ext = os.path.splitext(path)
> + self.file_extension = ext
> + self.size = len(content)
> + self.content = content
> + self.file = StringIO(content)
> + self.content_type = content_type
> +
> + def read(self, *args, **kwargs):
> + return self.file.read(*args, **kwargs)
> +
> +
> +class TestExportResult(TestCaseWithFactory):
> +
> + layer = ZopelessDatabaseLayer
> +
> + def makeExportResult(self):
> + request = [self.factory.makePOFile()]
> + requester = self.factory.makePerson()
> + logger = DevNullLogger()
> + return ExportResult(requester, request, logger)
> +
> + def makeExportedTranslationFile(self):
> + filename = self.factory.getUniqueString()
> + content = self.factory.getUniqueString()
> + mime_type = "text/plain"
> + return FakeExportedTranslationFile(filename, content, mime_type)
> +
> + def test_upload_exported_file(self):
> + librarian = self.useFixture(FakeLibrarian())
> + export = self.makeExp...

Read more...

review: Approve (code*)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

You're right about the assertNotEqual. Fixed with thanks.

You may also be right about the other method, but I can't think of anything to do about it. I wouldn't want to set the property on the object from outside the object's own code: I feel that would be highly nonobvious, sort of like an out-of-place parameter. So I left that as it was.

Jeroen

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'cronscripts/rosetta-export-queue.py'
2--- cronscripts/rosetta-export-queue.py 2010-04-27 19:48:39 +0000
3+++ cronscripts/rosetta-export-queue.py 2011-01-13 23:31:34 +0000
4@@ -8,17 +8,20 @@
5 import _pythonpath
6
7 from canonical.database.sqlbase import ISOLATION_LEVEL_READ_COMMITTED
8+from canonical.launchpad.webapp.dbpolicy import SlaveDatabasePolicy
9 from lp.translations.scripts.po_export_queue import process_queue
10 from lp.services.scripts.base import LaunchpadCronScript
11
12
13 class RosettaExportQueue(LaunchpadCronScript):
14+ """Translation exports."""
15+
16 def main(self):
17 self.txn.set_isolation_level(ISOLATION_LEVEL_READ_COMMITTED)
18- process_queue(self.txn, self.logger)
19+ with SlaveDatabasePolicy():
20+ process_queue(self.txn, self.logger)
21
22
23 if __name__ == '__main__':
24 script = RosettaExportQueue('rosetta-export-queue', dbuser='poexport')
25 script.lock_and_run()
26-
27
28=== modified file 'lib/lp/translations/doc/poexport-queue.txt'
29--- lib/lp/translations/doc/poexport-queue.txt 2010-12-02 16:13:51 +0000
30+++ lib/lp/translations/doc/poexport-queue.txt 2011-01-13 23:31:34 +0000
31@@ -42,7 +42,7 @@
32 ... result.addFailure()
33
34 In this case, there is an error, so there shouldn't be a URL to download
35-anything, if we set it, the system will fail:
36+anything. If we set it, the system will fail:
37
38 >>> result.url = 'http://someplace.com/somefile.tar.gz'
39
40
41=== modified file 'lib/lp/translations/scripts/po_export_queue.py'
42--- lib/lp/translations/scripts/po_export_queue.py 2010-08-20 20:31:18 +0000
43+++ lib/lp/translations/scripts/po_export_queue.py 2011-01-13 23:31:34 +0000
44@@ -23,6 +23,7 @@
45 from canonical.launchpad.interfaces.librarian import ILibraryFileAliasSet
46 from canonical.launchpad.mail import simple_sendmail
47 from canonical.launchpad.webapp import canonical_url
48+from canonical.launchpad.webapp.dbpolicy import SlaveOnlyDatabasePolicy
49 from lp.registry.interfaces.productseries import IProductSeries
50 from lp.registry.interfaces.sourcepackage import ISourcePackage
51 from lp.translations.interfaces.poexportrequest import IPOExportRequestSet
52@@ -54,6 +55,7 @@
53 self.url = None
54 self.failure = None
55 self.logger = logger
56+ self.exported_file = None
57
58 self.requested_exports = list(requested_exports)
59 export_requested_at = self._getExportRequestOrigin()
60@@ -250,6 +252,46 @@
61 'request_url': self.request_url,
62 }
63
64+ def setExportFile(self, exported_file):
65+ """Attach an exported file to the result, for upload to the Librarian.
66+
67+ After this is set, `upload` will perform the actual upload. The two
68+ actions are separated so as to isolate write access to the database.
69+
70+ :param exported_file: An `IExportedTranslationFile` containing the
71+ exported data.
72+ """
73+ self.exported_file = exported_file
74+
75+ def upload(self, logger=None):
76+ """Upload exported file as set with `setExportFile` to the Librarian.
77+
78+ If no file has been set, do nothing.
79+ """
80+ if self.exported_file is None:
81+ # There's nothing to upload.
82+ return
83+
84+ if self.exported_file.path is None:
85+ # The exported path is unknown, use translation domain as its
86+ # filename.
87+ assert self.exported_file.file_extension, (
88+ 'File extension must have a value!.')
89+ path = 'launchpad-export.%s' % self.exported_file.file_extension
90+ else:
91+ # Convert the path to a single file name so it's noted in
92+ # librarian.
93+ path = self.exported_file.path.replace(os.sep, '_')
94+
95+ alias_set = getUtility(ILibraryFileAliasSet)
96+ alias = alias_set.create(
97+ name=path, size=self.exported_file.size, file=self.exported_file,
98+ contentType=self.exported_file.content_type)
99+
100+ self.url = alias.http_url
101+ if logger is not None:
102+ logger.info("Stored file at %s" % self.url)
103+
104 def notify(self):
105 """Send a notification email to the given person about the export.
106
107@@ -331,6 +373,9 @@
108 multiple files) and information about files that we failed to export (if
109 any).
110 """
111+ # Keep as much work off the master store as possible, so we avoid
112+ # opening unnecessary transactions there. It could be a while
113+ # before we get to the commit.
114 translation_exporter = getUtility(ITranslationExporter)
115 requested_objects = list(objects)
116
117@@ -352,28 +397,9 @@
118 # error, we add the entry to the list of errors.
119 result.addFailure()
120 else:
121- if exported_file.path is None:
122- # The exported path is unknown, use translation domain as its
123- # filename.
124- assert exported_file.file_extension, (
125- 'File extension must have a value!.')
126- exported_path = 'launchpad-export.%s' % (
127- exported_file.file_extension)
128- else:
129- # Convert the path to a single file name so it's noted in
130- # librarian.
131- exported_path = exported_file.path.replace(os.sep, '_')
132-
133- alias_set = getUtility(ILibraryFileAliasSet)
134- alias = alias_set.create(
135- name=exported_path,
136- size=exported_file.size,
137- file=exported_file,
138- contentType=exported_file.content_type)
139- result.url = alias.http_url
140- logger.info("Stored file at %s" % result.url)
141-
142- result.notify()
143+ result.setExportFile(exported_file)
144+
145+ return result
146
147
148 def process_queue(transaction_manager, logger):
149@@ -386,8 +412,19 @@
150
151 request = request_set.getRequest()
152 while request != no_request:
153- person, objects, format, request_ids = request
154- process_request(person, objects, format, logger)
155+
156+ # This can take a long time. Make sure we don't open any
157+ # transactions on the master store before we really need to.
158+ transaction_manager.commit()
159+ with SlaveOnlyDatabasePolicy():
160+ person, objects, format, request_ids = request
161+ result = process_request(person, objects, format, logger)
162+
163+ # Almost done. Now we can go back to using the master database
164+ # where needed.
165+ result.upload(logger=logger)
166+ result.notify()
167+
168 request_set.removeRequest(request_ids)
169 transaction_manager.commit()
170
171
172=== added file 'lib/lp/translations/tests/test_exportresult.py'
173--- lib/lp/translations/tests/test_exportresult.py 1970-01-01 00:00:00 +0000
174+++ lib/lp/translations/tests/test_exportresult.py 2011-01-13 23:31:34 +0000
175@@ -0,0 +1,67 @@
176+# Copyright 2010 Canonical Ltd. This software is licensed under the
177+# GNU Affero General Public License version 3 (see the file LICENSE).
178+
179+"""Tests for `ExportResult`."""
180+
181+__metaclass__ = type
182+
183+import hashlib
184+import os.path
185+from StringIO import StringIO
186+
187+from canonical.librarian.testing.fake import FakeLibrarian
188+from canonical.testing.layers import ZopelessDatabaseLayer
189+from lp.services.log.logger import DevNullLogger
190+from lp.testing import TestCaseWithFactory
191+from lp.translations.scripts.po_export_queue import ExportResult
192+
193+
194+class FakeExportedTranslationFile:
195+ """Fake `IExportedTranslationFile` for testing."""
196+
197+ def __init__(self, path, content, content_type):
198+ self.path = path
199+ base, ext = os.path.splitext(path)
200+ self.file_extension = ext
201+ self.size = len(content)
202+ self.content = content
203+ self.file = StringIO(content)
204+ self.content_type = content_type
205+
206+ def read(self, *args, **kwargs):
207+ return self.file.read(*args, **kwargs)
208+
209+
210+class TestExportResult(TestCaseWithFactory):
211+
212+ layer = ZopelessDatabaseLayer
213+
214+ def makeExportResult(self):
215+ request = [self.factory.makePOFile()]
216+ requester = self.factory.makePerson()
217+ logger = DevNullLogger()
218+ return ExportResult(requester, request, logger)
219+
220+ def makeExportedTranslationFile(self):
221+ filename = self.factory.getUniqueString()
222+ content = self.factory.getUniqueString()
223+ mime_type = "text/plain"
224+ return FakeExportedTranslationFile(filename, content, mime_type)
225+
226+ def test_upload_exported_file(self):
227+ librarian = self.useFixture(FakeLibrarian())
228+ export = self.makeExportedTranslationFile()
229+ export_result = self.makeExportResult()
230+ export_result.setExportFile(export)
231+ export_result.upload()
232+
233+ self.assertIsNot(None, export_result.url)
234+ sha1 = hashlib.sha1(export.content).hexdigest()
235+ self.assertEqual(sha1, librarian.aliases.values()[0].content.sha1)
236+ alias = librarian.findBySHA1(sha1)
237+ self.assertEqual(export.path, alias.filename)
238+
239+ def test_upload_without_exported_file_does_nothing(self):
240+ export_result = self.makeExportResult()
241+ export_result.upload()
242+ self.assertIs(None, export_result.url)