Merge lp:~jtv/launchpad/bug-181368-parallelize into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 12474
Proposed branch: lp:~jtv/launchpad/bug-181368-parallelize
Merge into: lp:launchpad
Diff against target: 246 lines (+143/-34)
2 files modified
lib/lp/archivepublisher/model/ftparchive.py (+39/-15)
lib/lp/archivepublisher/tests/test_ftparchive.py (+104/-19)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-181368-parallelize
Reviewer Review Type Date Requested Status
Leonard Richardson (community) Approve
Graham Binns (community) code Approve
Review via email: mp+48903@code.launchpad.net

Commit message

[r=gmb,leonardr][bug=181368] Parallel apt-ftparchive.

Description of the change

= Summary =

The archive publisher takes far longer to run than anyone would like.

It spends most of its time running apt-ftparchive, which writes index files for each CPU architecture in turn.

== Proposed fix ==

This branch parallelizes the apt-ftparchive process into concurrent invocations, one for each architecture. It's made easy by the earlier introduction of the CommandSpawner, a helper class that lets you issue commands in parallel and then wait for them all to complete.

== Pre-implementation notes ==

The idea was wgrant's. Since this is a little scary, I also discussed the idea with bigjools. To validate the operational side, mthaddon observed the memory footprint of a full apt-ftparchive run. It took up less than a percent of available memory.

== Implementation details ==

The command-line option that lets us limit apt-ftparchive to a single architecture was introduced in Maverick. Previous versions allowed the same thing through the configuration file, but using it that way would have involved rewriting configuration files. Backpatching the new option was considered the lesser risk.

For the purposes of apt-ftparchive, "source" is just another architecture.

== Tests ==

To exercise the change:
{{{
./bin/test -vvc lp.archivepublisher.tests.test_ftparchive
}}}

I removed a few assertions on runApt's return value. The method used to return apt-ftparchive's exit code, but it would raise an exception first if that exit code was not zero. Therefore it always returned zero. I removed it. A new test verifies that runApt does indeed raise an exception if the command fails.

== Demo and Q/A ==

This is not something we can easily reproduce without Soyuz skills. I rely entirely on Julian there.

He has already tested the change on dogfood, though only with one architecture (plus the "source" one, of course). We'll have to try a larger run with multiple architectures as part of regular Q/A.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/tests/test_ftparchive.py
  lib/lp/archivepublisher/ftparchive.py

So there!

To post a comment you must log in.
Revision history for this message
Graham Binns (gmb) wrote :

From IRC:

[13:57] gmb: jtv: Most of the tests in TestFTPArchiveRunApt need comments, if you please. Otherwise I'm happy with the code, but I'm not convinced that I actually have enough domain knowledge to review this - my brain seems to keep skipping off it.
[13:58] gmb: jtv: I'll give it r=me, but you might want to get a review from someone who knows more about it than I do.

review: Approve (code)
Revision history for this message
Leonard Richardson (leonardr) wrote :

I echo Graham's comments. I found test_getArchitectureTags_contains_no_duplicates and test_getArchitectureTags_counts_any_architecture_enabled_once confusing to read, and we discussed on IRC a way to improve them with a helper method.

I also suspect that test_runApt_reports_failure proves too much. Can you really break this code by naming a distro arch series "--something"? If so, that's its own problem and should be fixed. Or does the error happen because the string "bogus-config" is passed in instead of a real config? In that case, the command might not be run at all, and you'll need another test to show what happens when the command runs and fails.

Revision history for this message
Leonard Richardson (leonardr) wrote :

r=me with these (hopefully) minor changes.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Discussed on IRC, but for the record: any way of breaking the script will do, and the bogus-config is enough.

Ordinary users are not in a position to set a bogus architecture tag, and the shell is not invoked to do clever/risky things with the script arguments. So the damage from setting one is limited to archive publication being prevented, by a user who has privileges to do that anyway.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/model/ftparchive.py'
2--- lib/lp/archivepublisher/model/ftparchive.py 2011-02-04 09:07:36 +0000
3+++ lib/lp/archivepublisher/model/ftparchive.py 2011-02-25 12:00:23 +0000
4@@ -102,6 +102,10 @@
5 """
6
7
8+class AptFTPArchiveFailure(Exception):
9+ """Failure while running apt-ftparchive."""
10+
11+
12 class FTPArchiveHandler:
13 """Produces Sources and Packages files via apt-ftparchive.
14
15@@ -135,34 +139,54 @@
16 apt_config_filename = self.generateConfig(is_careful)
17 self.runApt(apt_config_filename)
18
19+ def _getArchitectureTags(self):
20+ """List tags of all architectures enabled in this distro."""
21+ archs = set()
22+ for series in self.distro.series:
23+ archs.update(set([
24+ distroarchseries.architecturetag
25+ for distroarchseries in series.enabled_architectures]))
26+ return archs
27+
28 def runApt(self, apt_config_filename):
29- """Run apt in a subprocess and verify its return value. """
30+ """Run apt-ftparchive in subprocesses.
31+
32+ :raise: AptFTPArchiveFailure if any of the apt-ftparchive
33+ commands failed.
34+ """
35 self.log.debug("Filepath: %s" % apt_config_filename)
36- # XXX JeroenVermeulen 2011-02-01 bug=181368: Run parallel
37- # apt-ftparchive processes for the various architectures (plus
38- # source).
39+
40 stdout_handler = OutputLineHandler(self.log.debug, "a-f: ")
41 stderr_handler = OutputLineHandler(self.log.warning, "a-f: ")
42- completion_handler = ReturnCodeReceiver()
43- command = [
44+ base_command = [
45 "apt-ftparchive",
46 "--no-contents",
47 "generate",
48 apt_config_filename,
49 ]
50 spawner = CommandSpawner()
51- spawner.start(
52- command, stdout_handler=stdout_handler,
53- stderr_handler=stderr_handler,
54- completion_handler=completion_handler)
55+
56+ returncodes = {}
57+ for arch_tag in self._getArchitectureTags().union(set(["source"])):
58+ completion_handler = ReturnCodeReceiver()
59+ returncodes[arch_tag] = completion_handler
60+ command = base_command + ["--arch", arch_tag]
61+ spawner.start(
62+ command, stdout_handler=stdout_handler,
63+ stderr_handler=stderr_handler,
64+ completion_handler=completion_handler)
65+
66 spawner.complete()
67 stdout_handler.finalize()
68 stderr_handler.finalize()
69- if completion_handler.returncode != 0:
70- raise AssertionError(
71- "Failure from apt-ftparchive. Return code %s"
72- % completion_handler.returncode)
73- return completion_handler.returncode
74+ failures = sorted([
75+ (tag, receiver.returncode)
76+ for tag, receiver in returncodes.iteritems()
77+ if receiver.returncode != 0])
78+ if len(failures) > 0:
79+ by_arch = ["%s (returned %d)" % failure for failure in failures]
80+ raise AptFTPArchiveFailure(
81+ "Failure(s) from apt-ftparchive: %s" % ", ".join(by_arch))
82
83 #
84 # Empty Pocket Requests
85
86=== modified file 'lib/lp/archivepublisher/tests/test_ftparchive.py'
87--- lib/lp/archivepublisher/tests/test_ftparchive.py 2011-02-04 09:07:36 +0000
88+++ lib/lp/archivepublisher/tests/test_ftparchive.py 2011-02-25 12:00:23 +0000
89@@ -1,4 +1,4 @@
90-# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
91+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
92 # GNU Affero General Public License version 3 (see the file LICENSE).
93
94 """Tests for ftparchive.py"""
95@@ -15,11 +15,18 @@
96 from zope.component import getUtility
97
98 from canonical.config import config
99-from lp.services.log.logger import BufferLogger
100-from canonical.testing.layers import LaunchpadZopelessLayer
101+from lp.services.log.logger import (
102+ BufferLogger,
103+ DevNullLogger,
104+ )
105+from canonical.testing.layers import (
106+ LaunchpadZopelessLayer,
107+ ZopelessDatabaseLayer,
108+ )
109 from lp.archivepublisher.config import getPubConfig
110 from lp.archivepublisher.diskpool import DiskPool
111 from lp.archivepublisher.model.ftparchive import (
112+ AptFTPArchiveFailure,
113 f_touch,
114 FTPArchiveHandler,
115 )
116@@ -290,7 +297,7 @@
117 # those kind of tests and avoid to run it when performing 'make
118 # check'. Although they should remain active in PQM to avoid possible
119 # regressions.
120- assert fa.runApt(apt_conf) == 0
121+ fa.runApt(apt_conf)
122 self._verifyFile("Packages",
123 os.path.join(self._distsdir, "hoary-test", "main", "binary-i386"))
124 self._verifyFile("Sources",
125@@ -305,7 +312,7 @@
126 assert len(file(apt_conf).readlines()) == 23
127
128 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.
129- assert fa.runApt(apt_conf) == 0
130+ fa.runApt(apt_conf)
131
132 def test_generateConfig_empty_and_careful(self):
133 # Generate apt-ftparchive config for an specific empty suite.
134@@ -359,20 +366,98 @@
135 self.assertEqual(apt_conf_content, sample_content)
136
137 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.
138- self.assertEqual(fa.runApt(apt_conf), 0)
139- self.assertTrue(os.path.exists(
140- os.path.join(self._distsdir, "hoary-test-updates", "main",
141- "binary-i386", "Packages")))
142- self.assertTrue(os.path.exists(
143- os.path.join(self._distsdir, "hoary-test-updates", "main",
144- "source", "Sources")))
145-
146- self.assertFalse(os.path.exists(
147- os.path.join(self._distsdir, "hoary-test", "main",
148- "binary-i386", "Packages")))
149- self.assertFalse(os.path.exists(
150- os.path.join(self._distsdir, "hoary-test", "main",
151- "source", "Sources")))
152+ fa.runApt(apt_conf)
153+ self.assertTrue(os.path.exists(
154+ os.path.join(self._distsdir, "hoary-test-updates", "main",
155+ "binary-i386", "Packages")))
156+ self.assertTrue(os.path.exists(
157+ os.path.join(self._distsdir, "hoary-test-updates", "main",
158+ "source", "Sources")))
159+
160+ self.assertFalse(os.path.exists(
161+ os.path.join(self._distsdir, "hoary-test", "main",
162+ "binary-i386", "Packages")))
163+ self.assertFalse(os.path.exists(
164+ os.path.join(self._distsdir, "hoary-test", "main",
165+ "source", "Sources")))
166+
167+
168+class TestFTPArchiveRunApt(TestCaseWithFactory):
169+ """Test `FTPArchive`'s execution of apt-ftparchive."""
170+
171+ layer = ZopelessDatabaseLayer
172+
173+ def _makeMatchingDistroArchSeries(self):
174+ """Create two `DistroArchSeries` for the same distro and processor."""
175+ distro = self.factory.makeDistribution()
176+ processor = self.factory.makeProcessor()
177+ return (
178+ self.factory.makeDistroArchSeries(
179+ distroseries=self.factory.makeDistroSeries(distro),
180+ processorfamily=processor.family,
181+ architecturetag=processor.name)
182+ for counter in (1, 2))
183+
184+ def test_getArchitectureTags_starts_out_empty(self):
185+ fa = FTPArchiveHandler(
186+ DevNullLogger(), None, None, self.factory.makeDistribution(),
187+ None)
188+ self.assertContentEqual([], fa._getArchitectureTags())
189+
190+ def test_getArchitectureTags_includes_enabled_architectures(self):
191+ distroarchseries = self.factory.makeDistroArchSeries()
192+ fa = FTPArchiveHandler(
193+ DevNullLogger(), None, None,
194+ distroarchseries.distroseries.distribution, None)
195+ self.assertContentEqual(
196+ [distroarchseries.architecturetag], fa._getArchitectureTags())
197+
198+ def test_getArchitectureTags_considers_all_series(self):
199+ distro = self.factory.makeDistribution()
200+ affluent_antilope = self.factory.makeDistroSeries(distribution=distro)
201+ bilious_baboon = self.factory.makeDistroSeries(distribution=distro)
202+ affluent_arch = self.factory.makeDistroArchSeries(
203+ distroseries=affluent_antilope)
204+ bilious_arch = self.factory.makeDistroArchSeries(
205+ distroseries=bilious_baboon)
206+ fa = FTPArchiveHandler(DevNullLogger(), None, None, distro, None)
207+ self.assertContentEqual(
208+ [affluent_arch.architecturetag, bilious_arch.architecturetag],
209+ fa._getArchitectureTags())
210+
211+ def test_getArchitectureTags_ignores_disabled_architectures(self):
212+ distroarchseries = self.factory.makeDistroArchSeries()
213+ distroarchseries.enabled = False
214+ fa = FTPArchiveHandler(
215+ DevNullLogger(), None, None,
216+ distroarchseries.distroseries.distribution, None)
217+ self.assertContentEqual([], fa._getArchitectureTags())
218+
219+ def test_getArchitectureTags_contains_no_duplicates(self):
220+ ominous_okapi, pilfering_puppy = self._makeMatchingDistroArchSeries()
221+ fa = FTPArchiveHandler(
222+ DevNullLogger(), None, None,
223+ ominous_okapi.distroseries.distribution, None)
224+ self.assertEqual(1, len(list(fa._getArchitectureTags())))
225+ self.assertContentEqual(
226+ [ominous_okapi.architecturetag], fa._getArchitectureTags())
227+
228+ def test_getArchitectureTags_counts_any_architecture_enabled_once(self):
229+ manic_mantis, nervous_nit = self._makeMatchingDistroArchSeries()
230+ nervous_nit.enabled = False
231+ fa = FTPArchiveHandler(
232+ DevNullLogger(), None, None,
233+ manic_mantis.distroseries.distribution, None)
234+ self.assertContentEqual(
235+ [manic_mantis.architecturetag], fa._getArchitectureTags())
236+
237+ def test_runApt_reports_failure(self):
238+ # If we sabotage apt-ftparchive, runApt notices that it failed
239+ # and raises an exception.
240+ distroarchseries = self.factory.makeDistroArchSeries()
241+ distro = distroarchseries.distroseries.distribution
242+ fa = FTPArchiveHandler(DevNullLogger(), None, None, distro, None)
243+ self.assertRaises(AptFTPArchiveFailure, fa.runApt, "bogus-config")
244
245
246 class TestFTouch(unittest.TestCase):