Merge lp:~jtv/launchpad/bug-181368-parallelize into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 12474
Proposed branch: lp:~jtv/launchpad/bug-181368-parallelize
Merge into: lp:launchpad
Diff against target: 246 lines (+143/-34)
2 files modified
lib/lp/archivepublisher/model/ftparchive.py (+39/-15)
lib/lp/archivepublisher/tests/test_ftparchive.py (+104/-19)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-181368-parallelize
Reviewer Review Type Date Requested Status
Leonard Richardson (community) Approve
Graham Binns (community) code Approve
Review via email: mp+48903@code.launchpad.net

Commit message

[r=gmb,leonardr][bug=181368] Parallel apt-ftparchive.

Description of the change

= Summary =

The archive publisher takes far longer to run than anyone would like.

It spends most of its time running apt-ftparchive, which writes index files for each CPU architecture in turn.

== Proposed fix ==

This branch parallelizes the apt-ftparchive process into concurrent invocations, one for each architecture. It's made easy by the earlier introduction of the CommandSpawner, a helper class that lets you issue commands in parallel and then wait for them all to complete.

== Pre-implementation notes ==

The idea was wgrant's. Since this is a little scary, I also discussed the idea with bigjools. To validate the operational side, mthaddon observed the memory footprint of a full apt-ftparchive run. It took up less than a percent of available memory.

== Implementation details ==

The command-line option that lets us limit apt-ftparchive to a single architecture was introduced in Maverick. Previous versions allowed the same thing through the configuration file, but using it that way would have involved rewriting configuration files. Backpatching the new option was considered the lesser risk.

For the purposes of apt-ftparchive, "source" is just another architecture.

== Tests ==

To exercise the change:
{{{
./bin/test -vvc lp.archivepublisher.tests.test_ftparchive
}}}

I removed a few assertions on runApt's return value. The method used to return apt-ftparchive's exit code, but it would raise an exception first if that exit code was not zero. Therefore it always returned zero. I removed it. A new test verifies that runApt does indeed raise an exception if the command fails.

== Demo and Q/A ==

This is not something we can easily reproduce without Soyuz skills. I rely entirely on Julian there.

He has already tested the change on dogfood, though only with one architecture (plus the "source" one, of course). We'll have to try a larger run with multiple architectures as part of regular Q/A.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/tests/test_ftparchive.py
  lib/lp/archivepublisher/ftparchive.py

So there!

To post a comment you must log in.
Revision history for this message
Graham Binns (gmb) wrote :

From IRC:

[13:57] gmb: jtv: Most of the tests in TestFTPArchiveRunApt need comments, if you please. Otherwise I'm happy with the code, but I'm not convinced that I actually have enough domain knowledge to review this - my brain seems to keep skipping off it.
[13:58] gmb: jtv: I'll give it r=me, but you might want to get a review from someone who knows more about it than I do.

review: Approve (code)
Revision history for this message
Leonard Richardson (leonardr) wrote :

I echo Graham's comments. I found test_getArchitectureTags_contains_no_duplicates and test_getArchitectureTags_counts_any_architecture_enabled_once confusing to read, and we discussed on IRC a way to improve them with a helper method.

I also suspect that test_runApt_reports_failure proves too much. Can you really break this code by naming a distro arch series "--something"? If so, that's its own problem and should be fixed. Or does the error happen because the string "bogus-config" is passed in instead of a real config? In that case, the command might not be run at all, and you'll need another test to show what happens when the command runs and fails.

Revision history for this message
Leonard Richardson (leonardr) wrote :

r=me with these (hopefully) minor changes.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Discussed on IRC, but for the record: any way of breaking the script will do, and the bogus-config is enough.

Ordinary users are not in a position to set a bogus architecture tag, and the shell is not invoked to do clever/risky things with the script arguments. So the damage from setting one is limited to archive publication being prevented, by a user who has privileges to do that anyway.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/archivepublisher/model/ftparchive.py'
--- lib/lp/archivepublisher/model/ftparchive.py 2011-02-04 09:07:36 +0000
+++ lib/lp/archivepublisher/model/ftparchive.py 2011-02-25 12:00:23 +0000
@@ -102,6 +102,10 @@
102"""102"""
103103
104104
105class AptFTPArchiveFailure(Exception):
106 """Failure while running apt-ftparchive."""
107
108
105class FTPArchiveHandler:109class FTPArchiveHandler:
106 """Produces Sources and Packages files via apt-ftparchive.110 """Produces Sources and Packages files via apt-ftparchive.
107111
@@ -135,34 +139,54 @@
135 apt_config_filename = self.generateConfig(is_careful)139 apt_config_filename = self.generateConfig(is_careful)
136 self.runApt(apt_config_filename)140 self.runApt(apt_config_filename)
137141
142 def _getArchitectureTags(self):
143 """List tags of all architectures enabled in this distro."""
144 archs = set()
145 for series in self.distro.series:
146 archs.update(set([
147 distroarchseries.architecturetag
148 for distroarchseries in series.enabled_architectures]))
149 return archs
150
138 def runApt(self, apt_config_filename):151 def runApt(self, apt_config_filename):
139 """Run apt in a subprocess and verify its return value. """152 """Run apt-ftparchive in subprocesses.
153
154 :raise: AptFTPArchiveFailure if any of the apt-ftparchive
155 commands failed.
156 """
140 self.log.debug("Filepath: %s" % apt_config_filename)157 self.log.debug("Filepath: %s" % apt_config_filename)
141 # XXX JeroenVermeulen 2011-02-01 bug=181368: Run parallel158
142 # apt-ftparchive processes for the various architectures (plus
143 # source).
144 stdout_handler = OutputLineHandler(self.log.debug, "a-f: ")159 stdout_handler = OutputLineHandler(self.log.debug, "a-f: ")
145 stderr_handler = OutputLineHandler(self.log.warning, "a-f: ")160 stderr_handler = OutputLineHandler(self.log.warning, "a-f: ")
146 completion_handler = ReturnCodeReceiver()161 base_command = [
147 command = [
148 "apt-ftparchive",162 "apt-ftparchive",
149 "--no-contents",163 "--no-contents",
150 "generate",164 "generate",
151 apt_config_filename,165 apt_config_filename,
152 ]166 ]
153 spawner = CommandSpawner()167 spawner = CommandSpawner()
154 spawner.start(168
155 command, stdout_handler=stdout_handler,169 returncodes = {}
156 stderr_handler=stderr_handler,170 for arch_tag in self._getArchitectureTags().union(set(["source"])):
157 completion_handler=completion_handler)171 completion_handler = ReturnCodeReceiver()
172 returncodes[arch_tag] = completion_handler
173 command = base_command + ["--arch", arch_tag]
174 spawner.start(
175 command, stdout_handler=stdout_handler,
176 stderr_handler=stderr_handler,
177 completion_handler=completion_handler)
178
158 spawner.complete()179 spawner.complete()
159 stdout_handler.finalize()180 stdout_handler.finalize()
160 stderr_handler.finalize()181 stderr_handler.finalize()
161 if completion_handler.returncode != 0:182 failures = sorted([
162 raise AssertionError(183 (tag, receiver.returncode)
163 "Failure from apt-ftparchive. Return code %s"184 for tag, receiver in returncodes.iteritems()
164 % completion_handler.returncode)185 if receiver.returncode != 0])
165 return completion_handler.returncode186 if len(failures) > 0:
187 by_arch = ["%s (returned %d)" % failure for failure in failures]
188 raise AptFTPArchiveFailure(
189 "Failure(s) from apt-ftparchive: %s" % ", ".join(by_arch))
166190
167 #191 #
168 # Empty Pocket Requests192 # Empty Pocket Requests
169193
=== modified file 'lib/lp/archivepublisher/tests/test_ftparchive.py'
--- lib/lp/archivepublisher/tests/test_ftparchive.py 2011-02-04 09:07:36 +0000
+++ lib/lp/archivepublisher/tests/test_ftparchive.py 2011-02-25 12:00:23 +0000
@@ -1,4 +1,4 @@
1# Copyright 2009-2010 Canonical Ltd. This software is licensed under the1# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).2# GNU Affero General Public License version 3 (see the file LICENSE).
33
4"""Tests for ftparchive.py"""4"""Tests for ftparchive.py"""
@@ -15,11 +15,18 @@
15from zope.component import getUtility15from zope.component import getUtility
1616
17from canonical.config import config17from canonical.config import config
18from lp.services.log.logger import BufferLogger18from lp.services.log.logger import (
19from canonical.testing.layers import LaunchpadZopelessLayer19 BufferLogger,
20 DevNullLogger,
21 )
22from canonical.testing.layers import (
23 LaunchpadZopelessLayer,
24 ZopelessDatabaseLayer,
25 )
20from lp.archivepublisher.config import getPubConfig26from lp.archivepublisher.config import getPubConfig
21from lp.archivepublisher.diskpool import DiskPool27from lp.archivepublisher.diskpool import DiskPool
22from lp.archivepublisher.model.ftparchive import (28from lp.archivepublisher.model.ftparchive import (
29 AptFTPArchiveFailure,
23 f_touch,30 f_touch,
24 FTPArchiveHandler,31 FTPArchiveHandler,
25 )32 )
@@ -290,7 +297,7 @@
290 # those kind of tests and avoid to run it when performing 'make297 # those kind of tests and avoid to run it when performing 'make
291 # check'. Although they should remain active in PQM to avoid possible298 # check'. Although they should remain active in PQM to avoid possible
292 # regressions.299 # regressions.
293 assert fa.runApt(apt_conf) == 0300 fa.runApt(apt_conf)
294 self._verifyFile("Packages",301 self._verifyFile("Packages",
295 os.path.join(self._distsdir, "hoary-test", "main", "binary-i386"))302 os.path.join(self._distsdir, "hoary-test", "main", "binary-i386"))
296 self._verifyFile("Sources",303 self._verifyFile("Sources",
@@ -305,7 +312,7 @@
305 assert len(file(apt_conf).readlines()) == 23312 assert len(file(apt_conf).readlines()) == 23
306313
307 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.314 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.
308 assert fa.runApt(apt_conf) == 0315 fa.runApt(apt_conf)
309316
310 def test_generateConfig_empty_and_careful(self):317 def test_generateConfig_empty_and_careful(self):
311 # Generate apt-ftparchive config for an specific empty suite.318 # Generate apt-ftparchive config for an specific empty suite.
@@ -359,20 +366,98 @@
359 self.assertEqual(apt_conf_content, sample_content)366 self.assertEqual(apt_conf_content, sample_content)
360367
361 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.368 # XXX cprov 2007-03-21: see above, do not run a-f on dev machines.
362 self.assertEqual(fa.runApt(apt_conf), 0)369 fa.runApt(apt_conf)
363 self.assertTrue(os.path.exists(370 self.assertTrue(os.path.exists(
364 os.path.join(self._distsdir, "hoary-test-updates", "main",371 os.path.join(self._distsdir, "hoary-test-updates", "main",
365 "binary-i386", "Packages")))372 "binary-i386", "Packages")))
366 self.assertTrue(os.path.exists(373 self.assertTrue(os.path.exists(
367 os.path.join(self._distsdir, "hoary-test-updates", "main",374 os.path.join(self._distsdir, "hoary-test-updates", "main",
368 "source", "Sources")))375 "source", "Sources")))
369376
370 self.assertFalse(os.path.exists(377 self.assertFalse(os.path.exists(
371 os.path.join(self._distsdir, "hoary-test", "main",378 os.path.join(self._distsdir, "hoary-test", "main",
372 "binary-i386", "Packages")))379 "binary-i386", "Packages")))
373 self.assertFalse(os.path.exists(380 self.assertFalse(os.path.exists(
374 os.path.join(self._distsdir, "hoary-test", "main",381 os.path.join(self._distsdir, "hoary-test", "main",
375 "source", "Sources")))382 "source", "Sources")))
383
384
385class TestFTPArchiveRunApt(TestCaseWithFactory):
386 """Test `FTPArchive`'s execution of apt-ftparchive."""
387
388 layer = ZopelessDatabaseLayer
389
390 def _makeMatchingDistroArchSeries(self):
391 """Create two `DistroArchSeries` for the same distro and processor."""
392 distro = self.factory.makeDistribution()
393 processor = self.factory.makeProcessor()
394 return (
395 self.factory.makeDistroArchSeries(
396 distroseries=self.factory.makeDistroSeries(distro),
397 processorfamily=processor.family,
398 architecturetag=processor.name)
399 for counter in (1, 2))
400
401 def test_getArchitectureTags_starts_out_empty(self):
402 fa = FTPArchiveHandler(
403 DevNullLogger(), None, None, self.factory.makeDistribution(),
404 None)
405 self.assertContentEqual([], fa._getArchitectureTags())
406
407 def test_getArchitectureTags_includes_enabled_architectures(self):
408 distroarchseries = self.factory.makeDistroArchSeries()
409 fa = FTPArchiveHandler(
410 DevNullLogger(), None, None,
411 distroarchseries.distroseries.distribution, None)
412 self.assertContentEqual(
413 [distroarchseries.architecturetag], fa._getArchitectureTags())
414
415 def test_getArchitectureTags_considers_all_series(self):
416 distro = self.factory.makeDistribution()
417 affluent_antilope = self.factory.makeDistroSeries(distribution=distro)
418 bilious_baboon = self.factory.makeDistroSeries(distribution=distro)
419 affluent_arch = self.factory.makeDistroArchSeries(
420 distroseries=affluent_antilope)
421 bilious_arch = self.factory.makeDistroArchSeries(
422 distroseries=bilious_baboon)
423 fa = FTPArchiveHandler(DevNullLogger(), None, None, distro, None)
424 self.assertContentEqual(
425 [affluent_arch.architecturetag, bilious_arch.architecturetag],
426 fa._getArchitectureTags())
427
428 def test_getArchitectureTags_ignores_disabled_architectures(self):
429 distroarchseries = self.factory.makeDistroArchSeries()
430 distroarchseries.enabled = False
431 fa = FTPArchiveHandler(
432 DevNullLogger(), None, None,
433 distroarchseries.distroseries.distribution, None)
434 self.assertContentEqual([], fa._getArchitectureTags())
435
436 def test_getArchitectureTags_contains_no_duplicates(self):
437 ominous_okapi, pilfering_puppy = self._makeMatchingDistroArchSeries()
438 fa = FTPArchiveHandler(
439 DevNullLogger(), None, None,
440 ominous_okapi.distroseries.distribution, None)
441 self.assertEqual(1, len(list(fa._getArchitectureTags())))
442 self.assertContentEqual(
443 [ominous_okapi.architecturetag], fa._getArchitectureTags())
444
445 def test_getArchitectureTags_counts_any_architecture_enabled_once(self):
446 manic_mantis, nervous_nit = self._makeMatchingDistroArchSeries()
447 nervous_nit.enabled = False
448 fa = FTPArchiveHandler(
449 DevNullLogger(), None, None,
450 manic_mantis.distroseries.distribution, None)
451 self.assertContentEqual(
452 [manic_mantis.architecturetag], fa._getArchitectureTags())
453
454 def test_runApt_reports_failure(self):
455 # If we sabotage apt-ftparchive, runApt notices that it failed
456 # and raises an exception.
457 distroarchseries = self.factory.makeDistroArchSeries()
458 distro = distroarchseries.distroseries.distribution
459 fa = FTPArchiveHandler(DevNullLogger(), None, None, distro, None)
460 self.assertRaises(AptFTPArchiveFailure, fa.runApt, "bogus-config")
376461
377462
378class TestFTouch(unittest.TestCase):463class TestFTouch(unittest.TestCase):