Merge lp:~jtv/launchpad/bug-884649-branch-3 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 14247
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-3
Merge into: lp:launchpad
Prerequisite: lp:~jtv/launchpad/bug-884649-branch-2
Diff against target: 110 lines (+57/-9)
2 files modified
lib/lp/archivepublisher/domination.py (+31/-9)
lib/lp/archivepublisher/tests/test_dominator.py (+26/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-3
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
Review via email: mp+81136@code.launchpad.net

This proposal supersedes a proposal from 2011-11-03.

Commit message

[r=julian-edwards] Limit 2nd binary-domination pass to packages with arch-indep publications.

Description of the change

= Summary =

Domination is slow. A major reason is the two-pass domination algorithm needed for binary publications.

== Proposed fix ==

The first binary-domination pass touches only architecture-specific publications. This pass is relatively cheap, per package dominated.

The second pass touches only architecture-independent publications. This pass is expensive per package dominated. It probably dominates the same number of packages as the first pass, but for most, does nothing.

So: keep track during the first pass of which packages have architecture-independent publications, and limit the second pass to just those.

== Pre-implementation notes ==

This was Julian's idea.

== Implementation details ==

I made the second pass iterate over the intersection of “packages with multiple live publications” and “packages that were found during the first pass to have architecture-independent live publications.” This is because a package might conceivably have architecture-independent live publications in one architecture, but no live publications at all in another.

That's not really supposed to happen, which is to say it can be helpful to be prepared for the case but it's not worth optimizing for.

== Tests ==

All the high-level desired outcomes and integration are tested in scenario tests; those remain unchanged because this is a functionally neutral optimization.

I did add one helper function, which is short but easy to mess up and so it gets its own series of tests.

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}

== Demo and Q/A ==

Run the dominator. It should be tons faster, but still dominate even architecture-independent binary publications.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/archivepublisher/tests/test_dominator.py

To post a comment you must log in.
Revision history for this message
Julian Edwards (julian-edwards) wrote :

8 +def contains_arch_indep(bpphs):
9 + """Are any of the publications among `bpphs` architecture-independent?"""
10 + return any(not bpph.architecture_specific for bpph in bpphs)
11 +

Again be careful of the extra masked query. I assume the FK to BPR is already loaded though.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Yup, already loaded. Thanks.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py 2011-11-03 12:00:12 +0000
+++ lib/lp/archivepublisher/domination.py 2011-11-03 12:00:14 +0000
@@ -296,6 +296,11 @@
296 return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)296 return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)
297297
298298
299def contains_arch_indep(bpphs):
300 """Are any of the publications among `bpphs` architecture-independent?"""
301 return any(not bpph.architecture_specific for bpph in bpphs)
302
303
299class Dominator:304class Dominator:
300 """Manage the process of marking packages as superseded.305 """Manage the process of marking packages as superseded.
301306
@@ -568,6 +573,18 @@
568 """573 """
569 generalization = GeneralizedPublication(is_source=False)574 generalization = GeneralizedPublication(is_source=False)
570575
576 # Domination happens in two passes. The first tries to
577 # supersede architecture-dependent publications; the second
578 # tries to supersede architecture-independent ones. An
579 # architecture-independent pub is kept alive as long as any
580 # architecture-dependent pubs from the same source package build
581 # are still live for any architecture, because they may depend
582 # on the architecture-independent package.
583 # Thus we limit the second pass to those packages that have
584 # published, architecture-independent publications; anything
585 # else will have completed domination in the first pass.
586 packages_w_arch_indep = set()
587
571 for distroarchseries in distroseries.architectures:588 for distroarchseries in distroseries.architectures:
572 self.logger.info(589 self.logger.info(
573 "Performing domination across %s/%s (%s)",590 "Performing domination across %s/%s (%s)",
@@ -583,20 +600,25 @@
583 assert len(pubs) > 0, "Dominating zero binaries!"600 assert len(pubs) > 0, "Dominating zero binaries!"
584 live_versions = find_live_binary_versions_pass_1(pubs)601 live_versions = find_live_binary_versions_pass_1(pubs)
585 self.dominatePackage(pubs, live_versions, generalization)602 self.dominatePackage(pubs, live_versions, generalization)
586603 if contains_arch_indep(pubs):
587 # We need to make a second pass to cover the cases where:604 packages_w_arch_indep.add(name)
588 # * arch-specific binaries were not all dominated before arch-all605
589 # ones that depend on them606 packages_w_arch_indep = frozenset(packages_w_arch_indep)
590 # * An arch-all turned into an arch-specific, or vice-versa607
591 # * A package is completely schizophrenic and changes all of608 # The second pass attempts to supersede arch-all publications of
592 # its binaries between arch-all and arch-any (apparently609 # older versions, from source package releases that no longer
593 # occurs sometimes!)610 # have any active arch-specific publications that might depend
611 # on the arch-indep ones.
612 # (In maintaining this code, bear in mind that some or all of a
613 # source package's binary packages may switch between
614 # arch-specific and arch-indep between releases.)
594 for distroarchseries in distroseries.architectures:615 for distroarchseries in distroseries.architectures:
595 self.logger.info("Finding binaries...(2nd pass)")616 self.logger.info("Finding binaries...(2nd pass)")
596 bins = self.findBinariesForDomination(distroarchseries, pocket)617 bins = self.findBinariesForDomination(distroarchseries, pocket)
597 sorted_packages = self._sortPackages(bins, generalization)618 sorted_packages = self._sortPackages(bins, generalization)
598 self.logger.info("Dominating binaries...(2nd pass)")619 self.logger.info("Dominating binaries...(2nd pass)")
599 for name, pubs in sorted_packages.iteritems():620 for name in packages_w_arch_indep.intersection(sorted_packages):
621 pubs = sorted_packages[name]
600 self.logger.debug("Dominating %s" % name)622 self.logger.debug("Dominating %s" % name)
601 assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"623 assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"
602 live_versions = find_live_binary_versions_pass_2(pubs)624 live_versions = find_live_binary_versions_pass_2(pubs)
603625
=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
--- lib/lp/archivepublisher/tests/test_dominator.py 2011-11-03 12:00:12 +0000
+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-11-03 12:00:14 +0000
@@ -15,6 +15,7 @@
15from canonical.database.sqlbase import flush_database_updates15from canonical.database.sqlbase import flush_database_updates
16from canonical.testing.layers import ZopelessDatabaseLayer16from canonical.testing.layers import ZopelessDatabaseLayer
17from lp.archivepublisher.domination import (17from lp.archivepublisher.domination import (
18 contains_arch_indep,
18 Dominator,19 Dominator,
19 find_live_binary_versions_pass_1,20 find_live_binary_versions_pass_1,
20 find_live_binary_versions_pass_2,21 find_live_binary_versions_pass_2,
@@ -1190,3 +1191,28 @@
1190 make_publications_arch_specific([dependent], True)1191 make_publications_arch_specific([dependent], True)
1191 self.assertEqual(1192 self.assertEqual(
1192 ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))1193 ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))
1194
1195
1196class TestDominationHelpers(TestCaseWithFactory):
1197 """Test lightweight helpers for the `Dominator`."""
1198
1199 layer = ZopelessDatabaseLayer
1200
1201 def test_contains_arch_indep_says_True_for_arch_indep(self):
1202 bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
1203 make_publications_arch_specific(bpphs, False)
1204 self.assertTrue(contains_arch_indep(bpphs))
1205
1206 def test_contains_arch_indep_says_False_for_arch_specific(self):
1207 bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
1208 make_publications_arch_specific(bpphs, True)
1209 self.assertFalse(contains_arch_indep(bpphs))
1210
1211 def test_contains_arch_indep_says_True_for_combination(self):
1212 bpphs = make_bpphs_for_versions(self.factory, ['1.1', '1.0'])
1213 make_publications_arch_specific(bpphs[:1], True)
1214 make_publications_arch_specific(bpphs[1:], False)
1215 self.assertTrue(contains_arch_indep(bpphs))
1216
1217 def test_contains_arch_indep_says_False_for_empty_list(self):
1218 self.assertFalse(contains_arch_indep([]))