Merge lp:~jtv/launchpad/bug-884649-branch-3 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 14247
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-3
Merge into: lp:launchpad
Prerequisite: lp:~jtv/launchpad/bug-884649-branch-2
Diff against target: 110 lines (+57/-9)
2 files modified
lib/lp/archivepublisher/domination.py (+31/-9)
lib/lp/archivepublisher/tests/test_dominator.py (+26/-0)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-3
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
Review via email: mp+81136@code.launchpad.net

This proposal supersedes a proposal from 2011-11-03.

Commit message

[r=julian-edwards] Limit 2nd binary-domination pass to packages with arch-indep publications.

Description of the change

= Summary =

Domination is slow. A major reason is the two-pass domination algorithm needed for binary publications.

== Proposed fix ==

The first binary-domination pass touches only architecture-specific publications. This pass is relatively cheap, per package dominated.

The second pass touches only architecture-independent publications. This pass is expensive per package dominated. It probably dominates the same number of packages as the first pass, but for most, does nothing.

So: keep track during the first pass of which packages have architecture-independent publications, and limit the second pass to just those.

== Pre-implementation notes ==

This was Julian's idea.

== Implementation details ==

I made the second pass iterate over the intersection of “packages with multiple live publications” and “packages that were found during the first pass to have architecture-independent live publications.” This is because a package might conceivably have architecture-independent live publications in one architecture, but no live publications at all in another.

That's not really supposed to happen, which is to say it can be helpful to be prepared for the case but it's not worth optimizing for.

== Tests ==

All the high-level desired outcomes and integration are tested in scenario tests; those remain unchanged because this is a functionally neutral optimization.

I did add one helper function, which is short but easy to mess up and so it gets its own series of tests.

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}

== Demo and Q/A ==

Run the dominator. It should be tons faster, but still dominate even architecture-independent binary publications.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/archivepublisher/tests/test_dominator.py

To post a comment you must log in.
Revision history for this message
Julian Edwards (julian-edwards) wrote :

8 +def contains_arch_indep(bpphs):
9 + """Are any of the publications among `bpphs` architecture-independent?"""
10 + return any(not bpph.architecture_specific for bpph in bpphs)
11 +

Again be careful of the extra masked query. I assume the FK to BPR is already loaded though.

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Yup, already loaded. Thanks.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/domination.py'
2--- lib/lp/archivepublisher/domination.py 2011-11-03 12:00:12 +0000
3+++ lib/lp/archivepublisher/domination.py 2011-11-03 12:00:14 +0000
4@@ -296,6 +296,11 @@
5 return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)
6
7
8+def contains_arch_indep(bpphs):
9+ """Are any of the publications among `bpphs` architecture-independent?"""
10+ return any(not bpph.architecture_specific for bpph in bpphs)
11+
12+
13 class Dominator:
14 """Manage the process of marking packages as superseded.
15
16@@ -568,6 +573,18 @@
17 """
18 generalization = GeneralizedPublication(is_source=False)
19
20+ # Domination happens in two passes. The first tries to
21+ # supersede architecture-dependent publications; the second
22+ # tries to supersede architecture-independent ones. An
23+ # architecture-independent pub is kept alive as long as any
24+ # architecture-dependent pubs from the same source package build
25+ # are still live for any architecture, because they may depend
26+ # on the architecture-independent package.
27+ # Thus we limit the second pass to those packages that have
28+ # published, architecture-independent publications; anything
29+ # else will have completed domination in the first pass.
30+ packages_w_arch_indep = set()
31+
32 for distroarchseries in distroseries.architectures:
33 self.logger.info(
34 "Performing domination across %s/%s (%s)",
35@@ -583,20 +600,25 @@
36 assert len(pubs) > 0, "Dominating zero binaries!"
37 live_versions = find_live_binary_versions_pass_1(pubs)
38 self.dominatePackage(pubs, live_versions, generalization)
39-
40- # We need to make a second pass to cover the cases where:
41- # * arch-specific binaries were not all dominated before arch-all
42- # ones that depend on them
43- # * An arch-all turned into an arch-specific, or vice-versa
44- # * A package is completely schizophrenic and changes all of
45- # its binaries between arch-all and arch-any (apparently
46- # occurs sometimes!)
47+ if contains_arch_indep(pubs):
48+ packages_w_arch_indep.add(name)
49+
50+ packages_w_arch_indep = frozenset(packages_w_arch_indep)
51+
52+ # The second pass attempts to supersede arch-all publications of
53+ # older versions, from source package releases that no longer
54+ # have any active arch-specific publications that might depend
55+ # on the arch-indep ones.
56+ # (In maintaining this code, bear in mind that some or all of a
57+ # source package's binary packages may switch between
58+ # arch-specific and arch-indep between releases.)
59 for distroarchseries in distroseries.architectures:
60 self.logger.info("Finding binaries...(2nd pass)")
61 bins = self.findBinariesForDomination(distroarchseries, pocket)
62 sorted_packages = self._sortPackages(bins, generalization)
63 self.logger.info("Dominating binaries...(2nd pass)")
64- for name, pubs in sorted_packages.iteritems():
65+ for name in packages_w_arch_indep.intersection(sorted_packages):
66+ pubs = sorted_packages[name]
67 self.logger.debug("Dominating %s" % name)
68 assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"
69 live_versions = find_live_binary_versions_pass_2(pubs)
70
71=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
72--- lib/lp/archivepublisher/tests/test_dominator.py 2011-11-03 12:00:12 +0000
73+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-11-03 12:00:14 +0000
74@@ -15,6 +15,7 @@
75 from canonical.database.sqlbase import flush_database_updates
76 from canonical.testing.layers import ZopelessDatabaseLayer
77 from lp.archivepublisher.domination import (
78+ contains_arch_indep,
79 Dominator,
80 find_live_binary_versions_pass_1,
81 find_live_binary_versions_pass_2,
82@@ -1190,3 +1191,28 @@
83 make_publications_arch_specific([dependent], True)
84 self.assertEqual(
85 ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))
86+
87+
88+class TestDominationHelpers(TestCaseWithFactory):
89+ """Test lightweight helpers for the `Dominator`."""
90+
91+ layer = ZopelessDatabaseLayer
92+
93+ def test_contains_arch_indep_says_True_for_arch_indep(self):
94+ bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
95+ make_publications_arch_specific(bpphs, False)
96+ self.assertTrue(contains_arch_indep(bpphs))
97+
98+ def test_contains_arch_indep_says_False_for_arch_specific(self):
99+ bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
100+ make_publications_arch_specific(bpphs, True)
101+ self.assertFalse(contains_arch_indep(bpphs))
102+
103+ def test_contains_arch_indep_says_True_for_combination(self):
104+ bpphs = make_bpphs_for_versions(self.factory, ['1.1', '1.0'])
105+ make_publications_arch_specific(bpphs[:1], True)
106+ make_publications_arch_specific(bpphs[1:], False)
107+ self.assertTrue(contains_arch_indep(bpphs))
108+
109+ def test_contains_arch_indep_says_False_for_empty_list(self):
110+ self.assertFalse(contains_arch_indep([]))