Merge lp:~jtv/launchpad/bug-884649-branch-1 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 14232
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-1
Merge into: lp:launchpad
Diff against target: 502 lines (+276/-89)
3 files modified
lib/lp/archivepublisher/domination.py (+104/-75)
lib/lp/archivepublisher/tests/test_dominator.py (+170/-2)
lib/lp/soyuz/model/publishing.py (+2/-12)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-1
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
Review via email: mp+80988@code.launchpad.net

Commit message

[r=julian-edwards][bug=884649] Part 1 of Dominator optimization.

Description of the change

= Summary =

The Dominator is having performance problems, now that we've introduced binary double-domination. (It's probably not as exciting as it sounds, but it'll fix a lot of uninstallable packages.)

Getting dominator performance under control needs some careful but radical internal reorganizations.

== Proposed fix ==

Prepare for some of the coming changes (yes, there's a bug-884649-branch-2 in the works) and, while we're at it, cut some dead wood out of the main query involved in the added functionality.

That main query is executed for almost any binary package publication at some point in its lifetime. The dead wood is two tables being joined through SourcePackageRelease, when all that's needed is an equijoin on the foreign keys to SourcePackageRelease. The table itself isn't needed in the join; this was easy to miss given the complexity of the search.

== Pre-implementation notes ==

Discussed extensively with Julian. see bug 884649 for notes.

== Implementation details ==

You'll see two complex queries extracted into new methods, so as to keep the higher-level logic clearer and more isolated. This serves another purpose: the same logic was constructed once and then invoked twice, from a little locally-defined function that I need to break up. Had to find other ways to minimize repetition.

I'm also adding some code duplication: two poorly-related conditions in an if/elif chain that have the same, very simple body. It was clearer as an intermediate step to separate those into separate elifs. But rest assured: my next branch hoists that logic out of there and into a dedicated policy function (which is used only where needed, thus saving time). So the duplication is only there so that the complexity can be more easily removed.

== Tests ==

I ran them all, actually. But anything with “dominator” or “domination” or “dominate” in the name is particularly likely to be relevant:
{{{
./bin/test -vvc -t dominat
}}}

== Demo and Q/A ==

The dominator should still work, and probably be just slightly faster. It's hard to benchmark accurately though, since every run will face different sets of active publications. We could just run it twice on a second server and time the second run; the second run won't be as representative because it has no changes to make, but at least it will run in predictable time.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/soyuz/model/publishing.py

To post a comment you must log in.
Revision history for this message
Julian Edwards (julian-edwards) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/domination.py'
2--- lib/lp/archivepublisher/domination.py 2011-10-18 11:56:09 +0000
3+++ lib/lp/archivepublisher/domination.py 2011-11-02 10:31:27 +0000
4@@ -53,6 +53,7 @@
5 __all__ = ['Dominator']
6
7 from datetime import timedelta
8+from operator import itemgetter
9
10 import apt_pkg
11 from storm.expr import (
12@@ -67,6 +68,9 @@
13 flush_database_updates,
14 sqlvalues,
15 )
16+from canonical.launchpad.components.decoratedresultset import (
17+ DecoratedResultSet,
18+ )
19 from canonical.launchpad.interfaces.lpstorm import IStore
20 from lp.registry.model.sourcepackagename import SourcePackageName
21 from lp.services.database.bulk import load_related
22@@ -258,7 +262,8 @@
23 # Go through publications from latest version to oldest. This
24 # makes it easy to figure out which release superseded which:
25 # the dominant is always the oldest live release that is newer
26- # than the one being superseded.
27+ # than the one being superseded. In this loop, that means the
28+ # dominant is always the last live publication we saw.
29 publications = sorted(
30 publications, cmp=generalization.compare, reverse=True)
31
32@@ -272,25 +277,29 @@
33 for pub in publications:
34 version = generalization.getPackageVersion(pub)
35 # There should never be two published releases with the same
36- # version. So this comparison is really a string
37- # comparison, not a version comparison: if the versions are
38- # equal by either measure, they're from the same release.
39- if dominant_version is not None and version == dominant_version:
40+ # version. So it doesn't matter whether this comparison is
41+ # really a string comparison or a version comparison: if the
42+ # versions are equal by either measure, they're from the same
43+ # release.
44+ if version == dominant_version:
45 # This publication is for a live version, but has been
46 # superseded by a newer publication of the same version.
47 # Supersede it.
48 pub.supersede(current_dominant, logger=self.logger)
49 self.logger.debug2(
50 "Superseding older publication for version %s.", version)
51- elif (version in live_versions or
52- (not generalization.is_source and
53- not self._checkArchIndep(pub))):
54+ elif version in live_versions:
55 # This publication stays active; if any publications
56 # that follow right after this are to be superseded,
57 # this is the release that they are superseded by.
58 current_dominant = pub
59 dominant_version = version
60 self.logger.debug2("Keeping version %s.", version)
61+ elif not (generalization.is_source or self._checkArchIndep(pub)):
62+ # As a special case, we keep this version live as well.
63+ current_dominant = pub
64+ dominant_version = version
65+ self.logger.debug2("Keeping version %s.", version)
66 elif current_dominant is None:
67 # This publication is no longer live, but there is no
68 # newer version to supersede it either. Therefore it
69@@ -442,72 +451,81 @@
70 # always equals to "scheduleddeletiondate - quarantine".
71 pub_record.datemadepending = UTC_NOW
72
73+ def findBinariesForDomination(self, distroarchseries, pocket):
74+ """Find binary publications that need dominating.
75+
76+ This is only for traditional domination, where the latest published
77+ publication is always kept published. It will ignore publications
78+ that have no other publications competing for the same binary package.
79+ """
80+ # Avoid circular imports.
81+ from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
82+
83+ bpph_location_clauses = [
84+ BinaryPackagePublishingHistory.status ==
85+ PackagePublishingStatus.PUBLISHED,
86+ BinaryPackagePublishingHistory.distroarchseries ==
87+ distroarchseries,
88+ BinaryPackagePublishingHistory.archive == self.archive,
89+ BinaryPackagePublishingHistory.pocket == pocket,
90+ ]
91+ candidate_binary_names = Select(
92+ BinaryPackageName.id,
93+ And(
94+ BinaryPackageRelease.binarypackagenameID ==
95+ BinaryPackageName.id,
96+ BinaryPackagePublishingHistory.binarypackagereleaseID ==
97+ BinaryPackageRelease.id,
98+ bpph_location_clauses,
99+ ),
100+ group_by=BinaryPackageName.id,
101+ having=Count(BinaryPackagePublishingHistory.id) > 1)
102+ main_clauses = [
103+ BinaryPackageRelease.id ==
104+ BinaryPackagePublishingHistory.binarypackagereleaseID,
105+ BinaryPackageRelease.binarypackagenameID.is_in(
106+ candidate_binary_names),
107+ BinaryPackageRelease.binpackageformat !=
108+ BinaryPackageFormat.DDEB,
109+ ]
110+ main_clauses.extend(bpph_location_clauses)
111+
112+ store = IStore(BinaryPackagePublishingHistory)
113+ return store.find(BinaryPackagePublishingHistory, *main_clauses)
114+
115 def dominateBinaries(self, distroseries, pocket):
116 """Perform domination on binary package publications.
117
118 Dominates binaries, restricted to `distroseries`, `pocket`, and
119 `self.archive`.
120 """
121- # Avoid circular imports.
122- from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
123-
124 generalization = GeneralizedPublication(is_source=False)
125
126 for distroarchseries in distroseries.architectures:
127 self.logger.info(
128 "Performing domination across %s/%s (%s)",
129- distroseries.name, pocket.title,
130+ distroarchseries.distroseries.name, pocket.title,
131 distroarchseries.architecturetag)
132
133- bpph_location_clauses = [
134- BinaryPackagePublishingHistory.status ==
135- PackagePublishingStatus.PUBLISHED,
136- BinaryPackagePublishingHistory.distroarchseries ==
137- distroarchseries,
138- BinaryPackagePublishingHistory.archive == self.archive,
139- BinaryPackagePublishingHistory.pocket == pocket,
140- ]
141- candidate_binary_names = Select(
142- BinaryPackageName.id,
143- And(
144- BinaryPackageRelease.binarypackagenameID ==
145- BinaryPackageName.id,
146- BinaryPackagePublishingHistory.binarypackagereleaseID ==
147- BinaryPackageRelease.id,
148- bpph_location_clauses,
149- ),
150- group_by=BinaryPackageName.id,
151- having=Count(BinaryPackagePublishingHistory.id) > 1)
152- main_clauses = [
153- BinaryPackagePublishingHistory,
154- BinaryPackageRelease.id ==
155- BinaryPackagePublishingHistory.binarypackagereleaseID,
156- BinaryPackageRelease.binarypackagenameID.is_in(
157- candidate_binary_names),
158- BinaryPackageRelease.binpackageformat !=
159- BinaryPackageFormat.DDEB,
160- ]
161- main_clauses.extend(bpph_location_clauses)
162-
163- def do_domination(pass2_msg=""):
164- msg = "binaries..." + pass2_msg
165- self.logger.info("Finding %s" % msg)
166- bins = IStore(
167- BinaryPackagePublishingHistory).find(*main_clauses)
168- self.logger.info("Dominating %s" % msg)
169- sorted_packages = self._sortPackages(bins, generalization)
170- self._dominatePublications(sorted_packages, generalization)
171-
172- do_domination()
173-
174- # We need to make a second pass to cover the cases where:
175- # * arch-specific binaries were not all dominated before arch-all
176- # ones that depend on them
177- # * An arch-all turned into an arch-specific, or vice-versa
178- # * A package is completely schizophrenic and changes all of
179- # its binaries between arch-all and arch-any (apparently
180- # occurs sometimes!)
181- do_domination("(2nd pass)")
182+ self.logger.info("Finding binaries...")
183+ bins = self.findBinariesForDomination(distroarchseries, pocket)
184+ sorted_packages = self._sortPackages(bins, generalization)
185+ self.logger.info("Dominating binaries...")
186+ self._dominatePublications(sorted_packages, generalization)
187+
188+ # We need to make a second pass to cover the cases where:
189+ # * arch-specific binaries were not all dominated before arch-all
190+ # ones that depend on them
191+ # * An arch-all turned into an arch-specific, or vice-versa
192+ # * A package is completely schizophrenic and changes all of
193+ # its binaries between arch-all and arch-any (apparently
194+ # occurs sometimes!)
195+ for distroarchseries in distroseries.architectures:
196+ self.logger.info("Finding binaries...(2nd pass)")
197+ bins = self.findBinariesForDomination(distroarchseries, pocket)
198+ sorted_packages = self._sortPackages(bins, generalization)
199+ self.logger.info("Dominating binaries...(2nd pass)")
200+ self._dominatePublications(sorted_packages, generalization)
201
202 def _composeActiveSourcePubsCondition(self, distroseries, pocket):
203 """Compose ORM condition for restricting relevant source pubs."""
204@@ -522,21 +540,11 @@
205 SourcePackagePublishingHistory.pocket == pocket,
206 )
207
208- def dominateSources(self, distroseries, pocket):
209- """Perform domination on source package publications.
210-
211- Dominates sources, restricted to `distroseries`, `pocket`, and
212- `self.archive`.
213- """
214+ def findSourcesForDomination(self, distroseries, pocket):
215+ """Find binary publications that need dominating."""
216 # Avoid circular imports.
217 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
218
219- generalization = GeneralizedPublication(is_source=True)
220-
221- self.logger.debug(
222- "Performing domination across %s/%s (Source)",
223- distroseries.name, pocket.title)
224-
225 spph_location_clauses = self._composeActiveSourcePubsCondition(
226 distroseries, pocket)
227 having_multiple_active_publications = (
228@@ -546,12 +554,33 @@
229 And(join_spph_spr(), join_spr_spn(), spph_location_clauses),
230 group_by=SourcePackageName.id,
231 having=having_multiple_active_publications)
232- sources = IStore(SourcePackagePublishingHistory).find(
233- SourcePackagePublishingHistory,
234+
235+ # We'll also access the SourcePackageReleases associated with
236+ # the publications we find. Since they're in the join anyway,
237+ # load them alongside the publications.
238+ # Actually we'll also want the SourcePackageNames, but adding
239+ # those to the (outer) query would complicate it, and
240+ # potentially slow it down.
241+ query = IStore(SourcePackagePublishingHistory).find(
242+ (SourcePackagePublishingHistory, SourcePackageRelease),
243 join_spph_spr(),
244 SourcePackageRelease.sourcepackagenameID.is_in(
245 candidate_source_names),
246 spph_location_clauses)
247+ return DecoratedResultSet(query, itemgetter(0))
248+
249+ def dominateSources(self, distroseries, pocket):
250+ """Perform domination on source package publications.
251+
252+ Dominates sources, restricted to `distroseries`, `pocket`, and
253+ `self.archive`.
254+ """
255+ self.logger.debug(
256+ "Performing domination across %s/%s (Source)",
257+ distroseries.name, pocket.title)
258+
259+ generalization = GeneralizedPublication(is_source=True)
260+ sources = self.findSourcesForDomination(distroseries, pocket)
261
262 self.logger.debug("Dominating sources...")
263 self._dominatePublications(
264
265=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
266--- lib/lp/archivepublisher/tests/test_dominator.py 2011-10-27 11:36:13 +0000
267+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-11-02 10:31:27 +0000
268@@ -361,7 +361,10 @@
269 def make_spphs_for_versions(factory, versions):
270 """Create publication records for each of `versions`.
271
272- They records are created in the same order in which they are specified.
273+ All these publications will be in the same source package, archive,
274+ distroseries, and pocket. They will all be in Published status.
275+
276+ The records are created in the same order in which they are specified.
277 Make the order irregular to prove that version ordering is not a
278 coincidence of object creation order etc.
279
280@@ -371,6 +374,7 @@
281 spn = factory.makeSourcePackageName()
282 distroseries = factory.makeDistroSeries()
283 pocket = factory.getAnyPocket()
284+ archive = distroseries.main_archive
285 sprs = [
286 factory.makeSourcePackageRelease(
287 sourcepackagename=spn, version=version)
288@@ -378,11 +382,34 @@
289 return [
290 factory.makeSourcePackagePublishingHistory(
291 distroseries=distroseries, pocket=pocket,
292- sourcepackagerelease=spr,
293+ sourcepackagerelease=spr, archive=archive,
294 status=PackagePublishingStatus.PUBLISHED)
295 for spr in sprs]
296
297
298+def make_bpphs_for_versions(factory, versions):
299+ """Create publication records for each of `versions`.
300+
301+ All these publications will be in the same binary package, source
302+ package, archive, distroarchseries, and pocket. They will all be in
303+ Published status.
304+ """
305+ bpn = factory.makeBinaryPackageName()
306+ spn = factory.makeSourcePackageName()
307+ das = factory.makeDistroArchSeries()
308+ archive = das.distroseries.main_archive
309+ pocket = factory.getAnyPocket()
310+ bprs = [
311+ factory.makeBinaryPackageRelease(binarypackagename=bpn)
312+ for version in versions]
313+ return [
314+ factory.makeBinaryPackagePublishingHistory(
315+ binarypackagerelease=bpr, binarypackagename=bpn,
316+ distroarchseries=das, pocket=pocket, archive=archive,
317+ sourcepackagename=spn, status=PackagePublishingStatus.PUBLISHED)
318+ for bpr in bprs]
319+
320+
321 def list_source_versions(spphs):
322 """Extract the versions from `spphs` as a list, in the same order."""
323 return [spph.sourcepackagerelease.version for spph in spphs]
324@@ -921,3 +948,144 @@
325 [],
326 dominator.findPublishedSPPHs(
327 spph.distroseries, spph.pocket, other_package.name))
328+
329+ def test_findBinariesForDomination_finds_published_publications(self):
330+ bpphs = make_bpphs_for_versions(self.factory, ['1.0', '1.1'])
331+ dominator = self.makeDominator(bpphs)
332+ self.assertContentEqual(
333+ bpphs, dominator.findBinariesForDomination(
334+ bpphs[0].distroarchseries, bpphs[0].pocket))
335+
336+ def test_findBinariesForDomination_skips_single_pub_packages(self):
337+ # The domination algorithm that uses findBinariesForDomination
338+ # always keeps the latest version live. Thus, a single
339+ # publication isn't worth dominating. findBinariesForDomination
340+ # won't return it.
341+ bpphs = make_bpphs_for_versions(self.factory, ['1.0'])
342+ dominator = self.makeDominator(bpphs)
343+ self.assertContentEqual(
344+ [], dominator.findBinariesForDomination(
345+ bpphs[0].distroarchseries, bpphs[0].pocket))
346+
347+ def test_findBinariesForDomination_ignores_other_distroseries(self):
348+ bpphs = make_bpphs_for_versions(self.factory, ['1.0', '1.1'])
349+ dominator = self.makeDominator(bpphs)
350+ das = bpphs[0].distroarchseries
351+ other_series = self.factory.makeDistroSeries(
352+ distribution=das.distroseries.distribution)
353+ other_das = self.factory.makeDistroArchSeries(
354+ distroseries=other_series, architecturetag=das.architecturetag,
355+ processorfamily=das.processorfamily)
356+ self.assertContentEqual(
357+ [], dominator.findBinariesForDomination(
358+ other_das, bpphs[0].pocket))
359+
360+ def test_findBinariesForDomination_ignores_other_architectures(self):
361+ bpphs = make_bpphs_for_versions(self.factory, ['1.0', '1.1'])
362+ dominator = self.makeDominator(bpphs)
363+ other_das = self.factory.makeDistroArchSeries(
364+ distroseries=bpphs[0].distroseries)
365+ self.assertContentEqual(
366+ [], dominator.findBinariesForDomination(
367+ other_das, bpphs[0].pocket))
368+
369+ def test_findBinariesForDomination_ignores_other_archive(self):
370+ bpphs = make_bpphs_for_versions(self.factory, ['1.0', '1.1'])
371+ dominator = self.makeDominator(bpphs)
372+ dominator.archive = self.factory.makeArchive()
373+ self.assertContentEqual(
374+ [], dominator.findBinariesForDomination(
375+ bpphs[0].distroarchseries, bpphs[0].pocket))
376+
377+ def test_findBinariesForDomination_ignores_other_pocket(self):
378+ bpphs = make_bpphs_for_versions(self.factory, ['1.0', '1.1'])
379+ dominator = self.makeDominator(bpphs)
380+ for bpph in bpphs:
381+ removeSecurityProxy(bpph).pocket = PackagePublishingPocket.UPDATES
382+ self.assertContentEqual(
383+ [], dominator.findBinariesForDomination(
384+ bpphs[0].distroarchseries, PackagePublishingPocket.SECURITY))
385+
386+ def test_findBinariesForDomination_ignores_other_status(self):
387+ # If we have one BPPH for each possible status, plus one
388+ # Published one to stop findBinariesForDomination from skipping
389+ # the package, findBinariesForDomination returns only the
390+ # Published ones.
391+ versions = [
392+ '1.%d' % self.factory.getUniqueInteger()
393+ for status in PackagePublishingStatus.items] + ['0.9']
394+ bpphs = make_bpphs_for_versions(self.factory, versions)
395+ dominator = self.makeDominator(bpphs)
396+
397+ for bpph, status in zip(bpphs, PackagePublishingStatus.items):
398+ bpph.status = status
399+
400+ # These are the Published publications. The other ones will all
401+ # be ignored.
402+ published_bpphs = [
403+ bpph
404+ for bpph in bpphs
405+ if bpph.status == PackagePublishingStatus.PUBLISHED]
406+
407+ self.assertContentEqual(
408+ published_bpphs,
409+ dominator.findBinariesForDomination(
410+ bpphs[0].distroarchseries, bpphs[0].pocket))
411+
412+ def test_findSourcesForDomination_finds_published_publications(self):
413+ spphs = make_spphs_for_versions(self.factory, ['2.0', '2.1'])
414+ dominator = self.makeDominator(spphs)
415+ self.assertContentEqual(
416+ spphs, dominator.findSourcesForDomination(
417+ spphs[0].distroseries, spphs[0].pocket))
418+
419+ def test_findSourcesForDomination_skips_single_pub_packages(self):
420+ # The domination algorithm that uses findSourcesForDomination
421+ # always keeps the latest version live. Thus, a single
422+ # publication isn't worth dominating. findSourcesForDomination
423+ # won't return it.
424+ spphs = make_spphs_for_versions(self.factory, ['2.0'])
425+ dominator = self.makeDominator(spphs)
426+ self.assertContentEqual(
427+ [], dominator.findSourcesForDomination(
428+ spphs[0].distroseries, spphs[0].pocket))
429+
430+ def test_findSourcesForDomination_ignores_other_distroseries(self):
431+ spphs = make_spphs_for_versions(self.factory, ['2.0', '2.1'])
432+ dominator = self.makeDominator(spphs)
433+ other_series = self.factory.makeDistroSeries(
434+ distribution=spphs[0].distroseries.distribution)
435+ self.assertContentEqual(
436+ [], dominator.findSourcesForDomination(
437+ other_series, spphs[0].pocket))
438+
439+ def test_findSourcesForDomination_ignores_other_pocket(self):
440+ spphs = make_spphs_for_versions(self.factory, ['2.0', '2.1'])
441+ dominator = self.makeDominator(spphs)
442+ for spph in spphs:
443+ removeSecurityProxy(spph).pocket = PackagePublishingPocket.UPDATES
444+ self.assertContentEqual(
445+ [], dominator.findSourcesForDomination(
446+ spphs[0].distroseries, PackagePublishingPocket.SECURITY))
447+
448+ def test_findSourcesForDomination_ignores_other_status(self):
449+ versions = [
450+ '1.%d' % self.factory.getUniqueInteger()
451+ for status in PackagePublishingStatus.items] + ['0.9']
452+ spphs = make_spphs_for_versions(self.factory, versions)
453+ dominator = self.makeDominator(spphs)
454+
455+ for spph, status in zip(spphs, PackagePublishingStatus.items):
456+ spph.status = status
457+
458+ # These are the Published publications. The other ones will all
459+ # be ignored.
460+ published_spphs = [
461+ spph
462+ for spph in spphs
463+ if spph.status == PackagePublishingStatus.PUBLISHED]
464+
465+ self.assertContentEqual(
466+ published_spphs,
467+ dominator.findSourcesForDomination(
468+ spphs[0].distroseries, spphs[0].pocket))
469
470=== modified file 'lib/lp/soyuz/model/publishing.py'
471--- lib/lp/soyuz/model/publishing.py 2011-10-23 02:58:56 +0000
472+++ lib/lp/soyuz/model/publishing.py 2011-11-02 10:31:27 +0000
473@@ -33,7 +33,6 @@
474 Desc,
475 LeftJoin,
476 Or,
477- Select,
478 Sum,
479 )
480 from storm.store import Store
481@@ -1154,19 +1153,10 @@
482 # Avoid circular wotsits.
483 from lp.soyuz.model.binarypackagebuild import BinaryPackageBuild
484 from lp.soyuz.model.distroarchseries import DistroArchSeries
485- from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
486- source_select = Select(
487- SourcePackageRelease.id,
488- And(
489- BinaryPackageBuild.source_package_release_id ==
490- SourcePackageRelease.id,
491- BinaryPackageRelease.build == BinaryPackageBuild.id,
492- self.binarypackagereleaseID == BinaryPackageRelease.id,
493- ))
494+
495 pubs = [
496 BinaryPackageBuild.source_package_release_id ==
497- SourcePackageRelease.id,
498- SourcePackageRelease.id.is_in(source_select),
499+ self.binarypackagerelease.build.source_package_release_id,
500 BinaryPackageRelease.build == BinaryPackageBuild.id,
501 BinaryPackagePublishingHistory.binarypackagereleaseID ==
502 BinaryPackageRelease.id,