Merge lp:~jtv/launchpad/bug-884649-branch-2 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: j.c.sackett
Approved revision: no longer in the source branch.
Merged at revision: 14249
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-2
Merge into: lp:launchpad
Prerequisite: lp:~jtv/launchpad/getBinariesForDomination-bulk
Diff against target: 676 lines (+325/-121)
2 files modified
lib/lp/archivepublisher/domination.py (+183/-93)
lib/lp/archivepublisher/tests/test_dominator.py (+142/-28)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-2
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
j.c.sackett (community) Approve
Review via email: mp+81049@code.launchpad.net

This proposal supersedes a proposal from 2011-11-02.

Commit message

[r=jcsackett,julian-edwards] Move special-casing for publication liveness out of inner domination loop; specialize binary passes.

Description of the change

= Summary =

This is part of the Dominator optimization work needed for bug 884649.

Binary domination needs 2 passes. This is because publications for the "all" architecture, which are copied into each architecture, can't be superseded until all architecture-specific publications coming from the same source package release have been superseded. Only then can we decide whether arch-all publications can be superseded. This is because a binary package from the same source package release may still depend on the arch-all binary package, and may not have finished building yet.

The code for dealing with this subtlety is a bit hard to place, because it's deep down in the inner loop as a special case. The special case is applied to both binary-domination passes, although there's really no point: the first pass may conceivably supersede a few arch-all publications but it won't deal with all of them. Meanwhile we pay a hefty price for the extra check.

== Proposed fix ==

Hoist the decision of which publications must stay live out of dominatePackage. It's a policy decision for the caller; the API lets the caller specify which versions stay live.

With that, we can specialize the passes more, and cut out complexity that is slowing things down.

You'll see three new functions explaining exactly what liveness criteria each domination pass uses. The source pass is easy. The first binary pass deals with architecture-specific publications; arch-all ones must be considered, but they are simply kept alive. And once all architectures have gone through first-pass binary domination, superseding as many arch-specific publications as possible, the second binary-domination pass considers just what needs to happen to those arch-all publications.

== Pre-implementation notes ==

Discussed in detail with Julian; see bug 884649 for the plan we're following in addressing this. We should also be able to re-use liveness decisions across architectures (and probably binary packages building from the same source package).

Raphaƫl suggested another clever improvement: Steve K. has just initialized a new, denormalized column on BinaryPackagePublishingHistory that contains the package name. That should enable a few shortcuts as well.

By the way, note that we can't just dominate arch-all and arch-specific publications separately. Packages may vacillate between the two, producing successive releases that need to be considered together even if domination of some publications may need to be postponed.

== Implementation details ==

I had to ditch _dominatePublications, because the discovery of live versions went into the middle of it. I reasoned that if removing it introduced too much code duplication, it could return with a callback for live-version discovery. But the loss turned out to be pretty small: it's really just a 3-line loop, including log output.

== Tests ==

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}

== Demo and Q/A ==

Dominate Ubuntu and you'll see.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/soyuz/model/publishing.py
  lib/lp/archivepublisher/tests/test_dominator.py

To post a comment you must log in.
Revision history for this message
j.c.sackett (jcsackett) wrote :

Whew. Very extensive commenting, and thanks for it or I don't think I would have followed all of that!

This is some very complex work, but I think it looks good to move forward.

review: Approve
Revision history for this message
Julian Edwards (julian-edwards) wrote :
Download full text (26.2 KiB)

(I reviewed this several hours ago by email and only just noticed that LP rejected my email..GRRRRRRRRRRR)

Thanks for doing this Jeroen, it looks pretty good so far. I've just got a
few nits inline:

> === modified file 'lib/lp/archivepublisher/domination.py'
> --- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
> +++ lib/lp/archivepublisher/domination.py 2011-11-03 10:12:17 +0000
> @@ -52,8 +52,12 @@
>
> __all__ = ['Dominator']
>
> +from collections import defaultdict
> from datetime import timedelta
> -from operator import itemgetter
> +from operator import (
> + attrgetter,
> + itemgetter,
> + )
>
> import apt_pkg
> from storm.expr import (
> @@ -72,6 +76,7 @@
> DecoratedResultSet,
> )
> from canonical.launchpad.interfaces.lpstorm import IStore
> +from canonical.launchpad.utilities.orderingcheck import OrderingCheck
> from lp.registry.model.sourcepackagename import SourcePackageName
> from lp.services.database.bulk import load_related
> from lp.soyuz.enums import (
> @@ -192,6 +197,104 @@
> else:
> return version_comparison
>
> + def sortPublications(self, publications):
> + """Sort publications from most to least current versions."""
> + # Listify; we want to iterate this twice, which won't do for a
> + # non-persistent sequence.
> + sorted_publications = list(publications)

This variable is prematurely named, I think I'd just write:
    publications = list(publications)
and return publications.sort(...) later.

> + # Batch-load associated package releases; we'll be needing them
> + # to compare versions.
> + self.load_releases(sorted_publications)
> + # Now sort. This is that second iteration. An in-place sort
> + # won't hurt the original, because we're working on a copy of
> + # the original iterable.
> + sorted_publications.sort(cmp=self.compare, reverse=True)
> + return sorted_publications
> +
> +
> +def find_live_source_versions(publications):
> + """Find versions out of Published `publications` that should stay live.
> +
> + This particular notion of liveness applies to source domination: the
> + latest version stays live, and that's it.
> +
> + :param publications: An iterable of `SourcePackagePublishingHistory`
> + sorted by descending package version.
> + :return: A list of live versions.
> + """
> + # Given the required sort order, the latest version is at the head
> + # of the list.
> + return [publications[0].sourcepackagerelease.version]

Nicely abstracted.

> +
> +
> +def get_binary_versions(binary_publications):
> + """List versions for sequence of `BinaryPackagePublishingHistory`."""
> + return [pub.binarypackagerelease.version for pub in
binary_publications]

If you're going to do :param: and :return: for the other functions, you
should do it here too!

> +
> +
> +def find_live_binary_versions_pass_1(publications):
> + """Find versions out of Published `publications` that should stay live.
> +
> + This particular notion of liveness applies to first-pass binary
> + domination: the latest version stays l...

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Jon, thanks for taking on this very tough review. It's a crying shame that Julian had also reviewed it, but the extra pair of eyeballs is still appreciated. Even if Julian knows more of the innards, it's also great to hear that someone less familiar with the intricacies of Soyuz and the archive publisher can make sense at least some of this.

Let's not do this too much though or you'll get sucked into Soyuz-related work like I did. :)

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (11.4 KiB)

> Thanks for doing this Jeroen, it looks pretty good so far. I've just got a
> few nits inline:

Ah, one of those reviews that actually make my code better. Good!

> > === modified file 'lib/lp/archivepublisher/domination.py'
> > --- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
> > +++ lib/lp/archivepublisher/domination.py 2011-11-03 10:12:17 +0000

> > @@ -192,6 +197,104 @@
> > else:
> > return version_comparison
> >
> > + def sortPublications(self, publications):
> > + """Sort publications from most to least current versions."""
> > + # Listify; we want to iterate this twice, which won't do for a
> > + # non-persistent sequence.
> > + sorted_publications = list(publications)
>
> This variable is prematurely named, I think I'd just write:
> publications = list(publications)
> and return publications.sort(...) later.

Yup, you're right. Done.

(Apart from one nasty little subtlety: surprisingly, list.sort does not return the sorted list. I suppose that's to avoid obscuring the fact that it modifies the list in-place.)

> > +def get_binary_versions(binary_publications):
> > + """List versions for sequence of `BinaryPackagePublishingHistory`."""
> > + return [pub.binarypackagerelease.version for pub in
> binary_publications]
>
> If you're going to do :param: and :return: for the other functions, you
> should do it here too!

Hoist by my own pedantard. Done.

> > +def find_live_binary_versions_pass_1(publications):
> > + """Find versions out of Published `publications` that should stay live.
> > +
> > + This particular notion of liveness applies to first-pass binary
> > + domination: the latest version stays live, and so do publications of
> > + binary packages for the "all" architecture.
> > +
> > + :param publications: An iterable of `BinaryPackagePublishingHistory`,
> > + sorted by descending package version.
> > + :return: A list of live versions.
> > + """
> > + publications = list(publications)
> > + latest = publications.pop(0)
> > + return get_binary_versions(
> > + [latest] + [
> > + pub for pub in publications if not pub.architecture_specific])
>
> pub.architecture_specific masks a query, in case you didn't know. I
> absolutely detest properties that do that, it encourages SQL death-by-a-
> thousand-cuts.

I spotted it through sheer luck. It's just dereferencing a Storm reference though, and I took care to bulk-load the referenced BinaryPackageReleases.

> > +def find_live_binary_versions_pass_2(publications):
> > + """Find versions out of Published `publications` that should stay live.
> > +
> > + This particular notion of liveness applies to second-pass binary
> > + domination: the latest version stays live, and architecture-specific
> > + publications stay live (i.e, ones that are not for the "all"
> > + architecture).
> > +
> > + More importantly, any publication for binary packages of the "all"
> > + architecture stay live if any of the non-"all" binary packages from
> > + the same source package release are still active -- even if they are
> > + f...

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Friday 04 November 2011 03:58:24 you wrote:
> > Thanks for doing this Jeroen, it looks pretty good so far. I've just
> > got a
> > few nits inline:
> Ah, one of those reviews that actually make my code better. Good!

You asked for it!

> (Apart from one nasty little subtlety: surprisingly, list.sort does not
> return the sorted list. I suppose that's to avoid obscuring the fact that
> it modifies the list in-place.)

Right, that's annoying.

> That's what I keep thinking too. But the python documentation says
> otherwise. Perhaps it was some other massively useful function that got
> deprecated, or perhaps people changed their minds.
>
> Anyway, looking this up also made me find ifilter and ifilterfalse (in
> itertools). With those, I can finally say what I need here tersely and
> uniformly:
>
> arch_specific_pubs = list(ifilter(is_arch_specific, publications))
> arch_indep_pubs = list(ifilterfalse(is_arch_specific, publications))

I had not seen those before, very nice.

It would serve us some use to trawl through the Python built-ins at some
point!

> It's not very costly compared to inevitable work we're already doing: for a
> list of n publications, this does n-1 version comparisons. That's
> guaranteed to be no more than the sort operation that produced the
> publications list in the first place.
>
> The loop depends very heavily on the sort order that this verifies, and
> there are several call sites that do the sorting. It'd be nice if we could
> just re-sort to make sure of the right order, but besides being more
> expensive that also wouldn't help the liveness functions which also rely on
> sort order.
>
> Given how important it is to get this right and how easy it is to get it
> wrong, I think verifying the order is worth the small added cost.

I agree that it's important. I have an additional suggestion - rename the
method parameter so it's "sorted_publications" and also update the docstring
to say it *must* be sorted.

> Done. But, er... does this mean I'm condemned to another year of this? :-)

You might think so but I could not possibly comment.

Thanks!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
+++ lib/lp/archivepublisher/domination.py 2011-11-04 04:07:32 +0000
@@ -52,8 +52,16 @@
5252
53__all__ = ['Dominator']53__all__ = ['Dominator']
5454
55from collections import defaultdict
55from datetime import timedelta56from datetime import timedelta
56from operator import itemgetter57from itertools import (
58 ifilter,
59 ifilterfalse,
60 )
61from operator import (
62 attrgetter,
63 itemgetter,
64 )
5765
58import apt_pkg66import apt_pkg
59from storm.expr import (67from storm.expr import (
@@ -72,6 +80,7 @@
72 DecoratedResultSet,80 DecoratedResultSet,
73 )81 )
74from canonical.launchpad.interfaces.lpstorm import IStore82from canonical.launchpad.interfaces.lpstorm import IStore
83from canonical.launchpad.utilities.orderingcheck import OrderingCheck
75from lp.registry.model.sourcepackagename import SourcePackageName84from lp.registry.model.sourcepackagename import SourcePackageName
76from lp.services.database.bulk import load_related85from lp.services.database.bulk import load_related
77from lp.soyuz.enums import (86from lp.soyuz.enums import (
@@ -192,6 +201,107 @@
192 else:201 else:
193 return version_comparison202 return version_comparison
194203
204 def sortPublications(self, publications):
205 """Sort publications from most to least current versions."""
206 # Listify; we want to iterate this twice, which won't do for a
207 # non-persistent sequence.
208 publications = list(publications)
209 # Batch-load associated package releases; we'll be needing them
210 # to compare versions.
211 self.load_releases(publications)
212 # Now sort. This is that second iteration. An in-place sort
213 # won't hurt the original, because we're working on a copy of
214 # the original iterable.
215 publications.sort(cmp=self.compare, reverse=True)
216 return publications
217
218
219def find_live_source_versions(publications):
220 """Find versions out of Published `publications` that should stay live.
221
222 This particular notion of liveness applies to source domination: the
223 latest version stays live, and that's it.
224
225 :param publications: An iterable of `SourcePackagePublishingHistory`
226 sorted by descending package version.
227 :return: A list of live versions.
228 """
229 # Given the required sort order, the latest version is at the head
230 # of the list.
231 return [publications[0].sourcepackagerelease.version]
232
233
234def get_binary_versions(binary_publications):
235 """List versions for sequence of `BinaryPackagePublishingHistory`.
236
237 :param binary_publications: An iterable of
238 `BinaryPackagePublishingHistory`.
239 :return: A list of the publications' respective versions.
240 """
241 return [pub.binarypackagerelease.version for pub in binary_publications]
242
243
244def find_live_binary_versions_pass_1(publications):
245 """Find versions out of Published `publications` that should stay live.
246
247 This particular notion of liveness applies to first-pass binary
248 domination: the latest version stays live, and so do publications of
249 binary packages for the "all" architecture.
250
251 :param publications: An iterable of `BinaryPackagePublishingHistory`,
252 sorted by descending package version.
253 :return: A list of live versions.
254 """
255 publications = list(publications)
256 latest = publications.pop(0)
257 return get_binary_versions(
258 [latest] + [
259 pub for pub in publications if not pub.architecture_specific])
260
261
262def find_live_binary_versions_pass_2(publications):
263 """Find versions out of Published `publications` that should stay live.
264
265 This particular notion of liveness applies to second-pass binary
266 domination: the latest version stays live, and architecture-specific
267 publications stay live (i.e, ones that are not for the "all"
268 architecture).
269
270 More importantly, any publication for binary packages of the "all"
271 architecture stay live if any of the non-"all" binary packages from
272 the same source package release are still active -- even if they are
273 for other architectures.
274
275 This is the raison d'etre for the two-pass binary domination algorithm:
276 to let us see which architecture-independent binary publications can be
277 superseded without rendering any architecture-specific binaries from the
278 same source package release uninstallable.
279
280 (Note that here, "active" includes Published publications but also
281 Pending ones. This is standard nomenclature in Soyuz. Some of the
282 domination code confuses matters by using the term "active" to mean only
283 Published publications).
284
285 :param publications: An iterable of `BinaryPackagePublishingHistory`,
286 sorted by descending package version.
287 :return: A list of live versions.
288 """
289 publications = list(publications)
290 latest = publications.pop(0)
291 is_arch_specific = attrgetter('architecture_specific')
292 arch_specific_pubs = list(ifilter(is_arch_specific, publications))
293 arch_indep_pubs = list(ifilterfalse(is_arch_specific, publications))
294
295 # XXX JeroenVermeulen 2011-11-01 bug=884649: This is likely to be
296 # costly, and the result could be reused for all builds of the same
297 # source package release to all architectures.
298 reprieved_pubs = [
299 pub
300 for pub in arch_indep_pubs
301 if pub.getOtherPublicationsForSameSource().any()]
302
303 return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)
304
195305
196class Dominator:306class Dominator:
197 """Manage the process of marking packages as superseded.307 """Manage the process of marking packages as superseded.
@@ -209,27 +319,6 @@
209 self.logger = logger319 self.logger = logger
210 self.archive = archive320 self.archive = archive
211321
212 def _checkArchIndep(self, publication):
213 """Return True if the binary publication can be superseded.
214
215 If the publication is an arch-indep binary, we can only supersede
216 it if all the binaries from the same source are also superseded,
217 else those binaries may become uninstallable.
218 See bug 34086.
219 """
220 binary = publication.binarypackagerelease
221 if not binary.architecturespecific:
222 # getOtherPublicationsForSameSource returns PENDING in
223 # addition to PUBLISHED binaries, and we rely on this since
224 # they must also block domination.
225 others = publication.getOtherPublicationsForSameSource()
226 if others.any():
227 # Don't dominate this arch:all binary as there are
228 # other arch-specific binaries from the same build
229 # that are still active.
230 return False
231 return True
232
233 def dominatePackage(self, publications, live_versions, generalization):322 def dominatePackage(self, publications, live_versions, generalization):
234 """Dominate publications for a single package.323 """Dominate publications for a single package.
235324
@@ -247,34 +336,33 @@
247336
248 :param publications: Iterable of publications for the same package,337 :param publications: Iterable of publications for the same package,
249 in the same archive, series, and pocket, all with status338 in the same archive, series, and pocket, all with status
250 `PackagePublishingStatus.PUBLISHED`.339 `PackagePublishingStatus.PUBLISHED`. They must be sorted from
251 :param live_versions: Iterable of version strings that are still340 most current to least current, as would be the result of
252 considered live for this package. The given publications will341 `generalization.sortPublications`.
253 remain active insofar as they represent any of these versions;342 :param live_versions: Iterable of versions that are still considered
254 older publications will be marked as superseded and newer ones343 "live" for this package. For any of these, the latest publication
255 as deleted.344 among `publications` will remain Published. Publications for
345 older releases, as well as older publications of live versions,
346 will be marked as Superseded. Publications of newer versions than
347 are listed in `live_versions` are marked as Deleted.
256 :param generalization: A `GeneralizedPublication` helper representing348 :param generalization: A `GeneralizedPublication` helper representing
257 the kind of publications these are--source or binary.349 the kind of publications these are: source or binary.
258 """350 """
259 publications = list(publications)351 live_versions = frozenset(live_versions)
260 generalization.load_releases(publications)
261
262 # Go through publications from latest version to oldest. This
263 # makes it easy to figure out which release superseded which:
264 # the dominant is always the oldest live release that is newer
265 # than the one being superseded. In this loop, that means the
266 # dominant is always the last live publication we saw.
267 publications = sorted(
268 publications, cmp=generalization.compare, reverse=True)
269352
270 self.logger.debug(353 self.logger.debug(
271 "Package has %d live publication(s). Live versions: %s",354 "Package has %d live publication(s). Live versions: %s",
272 len(publications), live_versions)355 len(publications), live_versions)
273356
357 # Verify that the publications are really sorted properly.
358 check_order = OrderingCheck(cmp=generalization.compare, reverse=True)
359
274 current_dominant = None360 current_dominant = None
275 dominant_version = None361 dominant_version = None
276362
277 for pub in publications:363 for pub in publications:
364 check_order.check(pub)
365
278 version = generalization.getPackageVersion(pub)366 version = generalization.getPackageVersion(pub)
279 # There should never be two published releases with the same367 # There should never be two published releases with the same
280 # version. So it doesn't matter whether this comparison is368 # version. So it doesn't matter whether this comparison is
@@ -295,11 +383,6 @@
295 current_dominant = pub383 current_dominant = pub
296 dominant_version = version384 dominant_version = version
297 self.logger.debug2("Keeping version %s.", version)385 self.logger.debug2("Keeping version %s.", version)
298 elif not (generalization.is_source or self._checkArchIndep(pub)):
299 # As a special case, we keep this version live as well.
300 current_dominant = pub
301 dominant_version = version
302 self.logger.debug2("Keeping version %s.", version)
303 elif current_dominant is None:386 elif current_dominant is None:
304 # This publication is no longer live, but there is no387 # This publication is no longer live, but there is no
305 # newer version to supersede it either. Therefore it388 # newer version to supersede it either. Therefore it
@@ -312,50 +395,32 @@
312 pub.supersede(current_dominant, logger=self.logger)395 pub.supersede(current_dominant, logger=self.logger)
313 self.logger.debug2("Superseding version %s.", version)396 self.logger.debug2("Superseding version %s.", version)
314397
315 def _dominatePublications(self, pubs, generalization):398 def _sortPackages(self, publications, generalization):
316 """Perform dominations for the given publications.399 """Partition publications by package name, and sort them.
317400
318 Keep the latest published version for each package active,401 The publications are sorted from most current to least current,
319 superseding older versions.402 as required by `dominatePackage` etc.
320403
321 :param pubs: A dict mapping names to a list of publications. Every404 :param publications: An iterable of `SourcePackagePublishingHistory`
322 publication must be PUBLISHED or PENDING, and the first in each405 or of `BinaryPackagePublishingHistory`.
323 list will be treated as dominant (so should be the latest).406 :param generalization: A `GeneralizedPublication` helper representing
324 :param generalization: A `GeneralizedPublication` helper representing407 the kind of publications these are: source or binary.
325 the kind of publications these are--source or binary.408 :return: A dict mapping each package name to a sorted list of
326 """409 publications from `publications`.
327 self.logger.debug("Dominating packages...")410 """
328 for name, publications in pubs.iteritems():411 pubs_by_package = defaultdict(list)
329 assert publications, "Empty list of publications for %s." % name412 for pub in publications:
330 # Since this always picks the latest version as the live413 pubs_by_package[generalization.getPackageName(pub)].append(pub)
331 # one, this dominatePackage call will never result in a414
332 # deletion.415 # Sort the publication lists. This is not an in-place sort, so
333 latest_version = generalization.getPackageVersion(publications[0])416 # it involves altering the dict while we iterate it. Listify
334 self.logger.debug2("Dominating %s" % name)417 # the keys so that we can be sure that we're not altering the
335 self.dominatePackage(418 # iteration order while iteration is underway.
336 publications, [latest_version], generalization)419 for package in list(pubs_by_package.keys()):
337420 pubs_by_package[package] = generalization.sortPublications(
338 def _sortPackages(self, pkglist, generalization):421 pubs_by_package[package])
339 """Map out packages by name, and sort by descending version.422
340423 return pubs_by_package
341 :param pkglist: An iterable of `SourcePackagePublishingHistory` or
342 `BinaryPackagePublishingHistory`.
343 :param generalization: A `GeneralizedPublication` helper representing
344 the kind of publications these are--source or binary.
345 :return: A dict mapping each package name to a list of publications
346 from `pkglist`, newest first.
347 """
348 self.logger.debug("Sorting packages...")
349
350 outpkgs = {}
351 for inpkg in pkglist:
352 key = generalization.getPackageName(inpkg)
353 outpkgs.setdefault(key, []).append(inpkg)
354
355 for package_pubs in outpkgs.itervalues():
356 package_pubs.sort(cmp=generalization.compare, reverse=True)
357
358 return outpkgs
359424
360 def _setScheduledDeletionDate(self, pub_record):425 def _setScheduledDeletionDate(self, pub_record):
361 """Set the scheduleddeletiondate on a publishing record.426 """Set the scheduleddeletiondate on a publishing record.
@@ -520,7 +585,11 @@
520 bins = self.findBinariesForDomination(distroarchseries, pocket)585 bins = self.findBinariesForDomination(distroarchseries, pocket)
521 sorted_packages = self._sortPackages(bins, generalization)586 sorted_packages = self._sortPackages(bins, generalization)
522 self.logger.info("Dominating binaries...")587 self.logger.info("Dominating binaries...")
523 self._dominatePublications(sorted_packages, generalization)588 for name, pubs in sorted_packages.iteritems():
589 self.logger.debug("Dominating %s" % name)
590 assert len(pubs) > 0, "Dominating zero binaries!"
591 live_versions = find_live_binary_versions_pass_1(pubs)
592 self.dominatePackage(pubs, live_versions, generalization)
524593
525 # We need to make a second pass to cover the cases where:594 # We need to make a second pass to cover the cases where:
526 # * arch-specific binaries were not all dominated before arch-all595 # * arch-specific binaries were not all dominated before arch-all
@@ -534,7 +603,11 @@
534 bins = self.findBinariesForDomination(distroarchseries, pocket)603 bins = self.findBinariesForDomination(distroarchseries, pocket)
535 sorted_packages = self._sortPackages(bins, generalization)604 sorted_packages = self._sortPackages(bins, generalization)
536 self.logger.info("Dominating binaries...(2nd pass)")605 self.logger.info("Dominating binaries...(2nd pass)")
537 self._dominatePublications(sorted_packages, generalization)606 for name, pubs in sorted_packages.iteritems():
607 self.logger.debug("Dominating %s" % name)
608 assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"
609 live_versions = find_live_binary_versions_pass_2(pubs)
610 self.dominatePackage(pubs, live_versions, generalization)
538611
539 def _composeActiveSourcePubsCondition(self, distroseries, pocket):612 def _composeActiveSourcePubsCondition(self, distroseries, pocket):
540 """Compose ORM condition for restricting relevant source pubs."""613 """Compose ORM condition for restricting relevant source pubs."""
@@ -550,7 +623,16 @@
550 )623 )
551624
552 def findSourcesForDomination(self, distroseries, pocket):625 def findSourcesForDomination(self, distroseries, pocket):
553 """Find binary publications that need dominating."""626 """Find binary publications that need dominating.
627
628 This is only for traditional domination, where the latest published
629 publication is always kept published. See `find_live_source_versions`
630 for this logic.
631
632 To optimize for that logic, `findSourcesForDomination` will ignore
633 publications that have no other publications competing for the same
634 binary package. There'd be nothing to do for those cases.
635 """
554 # Avoid circular imports.636 # Avoid circular imports.
555 from lp.soyuz.model.publishing import SourcePackagePublishingHistory637 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
556638
@@ -589,11 +671,18 @@
589 distroseries.name, pocket.title)671 distroseries.name, pocket.title)
590672
591 generalization = GeneralizedPublication(is_source=True)673 generalization = GeneralizedPublication(is_source=True)
674
675 self.logger.debug("Finding sources...")
592 sources = self.findSourcesForDomination(distroseries, pocket)676 sources = self.findSourcesForDomination(distroseries, pocket)
677 sorted_packages = self._sortPackages(sources, generalization)
593678
594 self.logger.debug("Dominating sources...")679 self.logger.debug("Dominating sources...")
595 self._dominatePublications(680 for name, pubs in sorted_packages.iteritems():
596 self._sortPackages(sources, generalization), generalization)681 self.logger.debug("Dominating %s" % name)
682 assert len(pubs) > 0, "Dominating zero sources!"
683 live_versions = find_live_source_versions(pubs)
684 self.dominatePackage(pubs, live_versions, generalization)
685
597 flush_database_updates()686 flush_database_updates()
598687
599 def findPublishedSourcePackageNames(self, distroseries, pocket):688 def findPublishedSourcePackageNames(self, distroseries, pocket):
@@ -653,6 +742,7 @@
653 """742 """
654 generalization = GeneralizedPublication(is_source=True)743 generalization = GeneralizedPublication(is_source=True)
655 pubs = self.findPublishedSPPHs(distroseries, pocket, package_name)744 pubs = self.findPublishedSPPHs(distroseries, pocket, package_name)
745 pubs = generalization.sortPublications(pubs)
656 self.dominatePackage(pubs, live_versions, generalization)746 self.dominatePackage(pubs, live_versions, generalization)
657747
658 def judge(self, distroseries, pocket):748 def judge(self, distroseries, pocket):
659749
=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
--- lib/lp/archivepublisher/tests/test_dominator.py 2011-11-02 10:28:31 +0000
+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-11-04 04:07:32 +0000
@@ -16,6 +16,9 @@
16from canonical.testing.layers import ZopelessDatabaseLayer16from canonical.testing.layers import ZopelessDatabaseLayer
17from lp.archivepublisher.domination import (17from lp.archivepublisher.domination import (
18 Dominator,18 Dominator,
19 find_live_binary_versions_pass_1,
20 find_live_binary_versions_pass_2,
21 find_live_source_versions,
19 GeneralizedPublication,22 GeneralizedPublication,
20 STAY_OF_EXECUTION,23 STAY_OF_EXECUTION,
21 )24 )
@@ -30,6 +33,7 @@
30 StormStatementRecorder,33 StormStatementRecorder,
31 TestCaseWithFactory,34 TestCaseWithFactory,
32 )35 )
36from lp.testing.fakemethod import FakeMethod
33from lp.testing.matchers import HasQueryCount37from lp.testing.matchers import HasQueryCount
3438
3539
@@ -72,13 +76,9 @@
72 is_source=ISourcePackagePublishingHistory.providedBy(dominant))76 is_source=ISourcePackagePublishingHistory.providedBy(dominant))
73 dominator = Dominator(self.logger, self.ubuntutest.main_archive)77 dominator = Dominator(self.logger, self.ubuntutest.main_archive)
7478
75 # The _dominate* test methods require a dictionary where the79 pubs = [dominant, dominated]
76 # package name is the key. The key's value is a list of80 live_versions = [generalization.getPackageVersion(dominant)]
77 # source or binary packages representing dominant, the first element81 dominator.dominatePackage(pubs, live_versions, generalization)
78 # and dominated, the subsequents.
79 pubs = {'foo': [dominant, dominated]}
80
81 dominator._dominatePublications(pubs, generalization)
82 flush_database_updates()82 flush_database_updates()
8383
84 # The dominant version remains correctly published.84 # The dominant version remains correctly published.
@@ -158,16 +158,30 @@
158 [foo_10_source] + foo_10_binaries,158 [foo_10_source] + foo_10_binaries,
159 PackagePublishingStatus.SUPERSEDED)159 PackagePublishingStatus.SUPERSEDED)
160160
161 def testEmptyDomination(self):161 def test_dominateBinaries_rejects_empty_publication_list(self):
162 """Domination asserts for not empty input list."""162 """Domination asserts for non-empty input list."""
163 dominator = Dominator(self.logger, self.ubuntutest.main_archive)163 package = self.factory.makeBinaryPackageName()
164 pubs = {'foo': []}164 dominator = Dominator(self.logger, self.ubuntutest.main_archive)
165 # This isn't a really good exception. It should probably be165 dominator._sortPackages = FakeMethod({package.name: []})
166 # something more indicative of bad input.166 # This isn't a really good exception. It should probably be
167 self.assertRaises(167 # something more indicative of bad input.
168 AssertionError,168 self.assertRaises(
169 dominator._dominatePublications,169 AssertionError,
170 pubs, GeneralizedPublication(True))170 dominator.dominateBinaries,
171 self.factory.makeDistroArchSeries().distroseries,
172 self.factory.getAnyPocket())
173
174 def test_dominateSources_rejects_empty_publication_list(self):
175 """Domination asserts for non-empty input list."""
176 package = self.factory.makeSourcePackageName()
177 dominator = Dominator(self.logger, self.ubuntutest.main_archive)
178 dominator._sortPackages = FakeMethod({package.name: []})
179 # This isn't a really good exception. It should probably be
180 # something more indicative of bad input.
181 self.assertRaises(
182 AssertionError,
183 dominator.dominateSources,
184 self.factory.makeDistroSeries(), self.factory.getAnyPocket())
171185
172 def test_archall_domination(self):186 def test_archall_domination(self):
173 # Arch-all binaries should not be dominated when a new source187 # Arch-all binaries should not be dominated when a new source
@@ -358,6 +372,16 @@
358 SeriesStatus.OBSOLETE)372 SeriesStatus.OBSOLETE)
359373
360374
375def remove_security_proxies(proxied_objects):
376 """Return list of `proxied_objects`, without their proxies.
377
378 The dominator runs only in scripts, where security proxies don't get
379 in the way. To test realistically for this environment, strip the
380 proxies wherever necessary and do as you will.
381 """
382 return [removeSecurityProxy(obj) for obj in proxied_objects]
383
384
361def make_spphs_for_versions(factory, versions):385def make_spphs_for_versions(factory, versions):
362 """Create publication records for each of `versions`.386 """Create publication records for each of `versions`.
363387
@@ -400,14 +424,15 @@
400 archive = das.distroseries.main_archive424 archive = das.distroseries.main_archive
401 pocket = factory.getAnyPocket()425 pocket = factory.getAnyPocket()
402 bprs = [426 bprs = [
403 factory.makeBinaryPackageRelease(binarypackagename=bpn)427 factory.makeBinaryPackageRelease(
428 binarypackagename=bpn, version=version)
404 for version in versions]429 for version in versions]
405 return [430 return remove_security_proxies([
406 factory.makeBinaryPackagePublishingHistory(431 factory.makeBinaryPackagePublishingHistory(
407 binarypackagerelease=bpr, binarypackagename=bpn,432 binarypackagerelease=bpr, binarypackagename=bpn,
408 distroarchseries=das, pocket=pocket, archive=archive,433 distroarchseries=das, pocket=pocket, archive=archive,
409 sourcepackagename=spn, status=PackagePublishingStatus.PUBLISHED)434 sourcepackagename=spn, status=PackagePublishingStatus.PUBLISHED)
410 for bpr in bprs]435 for bpr in bprs])
411436
412437
413def list_source_versions(spphs):438def list_source_versions(spphs):
@@ -591,9 +616,10 @@
591 def test_dominatePackage_supersedes_older_pub_with_newer_live_pub(self):616 def test_dominatePackage_supersedes_older_pub_with_newer_live_pub(self):
592 # When marking a package as superseded, dominatePackage617 # When marking a package as superseded, dominatePackage
593 # designates a newer live version as the superseding version.618 # designates a newer live version as the superseding version.
619 generalization = GeneralizedPublication(True)
594 pubs = make_spphs_for_versions(self.factory, ['1.0', '1.1'])620 pubs = make_spphs_for_versions(self.factory, ['1.0', '1.1'])
595 self.makeDominator(pubs).dominatePackage(621 self.makeDominator(pubs).dominatePackage(
596 pubs, ['1.1'], GeneralizedPublication(True))622 generalization.sortPublications(pubs), ['1.1'], generalization)
597 self.assertEqual(PackagePublishingStatus.SUPERSEDED, pubs[0].status)623 self.assertEqual(PackagePublishingStatus.SUPERSEDED, pubs[0].status)
598 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)624 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)
599 self.assertEqual(PackagePublishingStatus.PUBLISHED, pubs[1].status)625 self.assertEqual(PackagePublishingStatus.PUBLISHED, pubs[1].status)
@@ -601,10 +627,11 @@
601 def test_dominatePackage_only_supersedes_with_live_pub(self):627 def test_dominatePackage_only_supersedes_with_live_pub(self):
602 # When marking a package as superseded, dominatePackage will628 # When marking a package as superseded, dominatePackage will
603 # only pick a live version as the superseding one.629 # only pick a live version as the superseding one.
630 generalization = GeneralizedPublication(True)
604 pubs = make_spphs_for_versions(631 pubs = make_spphs_for_versions(
605 self.factory, ['1.0', '2.0', '3.0', '4.0'])632 self.factory, ['1.0', '2.0', '3.0', '4.0'])
606 self.makeDominator(pubs).dominatePackage(633 self.makeDominator(pubs).dominatePackage(
607 pubs, ['3.0'], GeneralizedPublication(True))634 generalization.sortPublications(pubs), ['3.0'], generalization)
608 self.assertEqual([635 self.assertEqual([
609 pubs[2].sourcepackagerelease,636 pubs[2].sourcepackagerelease,
610 pubs[2].sourcepackagerelease,637 pubs[2].sourcepackagerelease,
@@ -616,23 +643,27 @@
616 def test_dominatePackage_supersedes_with_oldest_newer_live_pub(self):643 def test_dominatePackage_supersedes_with_oldest_newer_live_pub(self):
617 # When marking a package as superseded, dominatePackage picks644 # When marking a package as superseded, dominatePackage picks
618 # the oldest of the newer, live versions as the superseding one.645 # the oldest of the newer, live versions as the superseding one.
646 generalization = GeneralizedPublication(True)
619 pubs = make_spphs_for_versions(self.factory, ['2.7', '2.8', '2.9'])647 pubs = make_spphs_for_versions(self.factory, ['2.7', '2.8', '2.9'])
620 self.makeDominator(pubs).dominatePackage(648 self.makeDominator(pubs).dominatePackage(
621 pubs, ['2.8', '2.9'], GeneralizedPublication(True))649 generalization.sortPublications(pubs), ['2.8', '2.9'],
650 generalization)
622 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)651 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)
623652
624 def test_dominatePackage_only_supersedes_with_newer_live_pub(self):653 def test_dominatePackage_only_supersedes_with_newer_live_pub(self):
625 # When marking a package as superseded, dominatePackage only654 # When marking a package as superseded, dominatePackage only
626 # considers a newer version as the superseding one.655 # considers a newer version as the superseding one.
656 generalization = GeneralizedPublication(True)
627 pubs = make_spphs_for_versions(self.factory, ['0.1', '0.2'])657 pubs = make_spphs_for_versions(self.factory, ['0.1', '0.2'])
628 self.makeDominator(pubs).dominatePackage(658 self.makeDominator(pubs).dominatePackage(
629 pubs, ['0.1'], GeneralizedPublication(True))659 generalization.sortPublications(pubs), ['0.1'], generalization)
630 self.assertEqual(None, pubs[1].supersededby)660 self.assertEqual(None, pubs[1].supersededby)
631 self.assertEqual(PackagePublishingStatus.DELETED, pubs[1].status)661 self.assertEqual(PackagePublishingStatus.DELETED, pubs[1].status)
632662
633 def test_dominatePackage_supersedes_replaced_pub_for_live_version(self):663 def test_dominatePackage_supersedes_replaced_pub_for_live_version(self):
634 # Even if a publication record is for a live version, a newer664 # Even if a publication record is for a live version, a newer
635 # one for the same version supersedes it.665 # one for the same version supersedes it.
666 generalization = GeneralizedPublication(True)
636 spr = self.factory.makeSourcePackageRelease()667 spr = self.factory.makeSourcePackageRelease()
637 series = self.factory.makeDistroSeries()668 series = self.factory.makeDistroSeries()
638 pocket = PackagePublishingPocket.RELEASE669 pocket = PackagePublishingPocket.RELEASE
@@ -649,7 +680,8 @@
649 ])680 ])
650681
651 self.makeDominator(pubs).dominatePackage(682 self.makeDominator(pubs).dominatePackage(
652 pubs, [spr.version], GeneralizedPublication(True))683 generalization.sortPublications(pubs), [spr.version],
684 generalization)
653 self.assertEqual([685 self.assertEqual([
654 PackagePublishingStatus.SUPERSEDED,686 PackagePublishingStatus.SUPERSEDED,
655 PackagePublishingStatus.SUPERSEDED,687 PackagePublishingStatus.SUPERSEDED,
@@ -661,12 +693,13 @@
661693
662 def test_dominatePackage_is_efficient(self):694 def test_dominatePackage_is_efficient(self):
663 # dominatePackage avoids issuing too many queries.695 # dominatePackage avoids issuing too many queries.
696 generalization = GeneralizedPublication(True)
664 versions = ["1.%s" % revision for revision in xrange(5)]697 versions = ["1.%s" % revision for revision in xrange(5)]
665 pubs = make_spphs_for_versions(self.factory, versions)698 pubs = make_spphs_for_versions(self.factory, versions)
666 with StormStatementRecorder() as recorder:699 with StormStatementRecorder() as recorder:
667 self.makeDominator(pubs).dominatePackage(700 self.makeDominator(pubs).dominatePackage(
668 pubs, versions[2:-1],701 generalization.sortPublications(pubs), versions[2:-1],
669 GeneralizedPublication(True))702 generalization)
670 self.assertThat(recorder, HasQueryCount(LessThan(5)))703 self.assertThat(recorder, HasQueryCount(LessThan(5)))
671704
672 def test_dominatePackage_advanced_scenario(self):705 def test_dominatePackage_advanced_scenario(self):
@@ -677,6 +710,7 @@
677 # don't just patch up the code or this test. Create unit tests710 # don't just patch up the code or this test. Create unit tests
678 # that specifically cover the difference, then change the code711 # that specifically cover the difference, then change the code
679 # and/or adapt this test to return to harmony.712 # and/or adapt this test to return to harmony.
713 generalization = GeneralizedPublication(True)
680 series = self.factory.makeDistroSeries()714 series = self.factory.makeDistroSeries()
681 package = self.factory.makeSourcePackageName()715 package = self.factory.makeSourcePackageName()
682 pocket = PackagePublishingPocket.RELEASE716 pocket = PackagePublishingPocket.RELEASE
@@ -723,7 +757,8 @@
723757
724 all_pubs = sum(pubs_by_version.itervalues(), [])758 all_pubs = sum(pubs_by_version.itervalues(), [])
725 Dominator(DevNullLogger(), series.main_archive).dominatePackage(759 Dominator(DevNullLogger(), series.main_archive).dominatePackage(
726 all_pubs, live_versions, GeneralizedPublication(True))760 generalization.sortPublications(all_pubs), live_versions,
761 generalization)
727762
728 for version in reversed(versions):763 for version in reversed(versions):
729 pubs = pubs_by_version[version]764 pubs = pubs_by_version[version]
@@ -1089,3 +1124,82 @@
1089 published_spphs,1124 published_spphs,
1090 dominator.findSourcesForDomination(1125 dominator.findSourcesForDomination(
1091 spphs[0].distroseries, spphs[0].pocket))1126 spphs[0].distroseries, spphs[0].pocket))
1127
1128
1129def make_publications_arch_specific(pubs, arch_specific=True):
1130 """Set the `architecturespecific` attribute for given SPPHs.
1131
1132 :param pubs: An iterable of `BinaryPackagePublishingHistory`.
1133 :param arch_specific: Whether the binary package releases published
1134 by `pubs` are to be architecture-specific. If not, they will be
1135 treated as being for the "all" architecture.
1136 """
1137 for pub in pubs:
1138 bpr = removeSecurityProxy(pub).binarypackagerelease
1139 bpr.architecturespecific = arch_specific
1140
1141
1142class TestLivenessFunctions(TestCaseWithFactory):
1143 """Tests for the functions that say which versions are live."""
1144
1145 layer = ZopelessDatabaseLayer
1146
1147 def test_find_live_source_versions_blesses_latest(self):
1148 # find_live_source_versions, assuming that you passed it
1149 # publications sorted from most current to least current
1150 # version, simply returns the most current version.
1151 spphs = make_spphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
1152 self.assertEqual(['1.2'], find_live_source_versions(spphs))
1153
1154 def test_find_live_binary_versions_pass_1_blesses_latest(self):
1155 # find_live_binary_versions_pass_1 always includes the latest
1156 # version among the input publications in its result.
1157 bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
1158 make_publications_arch_specific(bpphs)
1159 self.assertEqual(['1.2'], find_live_binary_versions_pass_1(bpphs))
1160
1161 def test_find_live_binary_versions_pass_1_blesses_arch_all(self):
1162 # find_live_binary_versions_pass_1 includes any
1163 # architecture-independent publications among the input in its
1164 # result.
1165 versions = list(reversed(['1.%d' % version for version in range(3)]))
1166 bpphs = make_bpphs_for_versions(self.factory, versions)
1167
1168 # All of these publications are architecture-specific, except
1169 # the last one. This would happen if the binary package had
1170 # just changed from being architecture-specific to being
1171 # architecture-independent.
1172 make_publications_arch_specific(bpphs, True)
1173 make_publications_arch_specific(bpphs[-1:], False)
1174 self.assertEqual(
1175 versions[:1] + versions[-1:],
1176 find_live_binary_versions_pass_1(bpphs))
1177
1178 def test_find_live_binary_versions_pass_2_blesses_latest(self):
1179 # find_live_binary_versions_pass_2 always includes the latest
1180 # version among the input publications in its result.
1181 bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
1182 make_publications_arch_specific(bpphs, False)
1183 self.assertEqual(['1.2'], find_live_binary_versions_pass_2(bpphs))
1184
1185 def test_find_live_binary_versions_pass_2_blesses_arch_specific(self):
1186 # find_live_binary_versions_pass_2 includes any
1187 # architecture-specific publications among the input in its
1188 # result.
1189 versions = list(reversed(['1.%d' % version for version in range(3)]))
1190 bpphs = make_bpphs_for_versions(self.factory, versions)
1191 make_publications_arch_specific(bpphs)
1192 self.assertEqual(versions, find_live_binary_versions_pass_2(bpphs))
1193
1194 def test_find_live_binary_versions_pass_2_reprieves_arch_all(self):
1195 # An arch-all BPPH for a BPR built by an SPR that also still has
1196 # active arch-dependent BPPHs gets a reprieve: it can't be
1197 # superseded until those arch-dependent BPPHs have been
1198 # superseded.
1199 bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
1200 make_publications_arch_specific(bpphs, False)
1201 dependent = self.factory.makeBinaryPackagePublishingHistory(
1202 binarypackagerelease=bpphs[1].binarypackagerelease)
1203 make_publications_arch_specific([dependent], True)
1204 self.assertEqual(
1205 ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))