Merge lp:~jtv/launchpad/bug-884649-branch-2 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: j.c.sackett
Approved revision: no longer in the source branch.
Merged at revision: 14249
Proposed branch: lp:~jtv/launchpad/bug-884649-branch-2
Merge into: lp:launchpad
Prerequisite: lp:~jtv/launchpad/getBinariesForDomination-bulk
Diff against target: 676 lines (+325/-121)
2 files modified
lib/lp/archivepublisher/domination.py (+183/-93)
lib/lp/archivepublisher/tests/test_dominator.py (+142/-28)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-884649-branch-2
Reviewer Review Type Date Requested Status
Julian Edwards (community) Approve
j.c.sackett (community) Approve
Review via email: mp+81049@code.launchpad.net

This proposal supersedes a proposal from 2011-11-02.

Commit message

[r=jcsackett,julian-edwards] Move special-casing for publication liveness out of inner domination loop; specialize binary passes.

Description of the change

= Summary =

This is part of the Dominator optimization work needed for bug 884649.

Binary domination needs 2 passes. This is because publications for the "all" architecture, which are copied into each architecture, can't be superseded until all architecture-specific publications coming from the same source package release have been superseded. Only then can we decide whether arch-all publications can be superseded. This is because a binary package from the same source package release may still depend on the arch-all binary package, and may not have finished building yet.

The code for dealing with this subtlety is a bit hard to place, because it's deep down in the inner loop as a special case. The special case is applied to both binary-domination passes, although there's really no point: the first pass may conceivably supersede a few arch-all publications but it won't deal with all of them. Meanwhile we pay a hefty price for the extra check.

== Proposed fix ==

Hoist the decision of which publications must stay live out of dominatePackage. It's a policy decision for the caller; the API lets the caller specify which versions stay live.

With that, we can specialize the passes more, and cut out complexity that is slowing things down.

You'll see three new functions explaining exactly what liveness criteria each domination pass uses. The source pass is easy. The first binary pass deals with architecture-specific publications; arch-all ones must be considered, but they are simply kept alive. And once all architectures have gone through first-pass binary domination, superseding as many arch-specific publications as possible, the second binary-domination pass considers just what needs to happen to those arch-all publications.

== Pre-implementation notes ==

Discussed in detail with Julian; see bug 884649 for the plan we're following in addressing this. We should also be able to re-use liveness decisions across architectures (and probably binary packages building from the same source package).

Raphaƫl suggested another clever improvement: Steve K. has just initialized a new, denormalized column on BinaryPackagePublishingHistory that contains the package name. That should enable a few shortcuts as well.

By the way, note that we can't just dominate arch-all and arch-specific publications separately. Packages may vacillate between the two, producing successive releases that need to be considered together even if domination of some publications may need to be postponed.

== Implementation details ==

I had to ditch _dominatePublications, because the discovery of live versions went into the middle of it. I reasoned that if removing it introduced too much code duplication, it could return with a callback for live-version discovery. But the loss turned out to be pretty small: it's really just a 3-line loop, including log output.

== Tests ==

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}

== Demo and Q/A ==

Dominate Ubuntu and you'll see.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/soyuz/model/publishing.py
  lib/lp/archivepublisher/tests/test_dominator.py

To post a comment you must log in.
Revision history for this message
j.c.sackett (jcsackett) wrote :

Whew. Very extensive commenting, and thanks for it or I don't think I would have followed all of that!

This is some very complex work, but I think it looks good to move forward.

review: Approve
Revision history for this message
Julian Edwards (julian-edwards) wrote :
Download full text (26.2 KiB)

(I reviewed this several hours ago by email and only just noticed that LP rejected my email..GRRRRRRRRRRR)

Thanks for doing this Jeroen, it looks pretty good so far. I've just got a
few nits inline:

> === modified file 'lib/lp/archivepublisher/domination.py'
> --- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
> +++ lib/lp/archivepublisher/domination.py 2011-11-03 10:12:17 +0000
> @@ -52,8 +52,12 @@
>
> __all__ = ['Dominator']
>
> +from collections import defaultdict
> from datetime import timedelta
> -from operator import itemgetter
> +from operator import (
> + attrgetter,
> + itemgetter,
> + )
>
> import apt_pkg
> from storm.expr import (
> @@ -72,6 +76,7 @@
> DecoratedResultSet,
> )
> from canonical.launchpad.interfaces.lpstorm import IStore
> +from canonical.launchpad.utilities.orderingcheck import OrderingCheck
> from lp.registry.model.sourcepackagename import SourcePackageName
> from lp.services.database.bulk import load_related
> from lp.soyuz.enums import (
> @@ -192,6 +197,104 @@
> else:
> return version_comparison
>
> + def sortPublications(self, publications):
> + """Sort publications from most to least current versions."""
> + # Listify; we want to iterate this twice, which won't do for a
> + # non-persistent sequence.
> + sorted_publications = list(publications)

This variable is prematurely named, I think I'd just write:
    publications = list(publications)
and return publications.sort(...) later.

> + # Batch-load associated package releases; we'll be needing them
> + # to compare versions.
> + self.load_releases(sorted_publications)
> + # Now sort. This is that second iteration. An in-place sort
> + # won't hurt the original, because we're working on a copy of
> + # the original iterable.
> + sorted_publications.sort(cmp=self.compare, reverse=True)
> + return sorted_publications
> +
> +
> +def find_live_source_versions(publications):
> + """Find versions out of Published `publications` that should stay live.
> +
> + This particular notion of liveness applies to source domination: the
> + latest version stays live, and that's it.
> +
> + :param publications: An iterable of `SourcePackagePublishingHistory`
> + sorted by descending package version.
> + :return: A list of live versions.
> + """
> + # Given the required sort order, the latest version is at the head
> + # of the list.
> + return [publications[0].sourcepackagerelease.version]

Nicely abstracted.

> +
> +
> +def get_binary_versions(binary_publications):
> + """List versions for sequence of `BinaryPackagePublishingHistory`."""
> + return [pub.binarypackagerelease.version for pub in
binary_publications]

If you're going to do :param: and :return: for the other functions, you
should do it here too!

> +
> +
> +def find_live_binary_versions_pass_1(publications):
> + """Find versions out of Published `publications` that should stay live.
> +
> + This particular notion of liveness applies to first-pass binary
> + domination: the latest version stays l...

review: Approve
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Jon, thanks for taking on this very tough review. It's a crying shame that Julian had also reviewed it, but the extra pair of eyeballs is still appreciated. Even if Julian knows more of the innards, it's also great to hear that someone less familiar with the intricacies of Soyuz and the archive publisher can make sense at least some of this.

Let's not do this too much though or you'll get sucked into Soyuz-related work like I did. :)

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :
Download full text (11.4 KiB)

> Thanks for doing this Jeroen, it looks pretty good so far. I've just got a
> few nits inline:

Ah, one of those reviews that actually make my code better. Good!

> > === modified file 'lib/lp/archivepublisher/domination.py'
> > --- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
> > +++ lib/lp/archivepublisher/domination.py 2011-11-03 10:12:17 +0000

> > @@ -192,6 +197,104 @@
> > else:
> > return version_comparison
> >
> > + def sortPublications(self, publications):
> > + """Sort publications from most to least current versions."""
> > + # Listify; we want to iterate this twice, which won't do for a
> > + # non-persistent sequence.
> > + sorted_publications = list(publications)
>
> This variable is prematurely named, I think I'd just write:
> publications = list(publications)
> and return publications.sort(...) later.

Yup, you're right. Done.

(Apart from one nasty little subtlety: surprisingly, list.sort does not return the sorted list. I suppose that's to avoid obscuring the fact that it modifies the list in-place.)

> > +def get_binary_versions(binary_publications):
> > + """List versions for sequence of `BinaryPackagePublishingHistory`."""
> > + return [pub.binarypackagerelease.version for pub in
> binary_publications]
>
> If you're going to do :param: and :return: for the other functions, you
> should do it here too!

Hoist by my own pedantard. Done.

> > +def find_live_binary_versions_pass_1(publications):
> > + """Find versions out of Published `publications` that should stay live.
> > +
> > + This particular notion of liveness applies to first-pass binary
> > + domination: the latest version stays live, and so do publications of
> > + binary packages for the "all" architecture.
> > +
> > + :param publications: An iterable of `BinaryPackagePublishingHistory`,
> > + sorted by descending package version.
> > + :return: A list of live versions.
> > + """
> > + publications = list(publications)
> > + latest = publications.pop(0)
> > + return get_binary_versions(
> > + [latest] + [
> > + pub for pub in publications if not pub.architecture_specific])
>
> pub.architecture_specific masks a query, in case you didn't know. I
> absolutely detest properties that do that, it encourages SQL death-by-a-
> thousand-cuts.

I spotted it through sheer luck. It's just dereferencing a Storm reference though, and I took care to bulk-load the referenced BinaryPackageReleases.

> > +def find_live_binary_versions_pass_2(publications):
> > + """Find versions out of Published `publications` that should stay live.
> > +
> > + This particular notion of liveness applies to second-pass binary
> > + domination: the latest version stays live, and architecture-specific
> > + publications stay live (i.e, ones that are not for the "all"
> > + architecture).
> > +
> > + More importantly, any publication for binary packages of the "all"
> > + architecture stay live if any of the non-"all" binary packages from
> > + the same source package release are still active -- even if they are
> > + f...

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Friday 04 November 2011 03:58:24 you wrote:
> > Thanks for doing this Jeroen, it looks pretty good so far. I've just
> > got a
> > few nits inline:
> Ah, one of those reviews that actually make my code better. Good!

You asked for it!

> (Apart from one nasty little subtlety: surprisingly, list.sort does not
> return the sorted list. I suppose that's to avoid obscuring the fact that
> it modifies the list in-place.)

Right, that's annoying.

> That's what I keep thinking too. But the python documentation says
> otherwise. Perhaps it was some other massively useful function that got
> deprecated, or perhaps people changed their minds.
>
> Anyway, looking this up also made me find ifilter and ifilterfalse (in
> itertools). With those, I can finally say what I need here tersely and
> uniformly:
>
> arch_specific_pubs = list(ifilter(is_arch_specific, publications))
> arch_indep_pubs = list(ifilterfalse(is_arch_specific, publications))

I had not seen those before, very nice.

It would serve us some use to trawl through the Python built-ins at some
point!

> It's not very costly compared to inevitable work we're already doing: for a
> list of n publications, this does n-1 version comparisons. That's
> guaranteed to be no more than the sort operation that produced the
> publications list in the first place.
>
> The loop depends very heavily on the sort order that this verifies, and
> there are several call sites that do the sorting. It'd be nice if we could
> just re-sort to make sure of the right order, but besides being more
> expensive that also wouldn't help the liveness functions which also rely on
> sort order.
>
> Given how important it is to get this right and how easy it is to get it
> wrong, I think verifying the order is worth the small added cost.

I agree that it's important. I have an additional suggestion - rename the
method parameter so it's "sorted_publications" and also update the docstring
to say it *must* be sorted.

> Done. But, er... does this mean I'm condemned to another year of this? :-)

You might think so but I could not possibly comment.

Thanks!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/domination.py'
2--- lib/lp/archivepublisher/domination.py 2011-11-02 12:58:31 +0000
3+++ lib/lp/archivepublisher/domination.py 2011-11-04 04:07:32 +0000
4@@ -52,8 +52,16 @@
5
6 __all__ = ['Dominator']
7
8+from collections import defaultdict
9 from datetime import timedelta
10-from operator import itemgetter
11+from itertools import (
12+ ifilter,
13+ ifilterfalse,
14+ )
15+from operator import (
16+ attrgetter,
17+ itemgetter,
18+ )
19
20 import apt_pkg
21 from storm.expr import (
22@@ -72,6 +80,7 @@
23 DecoratedResultSet,
24 )
25 from canonical.launchpad.interfaces.lpstorm import IStore
26+from canonical.launchpad.utilities.orderingcheck import OrderingCheck
27 from lp.registry.model.sourcepackagename import SourcePackageName
28 from lp.services.database.bulk import load_related
29 from lp.soyuz.enums import (
30@@ -192,6 +201,107 @@
31 else:
32 return version_comparison
33
34+ def sortPublications(self, publications):
35+ """Sort publications from most to least current versions."""
36+ # Listify; we want to iterate this twice, which won't do for a
37+ # non-persistent sequence.
38+ publications = list(publications)
39+ # Batch-load associated package releases; we'll be needing them
40+ # to compare versions.
41+ self.load_releases(publications)
42+ # Now sort. This is that second iteration. An in-place sort
43+ # won't hurt the original, because we're working on a copy of
44+ # the original iterable.
45+ publications.sort(cmp=self.compare, reverse=True)
46+ return publications
47+
48+
49+def find_live_source_versions(publications):
50+ """Find versions out of Published `publications` that should stay live.
51+
52+ This particular notion of liveness applies to source domination: the
53+ latest version stays live, and that's it.
54+
55+ :param publications: An iterable of `SourcePackagePublishingHistory`
56+ sorted by descending package version.
57+ :return: A list of live versions.
58+ """
59+ # Given the required sort order, the latest version is at the head
60+ # of the list.
61+ return [publications[0].sourcepackagerelease.version]
62+
63+
64+def get_binary_versions(binary_publications):
65+ """List versions for sequence of `BinaryPackagePublishingHistory`.
66+
67+ :param binary_publications: An iterable of
68+ `BinaryPackagePublishingHistory`.
69+ :return: A list of the publications' respective versions.
70+ """
71+ return [pub.binarypackagerelease.version for pub in binary_publications]
72+
73+
74+def find_live_binary_versions_pass_1(publications):
75+ """Find versions out of Published `publications` that should stay live.
76+
77+ This particular notion of liveness applies to first-pass binary
78+ domination: the latest version stays live, and so do publications of
79+ binary packages for the "all" architecture.
80+
81+ :param publications: An iterable of `BinaryPackagePublishingHistory`,
82+ sorted by descending package version.
83+ :return: A list of live versions.
84+ """
85+ publications = list(publications)
86+ latest = publications.pop(0)
87+ return get_binary_versions(
88+ [latest] + [
89+ pub for pub in publications if not pub.architecture_specific])
90+
91+
92+def find_live_binary_versions_pass_2(publications):
93+ """Find versions out of Published `publications` that should stay live.
94+
95+ This particular notion of liveness applies to second-pass binary
96+ domination: the latest version stays live, and architecture-specific
97+ publications stay live (i.e, ones that are not for the "all"
98+ architecture).
99+
100+ More importantly, any publication for binary packages of the "all"
101+ architecture stay live if any of the non-"all" binary packages from
102+ the same source package release are still active -- even if they are
103+ for other architectures.
104+
105+ This is the raison d'etre for the two-pass binary domination algorithm:
106+ to let us see which architecture-independent binary publications can be
107+ superseded without rendering any architecture-specific binaries from the
108+ same source package release uninstallable.
109+
110+ (Note that here, "active" includes Published publications but also
111+ Pending ones. This is standard nomenclature in Soyuz. Some of the
112+ domination code confuses matters by using the term "active" to mean only
113+ Published publications).
114+
115+ :param publications: An iterable of `BinaryPackagePublishingHistory`,
116+ sorted by descending package version.
117+ :return: A list of live versions.
118+ """
119+ publications = list(publications)
120+ latest = publications.pop(0)
121+ is_arch_specific = attrgetter('architecture_specific')
122+ arch_specific_pubs = list(ifilter(is_arch_specific, publications))
123+ arch_indep_pubs = list(ifilterfalse(is_arch_specific, publications))
124+
125+ # XXX JeroenVermeulen 2011-11-01 bug=884649: This is likely to be
126+ # costly, and the result could be reused for all builds of the same
127+ # source package release to all architectures.
128+ reprieved_pubs = [
129+ pub
130+ for pub in arch_indep_pubs
131+ if pub.getOtherPublicationsForSameSource().any()]
132+
133+ return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)
134+
135
136 class Dominator:
137 """Manage the process of marking packages as superseded.
138@@ -209,27 +319,6 @@
139 self.logger = logger
140 self.archive = archive
141
142- def _checkArchIndep(self, publication):
143- """Return True if the binary publication can be superseded.
144-
145- If the publication is an arch-indep binary, we can only supersede
146- it if all the binaries from the same source are also superseded,
147- else those binaries may become uninstallable.
148- See bug 34086.
149- """
150- binary = publication.binarypackagerelease
151- if not binary.architecturespecific:
152- # getOtherPublicationsForSameSource returns PENDING in
153- # addition to PUBLISHED binaries, and we rely on this since
154- # they must also block domination.
155- others = publication.getOtherPublicationsForSameSource()
156- if others.any():
157- # Don't dominate this arch:all binary as there are
158- # other arch-specific binaries from the same build
159- # that are still active.
160- return False
161- return True
162-
163 def dominatePackage(self, publications, live_versions, generalization):
164 """Dominate publications for a single package.
165
166@@ -247,34 +336,33 @@
167
168 :param publications: Iterable of publications for the same package,
169 in the same archive, series, and pocket, all with status
170- `PackagePublishingStatus.PUBLISHED`.
171- :param live_versions: Iterable of version strings that are still
172- considered live for this package. The given publications will
173- remain active insofar as they represent any of these versions;
174- older publications will be marked as superseded and newer ones
175- as deleted.
176+ `PackagePublishingStatus.PUBLISHED`. They must be sorted from
177+ most current to least current, as would be the result of
178+ `generalization.sortPublications`.
179+ :param live_versions: Iterable of versions that are still considered
180+ "live" for this package. For any of these, the latest publication
181+ among `publications` will remain Published. Publications for
182+ older releases, as well as older publications of live versions,
183+ will be marked as Superseded. Publications of newer versions than
184+ are listed in `live_versions` are marked as Deleted.
185 :param generalization: A `GeneralizedPublication` helper representing
186- the kind of publications these are--source or binary.
187+ the kind of publications these are: source or binary.
188 """
189- publications = list(publications)
190- generalization.load_releases(publications)
191-
192- # Go through publications from latest version to oldest. This
193- # makes it easy to figure out which release superseded which:
194- # the dominant is always the oldest live release that is newer
195- # than the one being superseded. In this loop, that means the
196- # dominant is always the last live publication we saw.
197- publications = sorted(
198- publications, cmp=generalization.compare, reverse=True)
199+ live_versions = frozenset(live_versions)
200
201 self.logger.debug(
202 "Package has %d live publication(s). Live versions: %s",
203 len(publications), live_versions)
204
205+ # Verify that the publications are really sorted properly.
206+ check_order = OrderingCheck(cmp=generalization.compare, reverse=True)
207+
208 current_dominant = None
209 dominant_version = None
210
211 for pub in publications:
212+ check_order.check(pub)
213+
214 version = generalization.getPackageVersion(pub)
215 # There should never be two published releases with the same
216 # version. So it doesn't matter whether this comparison is
217@@ -295,11 +383,6 @@
218 current_dominant = pub
219 dominant_version = version
220 self.logger.debug2("Keeping version %s.", version)
221- elif not (generalization.is_source or self._checkArchIndep(pub)):
222- # As a special case, we keep this version live as well.
223- current_dominant = pub
224- dominant_version = version
225- self.logger.debug2("Keeping version %s.", version)
226 elif current_dominant is None:
227 # This publication is no longer live, but there is no
228 # newer version to supersede it either. Therefore it
229@@ -312,50 +395,32 @@
230 pub.supersede(current_dominant, logger=self.logger)
231 self.logger.debug2("Superseding version %s.", version)
232
233- def _dominatePublications(self, pubs, generalization):
234- """Perform dominations for the given publications.
235-
236- Keep the latest published version for each package active,
237- superseding older versions.
238-
239- :param pubs: A dict mapping names to a list of publications. Every
240- publication must be PUBLISHED or PENDING, and the first in each
241- list will be treated as dominant (so should be the latest).
242- :param generalization: A `GeneralizedPublication` helper representing
243- the kind of publications these are--source or binary.
244- """
245- self.logger.debug("Dominating packages...")
246- for name, publications in pubs.iteritems():
247- assert publications, "Empty list of publications for %s." % name
248- # Since this always picks the latest version as the live
249- # one, this dominatePackage call will never result in a
250- # deletion.
251- latest_version = generalization.getPackageVersion(publications[0])
252- self.logger.debug2("Dominating %s" % name)
253- self.dominatePackage(
254- publications, [latest_version], generalization)
255-
256- def _sortPackages(self, pkglist, generalization):
257- """Map out packages by name, and sort by descending version.
258-
259- :param pkglist: An iterable of `SourcePackagePublishingHistory` or
260- `BinaryPackagePublishingHistory`.
261- :param generalization: A `GeneralizedPublication` helper representing
262- the kind of publications these are--source or binary.
263- :return: A dict mapping each package name to a list of publications
264- from `pkglist`, newest first.
265- """
266- self.logger.debug("Sorting packages...")
267-
268- outpkgs = {}
269- for inpkg in pkglist:
270- key = generalization.getPackageName(inpkg)
271- outpkgs.setdefault(key, []).append(inpkg)
272-
273- for package_pubs in outpkgs.itervalues():
274- package_pubs.sort(cmp=generalization.compare, reverse=True)
275-
276- return outpkgs
277+ def _sortPackages(self, publications, generalization):
278+ """Partition publications by package name, and sort them.
279+
280+ The publications are sorted from most current to least current,
281+ as required by `dominatePackage` etc.
282+
283+ :param publications: An iterable of `SourcePackagePublishingHistory`
284+ or of `BinaryPackagePublishingHistory`.
285+ :param generalization: A `GeneralizedPublication` helper representing
286+ the kind of publications these are: source or binary.
287+ :return: A dict mapping each package name to a sorted list of
288+ publications from `publications`.
289+ """
290+ pubs_by_package = defaultdict(list)
291+ for pub in publications:
292+ pubs_by_package[generalization.getPackageName(pub)].append(pub)
293+
294+ # Sort the publication lists. This is not an in-place sort, so
295+ # it involves altering the dict while we iterate it. Listify
296+ # the keys so that we can be sure that we're not altering the
297+ # iteration order while iteration is underway.
298+ for package in list(pubs_by_package.keys()):
299+ pubs_by_package[package] = generalization.sortPublications(
300+ pubs_by_package[package])
301+
302+ return pubs_by_package
303
304 def _setScheduledDeletionDate(self, pub_record):
305 """Set the scheduleddeletiondate on a publishing record.
306@@ -520,7 +585,11 @@
307 bins = self.findBinariesForDomination(distroarchseries, pocket)
308 sorted_packages = self._sortPackages(bins, generalization)
309 self.logger.info("Dominating binaries...")
310- self._dominatePublications(sorted_packages, generalization)
311+ for name, pubs in sorted_packages.iteritems():
312+ self.logger.debug("Dominating %s" % name)
313+ assert len(pubs) > 0, "Dominating zero binaries!"
314+ live_versions = find_live_binary_versions_pass_1(pubs)
315+ self.dominatePackage(pubs, live_versions, generalization)
316
317 # We need to make a second pass to cover the cases where:
318 # * arch-specific binaries were not all dominated before arch-all
319@@ -534,7 +603,11 @@
320 bins = self.findBinariesForDomination(distroarchseries, pocket)
321 sorted_packages = self._sortPackages(bins, generalization)
322 self.logger.info("Dominating binaries...(2nd pass)")
323- self._dominatePublications(sorted_packages, generalization)
324+ for name, pubs in sorted_packages.iteritems():
325+ self.logger.debug("Dominating %s" % name)
326+ assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"
327+ live_versions = find_live_binary_versions_pass_2(pubs)
328+ self.dominatePackage(pubs, live_versions, generalization)
329
330 def _composeActiveSourcePubsCondition(self, distroseries, pocket):
331 """Compose ORM condition for restricting relevant source pubs."""
332@@ -550,7 +623,16 @@
333 )
334
335 def findSourcesForDomination(self, distroseries, pocket):
336- """Find binary publications that need dominating."""
337+ """Find binary publications that need dominating.
338+
339+ This is only for traditional domination, where the latest published
340+ publication is always kept published. See `find_live_source_versions`
341+ for this logic.
342+
343+ To optimize for that logic, `findSourcesForDomination` will ignore
344+ publications that have no other publications competing for the same
345+ binary package. There'd be nothing to do for those cases.
346+ """
347 # Avoid circular imports.
348 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
349
350@@ -589,11 +671,18 @@
351 distroseries.name, pocket.title)
352
353 generalization = GeneralizedPublication(is_source=True)
354+
355+ self.logger.debug("Finding sources...")
356 sources = self.findSourcesForDomination(distroseries, pocket)
357+ sorted_packages = self._sortPackages(sources, generalization)
358
359 self.logger.debug("Dominating sources...")
360- self._dominatePublications(
361- self._sortPackages(sources, generalization), generalization)
362+ for name, pubs in sorted_packages.iteritems():
363+ self.logger.debug("Dominating %s" % name)
364+ assert len(pubs) > 0, "Dominating zero sources!"
365+ live_versions = find_live_source_versions(pubs)
366+ self.dominatePackage(pubs, live_versions, generalization)
367+
368 flush_database_updates()
369
370 def findPublishedSourcePackageNames(self, distroseries, pocket):
371@@ -653,6 +742,7 @@
372 """
373 generalization = GeneralizedPublication(is_source=True)
374 pubs = self.findPublishedSPPHs(distroseries, pocket, package_name)
375+ pubs = generalization.sortPublications(pubs)
376 self.dominatePackage(pubs, live_versions, generalization)
377
378 def judge(self, distroseries, pocket):
379
380=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
381--- lib/lp/archivepublisher/tests/test_dominator.py 2011-11-02 10:28:31 +0000
382+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-11-04 04:07:32 +0000
383@@ -16,6 +16,9 @@
384 from canonical.testing.layers import ZopelessDatabaseLayer
385 from lp.archivepublisher.domination import (
386 Dominator,
387+ find_live_binary_versions_pass_1,
388+ find_live_binary_versions_pass_2,
389+ find_live_source_versions,
390 GeneralizedPublication,
391 STAY_OF_EXECUTION,
392 )
393@@ -30,6 +33,7 @@
394 StormStatementRecorder,
395 TestCaseWithFactory,
396 )
397+from lp.testing.fakemethod import FakeMethod
398 from lp.testing.matchers import HasQueryCount
399
400
401@@ -72,13 +76,9 @@
402 is_source=ISourcePackagePublishingHistory.providedBy(dominant))
403 dominator = Dominator(self.logger, self.ubuntutest.main_archive)
404
405- # The _dominate* test methods require a dictionary where the
406- # package name is the key. The key's value is a list of
407- # source or binary packages representing dominant, the first element
408- # and dominated, the subsequents.
409- pubs = {'foo': [dominant, dominated]}
410-
411- dominator._dominatePublications(pubs, generalization)
412+ pubs = [dominant, dominated]
413+ live_versions = [generalization.getPackageVersion(dominant)]
414+ dominator.dominatePackage(pubs, live_versions, generalization)
415 flush_database_updates()
416
417 # The dominant version remains correctly published.
418@@ -158,16 +158,30 @@
419 [foo_10_source] + foo_10_binaries,
420 PackagePublishingStatus.SUPERSEDED)
421
422- def testEmptyDomination(self):
423- """Domination asserts for not empty input list."""
424- dominator = Dominator(self.logger, self.ubuntutest.main_archive)
425- pubs = {'foo': []}
426- # This isn't a really good exception. It should probably be
427- # something more indicative of bad input.
428- self.assertRaises(
429- AssertionError,
430- dominator._dominatePublications,
431- pubs, GeneralizedPublication(True))
432+ def test_dominateBinaries_rejects_empty_publication_list(self):
433+ """Domination asserts for non-empty input list."""
434+ package = self.factory.makeBinaryPackageName()
435+ dominator = Dominator(self.logger, self.ubuntutest.main_archive)
436+ dominator._sortPackages = FakeMethod({package.name: []})
437+ # This isn't a really good exception. It should probably be
438+ # something more indicative of bad input.
439+ self.assertRaises(
440+ AssertionError,
441+ dominator.dominateBinaries,
442+ self.factory.makeDistroArchSeries().distroseries,
443+ self.factory.getAnyPocket())
444+
445+ def test_dominateSources_rejects_empty_publication_list(self):
446+ """Domination asserts for non-empty input list."""
447+ package = self.factory.makeSourcePackageName()
448+ dominator = Dominator(self.logger, self.ubuntutest.main_archive)
449+ dominator._sortPackages = FakeMethod({package.name: []})
450+ # This isn't a really good exception. It should probably be
451+ # something more indicative of bad input.
452+ self.assertRaises(
453+ AssertionError,
454+ dominator.dominateSources,
455+ self.factory.makeDistroSeries(), self.factory.getAnyPocket())
456
457 def test_archall_domination(self):
458 # Arch-all binaries should not be dominated when a new source
459@@ -358,6 +372,16 @@
460 SeriesStatus.OBSOLETE)
461
462
463+def remove_security_proxies(proxied_objects):
464+ """Return list of `proxied_objects`, without their proxies.
465+
466+ The dominator runs only in scripts, where security proxies don't get
467+ in the way. To test realistically for this environment, strip the
468+ proxies wherever necessary and do as you will.
469+ """
470+ return [removeSecurityProxy(obj) for obj in proxied_objects]
471+
472+
473 def make_spphs_for_versions(factory, versions):
474 """Create publication records for each of `versions`.
475
476@@ -400,14 +424,15 @@
477 archive = das.distroseries.main_archive
478 pocket = factory.getAnyPocket()
479 bprs = [
480- factory.makeBinaryPackageRelease(binarypackagename=bpn)
481+ factory.makeBinaryPackageRelease(
482+ binarypackagename=bpn, version=version)
483 for version in versions]
484- return [
485+ return remove_security_proxies([
486 factory.makeBinaryPackagePublishingHistory(
487 binarypackagerelease=bpr, binarypackagename=bpn,
488 distroarchseries=das, pocket=pocket, archive=archive,
489 sourcepackagename=spn, status=PackagePublishingStatus.PUBLISHED)
490- for bpr in bprs]
491+ for bpr in bprs])
492
493
494 def list_source_versions(spphs):
495@@ -591,9 +616,10 @@
496 def test_dominatePackage_supersedes_older_pub_with_newer_live_pub(self):
497 # When marking a package as superseded, dominatePackage
498 # designates a newer live version as the superseding version.
499+ generalization = GeneralizedPublication(True)
500 pubs = make_spphs_for_versions(self.factory, ['1.0', '1.1'])
501 self.makeDominator(pubs).dominatePackage(
502- pubs, ['1.1'], GeneralizedPublication(True))
503+ generalization.sortPublications(pubs), ['1.1'], generalization)
504 self.assertEqual(PackagePublishingStatus.SUPERSEDED, pubs[0].status)
505 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)
506 self.assertEqual(PackagePublishingStatus.PUBLISHED, pubs[1].status)
507@@ -601,10 +627,11 @@
508 def test_dominatePackage_only_supersedes_with_live_pub(self):
509 # When marking a package as superseded, dominatePackage will
510 # only pick a live version as the superseding one.
511+ generalization = GeneralizedPublication(True)
512 pubs = make_spphs_for_versions(
513 self.factory, ['1.0', '2.0', '3.0', '4.0'])
514 self.makeDominator(pubs).dominatePackage(
515- pubs, ['3.0'], GeneralizedPublication(True))
516+ generalization.sortPublications(pubs), ['3.0'], generalization)
517 self.assertEqual([
518 pubs[2].sourcepackagerelease,
519 pubs[2].sourcepackagerelease,
520@@ -616,23 +643,27 @@
521 def test_dominatePackage_supersedes_with_oldest_newer_live_pub(self):
522 # When marking a package as superseded, dominatePackage picks
523 # the oldest of the newer, live versions as the superseding one.
524+ generalization = GeneralizedPublication(True)
525 pubs = make_spphs_for_versions(self.factory, ['2.7', '2.8', '2.9'])
526 self.makeDominator(pubs).dominatePackage(
527- pubs, ['2.8', '2.9'], GeneralizedPublication(True))
528+ generalization.sortPublications(pubs), ['2.8', '2.9'],
529+ generalization)
530 self.assertEqual(pubs[1].sourcepackagerelease, pubs[0].supersededby)
531
532 def test_dominatePackage_only_supersedes_with_newer_live_pub(self):
533 # When marking a package as superseded, dominatePackage only
534 # considers a newer version as the superseding one.
535+ generalization = GeneralizedPublication(True)
536 pubs = make_spphs_for_versions(self.factory, ['0.1', '0.2'])
537 self.makeDominator(pubs).dominatePackage(
538- pubs, ['0.1'], GeneralizedPublication(True))
539+ generalization.sortPublications(pubs), ['0.1'], generalization)
540 self.assertEqual(None, pubs[1].supersededby)
541 self.assertEqual(PackagePublishingStatus.DELETED, pubs[1].status)
542
543 def test_dominatePackage_supersedes_replaced_pub_for_live_version(self):
544 # Even if a publication record is for a live version, a newer
545 # one for the same version supersedes it.
546+ generalization = GeneralizedPublication(True)
547 spr = self.factory.makeSourcePackageRelease()
548 series = self.factory.makeDistroSeries()
549 pocket = PackagePublishingPocket.RELEASE
550@@ -649,7 +680,8 @@
551 ])
552
553 self.makeDominator(pubs).dominatePackage(
554- pubs, [spr.version], GeneralizedPublication(True))
555+ generalization.sortPublications(pubs), [spr.version],
556+ generalization)
557 self.assertEqual([
558 PackagePublishingStatus.SUPERSEDED,
559 PackagePublishingStatus.SUPERSEDED,
560@@ -661,12 +693,13 @@
561
562 def test_dominatePackage_is_efficient(self):
563 # dominatePackage avoids issuing too many queries.
564+ generalization = GeneralizedPublication(True)
565 versions = ["1.%s" % revision for revision in xrange(5)]
566 pubs = make_spphs_for_versions(self.factory, versions)
567 with StormStatementRecorder() as recorder:
568 self.makeDominator(pubs).dominatePackage(
569- pubs, versions[2:-1],
570- GeneralizedPublication(True))
571+ generalization.sortPublications(pubs), versions[2:-1],
572+ generalization)
573 self.assertThat(recorder, HasQueryCount(LessThan(5)))
574
575 def test_dominatePackage_advanced_scenario(self):
576@@ -677,6 +710,7 @@
577 # don't just patch up the code or this test. Create unit tests
578 # that specifically cover the difference, then change the code
579 # and/or adapt this test to return to harmony.
580+ generalization = GeneralizedPublication(True)
581 series = self.factory.makeDistroSeries()
582 package = self.factory.makeSourcePackageName()
583 pocket = PackagePublishingPocket.RELEASE
584@@ -723,7 +757,8 @@
585
586 all_pubs = sum(pubs_by_version.itervalues(), [])
587 Dominator(DevNullLogger(), series.main_archive).dominatePackage(
588- all_pubs, live_versions, GeneralizedPublication(True))
589+ generalization.sortPublications(all_pubs), live_versions,
590+ generalization)
591
592 for version in reversed(versions):
593 pubs = pubs_by_version[version]
594@@ -1089,3 +1124,82 @@
595 published_spphs,
596 dominator.findSourcesForDomination(
597 spphs[0].distroseries, spphs[0].pocket))
598+
599+
600+def make_publications_arch_specific(pubs, arch_specific=True):
601+ """Set the `architecturespecific` attribute for given SPPHs.
602+
603+ :param pubs: An iterable of `BinaryPackagePublishingHistory`.
604+ :param arch_specific: Whether the binary package releases published
605+ by `pubs` are to be architecture-specific. If not, they will be
606+ treated as being for the "all" architecture.
607+ """
608+ for pub in pubs:
609+ bpr = removeSecurityProxy(pub).binarypackagerelease
610+ bpr.architecturespecific = arch_specific
611+
612+
613+class TestLivenessFunctions(TestCaseWithFactory):
614+ """Tests for the functions that say which versions are live."""
615+
616+ layer = ZopelessDatabaseLayer
617+
618+ def test_find_live_source_versions_blesses_latest(self):
619+ # find_live_source_versions, assuming that you passed it
620+ # publications sorted from most current to least current
621+ # version, simply returns the most current version.
622+ spphs = make_spphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
623+ self.assertEqual(['1.2'], find_live_source_versions(spphs))
624+
625+ def test_find_live_binary_versions_pass_1_blesses_latest(self):
626+ # find_live_binary_versions_pass_1 always includes the latest
627+ # version among the input publications in its result.
628+ bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
629+ make_publications_arch_specific(bpphs)
630+ self.assertEqual(['1.2'], find_live_binary_versions_pass_1(bpphs))
631+
632+ def test_find_live_binary_versions_pass_1_blesses_arch_all(self):
633+ # find_live_binary_versions_pass_1 includes any
634+ # architecture-independent publications among the input in its
635+ # result.
636+ versions = list(reversed(['1.%d' % version for version in range(3)]))
637+ bpphs = make_bpphs_for_versions(self.factory, versions)
638+
639+ # All of these publications are architecture-specific, except
640+ # the last one. This would happen if the binary package had
641+ # just changed from being architecture-specific to being
642+ # architecture-independent.
643+ make_publications_arch_specific(bpphs, True)
644+ make_publications_arch_specific(bpphs[-1:], False)
645+ self.assertEqual(
646+ versions[:1] + versions[-1:],
647+ find_live_binary_versions_pass_1(bpphs))
648+
649+ def test_find_live_binary_versions_pass_2_blesses_latest(self):
650+ # find_live_binary_versions_pass_2 always includes the latest
651+ # version among the input publications in its result.
652+ bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
653+ make_publications_arch_specific(bpphs, False)
654+ self.assertEqual(['1.2'], find_live_binary_versions_pass_2(bpphs))
655+
656+ def test_find_live_binary_versions_pass_2_blesses_arch_specific(self):
657+ # find_live_binary_versions_pass_2 includes any
658+ # architecture-specific publications among the input in its
659+ # result.
660+ versions = list(reversed(['1.%d' % version for version in range(3)]))
661+ bpphs = make_bpphs_for_versions(self.factory, versions)
662+ make_publications_arch_specific(bpphs)
663+ self.assertEqual(versions, find_live_binary_versions_pass_2(bpphs))
664+
665+ def test_find_live_binary_versions_pass_2_reprieves_arch_all(self):
666+ # An arch-all BPPH for a BPR built by an SPR that also still has
667+ # active arch-dependent BPPHs gets a reprieve: it can't be
668+ # superseded until those arch-dependent BPPHs have been
669+ # superseded.
670+ bpphs = make_bpphs_for_versions(self.factory, ['1.2', '1.1', '1.0'])
671+ make_publications_arch_specific(bpphs, False)
672+ dependent = self.factory.makeBinaryPackagePublishingHistory(
673+ binarypackagerelease=bpphs[1].binarypackagerelease)
674+ make_publications_arch_specific([dependent], True)
675+ self.assertEqual(
676+ ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))