Merge lp:~jtv/launchpad/bug-844550 into lp:launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 14009
Proposed branch: lp:~jtv/launchpad/bug-844550
Merge into: lp:launchpad
Diff against target: 250 lines (+59/-151)
2 files modified
lib/lp/soyuz/scripts/gina/dominate.py (+6/-78)
lib/lp/soyuz/scripts/tests/test_gina.py (+53/-73)
To merge this branch: bzr merge lp:~jtv/launchpad/bug-844550
Reviewer Review Type Date Requested Status
Graham Binns (community) code Approve
Review via email: mp+75150@code.launchpad.net

Commit message

Replace transitional Gina domination with its permanent form.

Description of the change

= Summary =

We're rolling out “transitional Gina domination.” This means that Gina will perform source domination on the imported Debian packages: publications of package versions that are no longer in Debian's Sources lists will become Superseded by their newer versions. Where the very newest versions are deleted from the Sources lists for whatever reason, the publication records in our database will be marked Deleted. (Also, Debian source publications are no longer nonsensically kept Pending; they will be Published immediately upon import.)

But what happens when a package no longer occurs in the Debian Sources list at all? We have tons of these still lying around for old packages. For the historical cases, we'll assume that the last known release for a package superseded all others. So the last known release should end up Deleted and all its older ones Superseded. In the future, when a package disappears from Debian's Sources list, its remaining active publications should be considered Deleted.

To make a smooth transition, transitional domination does the one-off jobs: mark any remaining Pending publications for Debian as Published (the bulk of the work was done in custom SQL), and for any package that isn't in the Sources lists, pick the latest version as “live.” That last version stays Published; the older ones become Deleted.

And then there's this branch. Once transitional domination has run, disposing of all the historical data, we can start marking publications for packages that aren't in the Sources at all as Deleted. That includes, for the historical data, that one last release that transitional domination had kept Published.

== Pre-implementation notes ==

Discussed in great detail with wgrant. Unfortunately the transition to making Debian source package publications Published (rather than Pending) broke some scripts the Ubuntu people were running, which explicitly looked for the doubtful Pending status. No multi-status filtering is available, so the easiest fix was to steam ahead to the more desirable situation and assume that all new publications are Published instead.

== Implementation details ==

Mostly this branch consists of removals of clearly marked transitional code. Things get a bit simpler, as planned. But I extended a test for the gina code, to cover a more complete scenario. There is already an extensive scenario test for the underlying functionality, but the new test code shows in one place how permanent domination behaves over a realistic "chain" of package publications.

== Tests ==

{{{
./bin/test -vvc lp.soyuz.scripts.tests.test_gina
}}}

== Demo and Q/A ==

We've found we can run Gina on dogfood's copy of sid to compare:

    ./scripts/gina.py -v sid

This closely matches what will happen in production, since we also ran transitional Gina domination there. The only slightly unusual thing there is that the Sources list we have on dogfood is a bit older than the publications we have in the database, so some publications are marked Deleted because they're newer than any listed ones.

A useful thing to do before running domination is to copy all (id, status) from SourcePackagePublishingHistory into a table of its own. We currently have such a snapshot from before we ran transitional domination on dogfood.

Then, to get an overview of all changes made by gina:

    http://paste.ubuntu.com/687616/

Results after setting all SPPHs for archive 3 (Debian primary) to Published, for a really thorough domination run:

    http://paste.ubuntu.com/687584/

With this new code, a portion of those 18668 Published records should become Deleted. These should be one per SourcePackageName, and all for source packages that are no longer in the Sources list we have in /src/launchpad.net/gina-mirror/dists/sid/main/source/Sources.gz, or possibly that are in there but only with versions that we don't have.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/soyuz/scripts/gina/dominate.py
  lib/lp/soyuz/scripts/tests/test_gina.py

To post a comment you must log in.
Revision history for this message
Graham Binns (gmb) wrote :

Hi Jeroen,

Excellent branch this; I don't have any questions other than:

112 + self.assertEqual([
113 + PackagePublishingStatus.SUPERSEDED,
114 + PackagePublishingStatus.SUPERSEDED,
115 + PackagePublishingStatus.PUBLISHED,
116 + PackagePublishingStatus.DELETED,
117 + PackagePublishingStatus.PENDING,
118 + ],
119 + [pub.status for pub in spphs])

This might be more comprehensible if it were written as single statements (though I can appreciate that that might be a bit less elegant), purely from a "don't make me think about it" point of view. Also, if it's a series of single asserts, it's likely to be simpler to unwind when something goes wrong.

review: Approve (code)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

(Marking this Work In Progress to keep it off the review list for now, and to stop me and others from landing it by accident. Transitional Gina domination needs to run on production before this can land.)

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Graham, you raise a difficult point. I went with the entire-list-at-once assertion for exactly the considerations you give. I'll elaborate.

When an early assertion in a series of them fails, the first thing I always want to know is “yeah but what did it do in the other cases, that are all similar except in one important respect?” For instance, did all publications just get the same status for some reason, or are they put in the wrong order, or is it just one case that's off, or was nothing updated at all, or is something else going on? Often I find myself re-running tests in the debugger, or with the failing assertion commented out, or with reordered assertions just to find out. Hence the choice of an assertion that shows the full pattern I expected and the full pattern that was observed. The alternative would be a series of similar tests, but in this case I particularly wanted to give a single full panorama view on what happens to a typical string of publications.

Ideally, I'd like to annotate the items in the assertion with what's unique about how each of them went through the process, and why I expect them to end up in the given way. In most cases that can be done with a bit of case-by-case creativity.

In this case, I can think of two ways of doing that:

1. Assert the “before action” state in a similar way.

2. Compare not lists, but dicts that map versions to statuses. So it'll read as “1.0 is Superseded,” etc.

I could do either one of these, or both. For brevity, one component of clarity, I'd prefer to limit it to one.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/soyuz/scripts/gina/dominate.py'
2--- lib/lp/soyuz/scripts/gina/dominate.py 2011-09-16 11:41:14 +0000
3+++ lib/lp/soyuz/scripts/gina/dominate.py 2011-09-21 08:14:25 +0000
4@@ -20,86 +20,14 @@
5 series = getUtility(IDistributionSet)[distro_name].getSeries(series_name)
6 dominator = Dominator(logger, series.main_archive)
7
8- # XXX JeroenVermeulen 2011-09-08, bug=844550: This is a transitional
9- # hack. Gina used to create SPPHs in Pending state. We cleaned up
10- # the bulk of them, and changed the code to create Published ones, but
11- # some new ones will have been created since.
12- # Update those to match what the new Gina does.
13- from canonical.launchpad.interfaces.lpstorm import IStore
14- from lp.soyuz.enums import PackagePublishingStatus
15- from lp.soyuz.model.publishing import SourcePackagePublishingHistory
16- SPPH = SourcePackagePublishingHistory
17- store = IStore(SPPH)
18- spphs = store.find(
19- SPPH,
20- SPPH.archive == series.main_archive,
21- SPPH.distroseries == series,
22- SPPH.pocket == pocket,
23- SPPH.status == PackagePublishingStatus.PENDING)
24- spphs.set(status=PackagePublishingStatus.PUBLISHED)
25-
26- # Dominate packages found in the Sources list we're importing.
27+ # Dominate all packages published in the series. This includes all
28+ # packages listed in the Sources file we imported, but also packages
29+ # that have been recently deleted.
30 package_names = dominator.findPublishedSourcePackageNames(series, pocket)
31 for package_name in package_names:
32- entries = packages_map.src_map.get(package_name)
33-
34- if entries is None:
35- # XXX JeroenVermeulen 2011-09-08, bug=844550: This is a
36- # transitional hack. The database is full of "Published"
37- # Debian SPPHs whose packages have actually been deleted.
38- # In the future such publications should simply be marked
39- # Deleted, but for the legacy baggage we currently carry
40- # around we'll just do traditional domination first: pick
41- # the latest Published version, and mark the rest of the
42- # SPPHs as superseded by that version. The latest version
43- # will then, finally, be marked appropriately Deleted once
44- # we remove this transitional hack.
45- # To remove the transitional hack, just let live_versions
46- # default to the empty list instead of doing this:
47- import apt_pkg
48- from lp.services.database.bulk import load_related
49- from lp.soyuz.model.sourcepackagerelease import (
50- SourcePackageRelease,
51- )
52- pubs = list(
53- dominator.findPublishedSPPHs(series, pocket, package_name))
54- if len(pubs) <= 1:
55- # Without at least two published SPPHs, the transitional
56- # code will make no changes. Skip, and leave the
57- # algorithm free to assume there's a pubs[0].
58- continue
59- load_related(
60- SourcePackageRelease, pubs, ['sourcepackagereleaseID'])
61-
62- # Close off the transaction to avoid being idle-killed.
63- # Nothing else is going to supersede or delete these
64- # publications in the meantime, so our data stays valid; and
65- # if new ones become published, they still won't be
66- # considered anyway.
67- txn.commit()
68-
69- # Find the "latest" publication. A purely in-memory
70- # operation; won't open a new transaction.
71- def is_newer(candidate, reference):
72- comparison = apt_pkg.VersionCompare(
73- candidate.sourcepackagerelease.version,
74- reference.sourcepackagerelease.version)
75- if comparison > 0:
76- return True
77- elif comparison < 0:
78- return False
79- else:
80- return candidate.datecreated > reference.datecreated
81-
82- latest_pub = pubs[0]
83- for pub in pubs[1:]:
84- if is_newer(pub, latest_pub):
85- latest_pub = pub
86- live_versions = [latest_pub.sourcepackagerelease.version]
87- else:
88- live_versions = [
89- entry['Version']
90- for entry in entries if 'Version' in entry]
91+ entries = packages_map.src_map.get(package_name, [])
92+ live_versions = [
93+ entry['Version'] for entry in entries if 'Version' in entry]
94
95 dominator.dominateRemovedSourceVersions(
96 series, pocket, package_name, live_versions)
97
98=== modified file 'lib/lp/soyuz/scripts/tests/test_gina.py'
99--- lib/lp/soyuz/scripts/tests/test_gina.py 2011-09-16 10:48:59 +0000
100+++ lib/lp/soyuz/scripts/tests/test_gina.py 2011-09-21 08:14:25 +0000
101@@ -36,15 +36,51 @@
102 # packages that Gina imports.
103 logger = DevNullLogger()
104 txn = FakeTransaction()
105- pub = self.factory.makeSourcePackagePublishingHistory(
106- status=PackagePublishingStatus.PUBLISHED)
107- series = pub.distroseries
108- spr = pub.sourcepackagerelease
109- package = spr.sourcepackagename
110+ series = self.factory.makeDistroSeries()
111+ pocket = PackagePublishingPocket.RELEASE
112+ package = self.factory.makeSourcePackageName()
113+
114+ # Realistic situation: there's an older, superseded publication;
115+ # a series of active ones; and a newer, pending publication
116+ # that's not in the Sources lists yet.
117+ # Gina dominates the Published ones and leaves the rest alone.
118+ old_spph = self.factory.makeSourcePackagePublishingHistory(
119+ distroseries=series, archive=series.main_archive,
120+ pocket=pocket, status=PackagePublishingStatus.SUPERSEDED,
121+ sourcepackagerelease=self.factory.makeSourcePackageRelease(
122+ sourcepackagename=package, version='1.0'))
123+
124+ active_spphs = [
125+ self.factory.makeSourcePackagePublishingHistory(
126+ distroseries=series, archive=series.main_archive,
127+ pocket=pocket, status=PackagePublishingStatus.PUBLISHED,
128+ sourcepackagerelease=self.factory.makeSourcePackageRelease(
129+ sourcepackagename=package, version=version))
130+ for version in ['1.1', '1.1.1', '1.1.1.1']]
131+
132+ new_spph = self.factory.makeSourcePackagePublishingHistory(
133+ distroseries=series, archive=series.main_archive,
134+ pocket=pocket, status=PackagePublishingStatus.PENDING,
135+ sourcepackagerelease=self.factory.makeSourcePackageRelease(
136+ sourcepackagename=package, version='1.2'))
137+
138+ spphs = [old_spph] + active_spphs + [new_spph]
139+
140+ # Of the active publications, in this scenario, only one version
141+ # matches what Gina finds in the Sources list. It stays
142+ # published; older active publications are superseded, newer
143+ # ones deleted.
144 dominate_imported_source_packages(
145- txn, logger, series.distribution.name, series.name, pub.pocket,
146- FakePackagesMap({package.name: []}))
147- self.assertEqual(PackagePublishingStatus.DELETED, pub.status)
148+ txn, logger, series.distribution.name, series.name, pocket,
149+ FakePackagesMap({package.name: [{'Version': '1.1.1'}]}))
150+ self.assertEqual([
151+ PackagePublishingStatus.SUPERSEDED,
152+ PackagePublishingStatus.SUPERSEDED,
153+ PackagePublishingStatus.PUBLISHED,
154+ PackagePublishingStatus.DELETED,
155+ PackagePublishingStatus.PENDING,
156+ ],
157+ [pub.status for pub in spphs])
158
159 def test_dominate_imported_source_packages_dominates_deletions(self):
160 # dominate_imported_source_packages dominates the source
161@@ -60,81 +96,25 @@
162 sourcepackagerelease=self.factory.makeSourcePackageRelease(
163 sourcepackagename=package, version=version))
164 for version in ['1.0', '1.1', '1.1a']]
165+
166+ # In this scenario, 1.0 is a superseded release.
167+ pubs[0].supersede()
168 logger = DevNullLogger()
169 txn = FakeTransaction()
170 dominate_imported_source_packages(
171 txn, logger, series.distribution.name, series.name, pocket,
172 FakePackagesMap({}))
173- # XXX JeroenVermeulen 2011-09-08, bug=844550: This is
174- # "transitional" domination which supersedes older versions of
175- # deleted packages with the last known version. Permanent
176- # domination will then mark the last known version as deleted.
177- # For permanent domination, the expected outcome is that all
178- # these publications will be Deleted (but any pre-existing
179- # Superseded publications for older versions will remain
180- # Superseded).
181+
182+ # The older, superseded release stays superseded; but the
183+ # releases that dropped out of the imported Sources list without
184+ # known successors are marked deleted.
185 self.assertEqual([
186 PackagePublishingStatus.SUPERSEDED,
187- PackagePublishingStatus.SUPERSEDED,
188- PackagePublishingStatus.PUBLISHED,
189+ PackagePublishingStatus.DELETED,
190+ PackagePublishingStatus.DELETED,
191 ],
192 [pub.status for pub in pubs])
193
194- def test_dominate_imported_source_packages_cleans_up_pending_spphs(self):
195- # XXX JeroenVermeulen 2011-09-08, bug=844550: For transition to
196- # Gina domination, dominate_imported_source_packages turns any
197- # remaining Pending SPPHS into Published ones.
198- series = self.factory.makeDistroSeries()
199- spph = self.factory.makeSourcePackagePublishingHistory(
200- distroseries=series, archive=series.main_archive,
201- status=PackagePublishingStatus.PENDING)
202- spr = spph.sourcepackagerelease
203- package_name = spr.sourcepackagename.name
204- logger = DevNullLogger()
205- txn = FakeTransaction()
206- dominate_imported_source_packages(
207- txn, logger, series.distribution.name, series.name, spph.pocket,
208- FakePackagesMap({package_name: [{"Version": spr.version}]}))
209- self.assertEqual(PackagePublishingStatus.PUBLISHED, spph.status)
210-
211- def test_dominate_imported_source_packages_cleans_up_first(self):
212- # XXX JeroenVermeulen 2011-09-08, bug=844550: For transition to
213- # Gina domination, dominate_imported_source_packages turns any
214- # remaining Pending SPPHS into Published ones. It does this
215- # *before* dominating, so no domination happens while some of
216- # the SPPHs are still mistakenly Pending (which would result in
217- # mistaken deletions).
218- series = self.factory.makeDistroSeries()
219- package = self.factory.makeSourcePackageName()
220- pocket = PackagePublishingPocket.RELEASE
221- versions = ['1.0', '1.1']
222- statuses_before = [
223- PackagePublishingStatus.PUBLISHED,
224- PackagePublishingStatus.PENDING,
225- ]
226- statuses_after = [
227- PackagePublishingStatus.SUPERSEDED,
228- PackagePublishingStatus.PUBLISHED,
229- ]
230- live_version = versions[-1]
231- sprs = [
232- self.factory.makeSourcePackageRelease(
233- sourcepackagename=package, version=version)
234- for version in versions]
235- spphs = [
236- self.factory.makeSourcePackagePublishingHistory(
237- archive=series.main_archive, distroseries=series,
238- sourcepackagerelease=spr, pocket=pocket, status=status)
239- for spr, status in zip(sprs, statuses_before)]
240-
241- logger = DevNullLogger()
242- txn = FakeTransaction()
243- dominate_imported_source_packages(
244- txn, logger, series.distribution.name, series.name, pocket,
245- FakePackagesMap({package.name: [{"Version": live_version}]}))
246-
247- self.assertEqual(statuses_after, [spph.status for spph in spphs])
248-
249
250 class TestSourcePackagePublisher(TestCaseWithFactory):
251