Merge ~cjwatson/launchpad:removed-archive-files-not-reapable into launchpad:master

Proposed by Colin Watson
Status: Merged
Approved by: Colin Watson
Approved revision: 9f52289a5afa002e220ba887562f269a04e028ae
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: ~cjwatson/launchpad:removed-archive-files-not-reapable
Merge into: launchpad:master
Diff against target: 52 lines (+8/-0)
4 files modified
lib/lp/archivepublisher/tests/test_publisher.py (+2/-0)
lib/lp/registry/model/distribution.py (+1/-0)
lib/lp/soyuz/model/archivefile.py (+1/-0)
lib/lp/soyuz/tests/test_archivefile.py (+4/-0)
Reviewer Review Type Date Requested Status
Jürgen Gmach Approve
Review via email: mp+440415@code.launchpad.net

Commit message

Don't consider removed ArchiveFiles as eligible for reaping

Description of the change

Commit 559bbefbb5d7327debd5343c1e812787f5d0caad introduced a gradual performance regression for PPA publishing. In order to support serving old snapshots, `ArchiveFile` rows now have their `date_removed` column set rather than being deleted from the database. However, I failed to account for this in the code that decides which PPAs to publish due to pending deletions and which suites in each PPA should have `by-hash` files pruned from them. As a result, the PPA publisher has been getting gradually slower as we accumulate more removed `ArchiveFile` rows.

Excluding removed rows makes the query for pending-publication PPAs substantially faster since it has an order of magnitude fewer `ArchiveFile` rows to sort, and it should mean that we spend less effort publishing the same PPAs over and over again.

To post a comment you must log in.
Revision history for this message
Jürgen Gmach (jugmac00) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/lib/lp/archivepublisher/tests/test_publisher.py b/lib/lp/archivepublisher/tests/test_publisher.py
2index c7bbea7..ab3fd82 100644
3--- a/lib/lp/archivepublisher/tests/test_publisher.py
4+++ b/lib/lp/archivepublisher/tests/test_publisher.py
5@@ -1499,6 +1499,8 @@ class TestPublisher(TestPublisherBase):
6 archive_file
7 ).scheduled_deletion_date = now - timedelta(hours=12)
8 self.assertIn(archive, ubuntu.getPendingPublicationPPAs())
9+ getUtility(IArchiveFileSet).markDeleted([archive_file])
10+ self.assertNotIn(archive, ubuntu.getPendingPublicationPPAs())
11
12 def testDirtySuitesArchive(self):
13 # getPendingPublicationPPAs returns archives that have dirty_suites
14diff --git a/lib/lp/registry/model/distribution.py b/lib/lp/registry/model/distribution.py
15index 36f3543..e02fc3d 100644
16--- a/lib/lp/registry/model/distribution.py
17+++ b/lib/lp/registry/model/distribution.py
18@@ -1969,6 +1969,7 @@ class Distribution(
19 Archive.distribution == self,
20 ArchiveFile.archive == Archive.id,
21 ArchiveFile.scheduled_deletion_date < UTC_NOW,
22+ ArchiveFile.date_removed == None,
23 )
24 .order_by(Archive.id)
25 .config(distinct=True)
26diff --git a/lib/lp/soyuz/model/archivefile.py b/lib/lp/soyuz/model/archivefile.py
27index 89576a4..f9bbe36 100644
28--- a/lib/lp/soyuz/model/archivefile.py
29+++ b/lib/lp/soyuz/model/archivefile.py
30@@ -215,6 +215,7 @@ class ArchiveFileSet:
31 clauses = [
32 ArchiveFile.archive == archive,
33 ArchiveFile.scheduled_deletion_date < _now(),
34+ ArchiveFile.date_removed == None,
35 ]
36 if container_prefix is not None:
37 clauses.append(ArchiveFile.container.startswith(container_prefix))
38diff --git a/lib/lp/soyuz/tests/test_archivefile.py b/lib/lp/soyuz/tests/test_archivefile.py
39index 13a89f3..85d006f 100644
40--- a/lib/lp/soyuz/tests/test_archivefile.py
41+++ b/lib/lp/soyuz/tests/test_archivefile.py
42@@ -351,6 +351,10 @@ class TestArchiveFile(TestCaseWithFactory):
43 archive, container_prefix="release:"
44 ),
45 )
46+ archive_file_set.markDeleted([archive_files[3]])
47+ self.assertContentEqual(
48+ ["release:foo"], archive_file_set.getContainersToReap(archive)
49+ )
50
51 def test_markDeleted(self):
52 archive = self.factory.makeArchive()

Subscribers

People subscribed via source and target branches

to status/vote changes: