Merge ~cjwatson/launchpad:optimize-latest-uploads into launchpad:master

Proposed by Colin Watson
Status: Merged
Approved by: Colin Watson
Approved revision: 997e14ca8387b301861d9e5a2e772d5364e81ee8
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: ~cjwatson/launchpad:optimize-latest-uploads
Merge into: launchpad:master
Diff against target: 93 lines (+34/-18)
1 file modified
lib/lp/registry/model/distroseries.py (+34/-18)
Reviewer Review Type Date Requested Status
Cristian Gonzalez (community) Approve
Review via email: mp+403735@code.launchpad.net

Commit message

Make DistroSeries.getLatestUploads use a better index

Description of the change

Unassisted, PostgreSQL's planner apparently decides to do an index scan on `PackageUpload`'s primary key, which is very slow. Use a CTE to force it to use the more appropriate index on `PackageUpload (archive, distroseries, status)`.

To post a comment you must log in.
Revision history for this message
Colin Watson (cjwatson) wrote :
Download full text (6.0 KiB)

Old query plan (just `EXPLAIN` as it was far too slow for `ANALYZE`):

```
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=2.14..186.71 rows=5 width=1288)
   -> Nested Loop Left Join (cost=2.14..1614196.46 rows=43728 width=1288)
         -> Nested Loop (cost=1.72..1595163.07 rows=43728 width=1268)
               -> Nested Loop (cost=1.30..1576129.69 rows=43728 width=1268)
                     -> Nested Loop (cost=0.87..1540544.13 rows=43728 width=8)
                           -> Index Scan Backward using distroreleasequeue_pkey on packageupload (cost=0.44..1356507.63 rows=173502 width=4)
                                 Filter: ((archive = ANY ('{1,534}'::integer[])) AND (status = 3) AND (distroseries = 108))
                           -> Index Scan using packageuploadsource__packageupload__key on packageuploadsource (cost=0.43..1.06 rows=1 width=8)
                                 Index Cond: (packageupload = packageupload.id)
                     -> Index Scan using sourcepackagerelease_pkey on sourcepackagerelease (cost=0.43..0.81 rows=1 width=1264)
                           Index Cond: (id = packageuploadsource.sourcepackagerelease)
               -> Index Only Scan using sourcepackagename_pkey on sourcepackagename (cost=0.42..0.44 rows=1 width=4)
                     Index Cond: (id = sourcepackagerelease.sourcepackagename)
         -> Index Scan using sourcepackagename_pkey on sourcepackagename _prejoin1 (cost=0.42..0.44 rows=1 width=20)
               Index Cond: (sourcepackagerelease.sourcepackagename = id)
(15 rows)
```

New query plan:

```
launchpad_staging=> EXPLAIN (ANALYZE, BUFFERS) WITH RelevantUpload AS (SELECT id FROM PackageUpload WHERE status = 3 AND distroseries = 108 AND archive IN (1, 534)) SELECT SourcePackageRelease.*, "_prejoin1".id, "_prejoin1".name FROM RelevantUpload, PackageUploadSource, SourcePackageName, SourcePackageRelease LEFT JOIN SourcePackageName AS "_prejoin1" ON SourcePackageRelease.sourcepackagename = "_prejoin1".id WHERE sourcepackagerelease.id=packageuploadsource.sourcepackagerelease AND sourcepackagerelease.sourcepackagename=sourcepackagename.id AND packageuploadsource.packageupload=relevantupload.id ORDER BY relevantupload.id DESC LIMIT 5;
                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=237905.38..237919.24 rows=5 width=1483) (actual time=236.269..905.036 rows=5 loops=1)
   Buffers: shared hit=80357
   CTE relevantupload
     -> Bitmap Heap Scan on packageupload (cost=3415.68..219334.98 rows=173502 width=4) (actual time=22.161..86.424 rows=139208 loops=1)
           Recheck Cond: ((archive = ANY ('{1,534}'::integer[])) AND (distroseries = 108) AND (status = 3))
           Heap Blocks:...

Read more...

Revision history for this message
Cristian Gonzalez (cristiangsp) wrote :

Looks good!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/lib/lp/registry/model/distroseries.py b/lib/lp/registry/model/distroseries.py
2index 08673ce..b2dc72b 100644
3--- a/lib/lp/registry/model/distroseries.py
4+++ b/lib/lp/registry/model/distroseries.py
5@@ -1,4 +1,4 @@
6-# Copyright 2009-2020 Canonical Ltd. This software is licensed under the
7+# Copyright 2009-2021 Canonical Ltd. This software is licensed under the
8 # GNU Affero General Public License version 3 (see the file LICENSE).
9
10 """Database classes for a distribution series."""
11@@ -14,7 +14,10 @@ __all__ = [
12
13 import collections
14 from io import BytesIO
15-from operator import attrgetter
16+from operator import (
17+ attrgetter,
18+ itemgetter,
19+ )
20
21 import apt_pkg
22 from lazr.delegates import delegate_to
23@@ -30,10 +33,14 @@ from sqlobject import (
24 )
25 from storm.expr import (
26 And,
27+ Column,
28 Desc,
29 Join,
30 Or,
31+ Select,
32 SQL,
33+ Table,
34+ With,
35 )
36 from storm.locals import (
37 Int,
38@@ -167,6 +174,7 @@ from lp.soyuz.model.publishing import (
39 from lp.soyuz.model.queue import (
40 PackageUpload,
41 PackageUploadQueue,
42+ PackageUploadSource,
43 )
44 from lp.soyuz.model.section import Section
45 from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
46@@ -1235,23 +1243,31 @@ class DistroSeries(SQLBase, BugTargetBase, HasSpecificationsMixin,
47
48 def getLatestUploads(self):
49 """See `IDistroSeries`."""
50- query = """
51- sourcepackagerelease.id=packageuploadsource.sourcepackagerelease
52- AND sourcepackagerelease.sourcepackagename=sourcepackagename.id
53- AND packageuploadsource.packageupload=packageupload.id
54- AND packageupload.status=%s
55- AND packageupload.distroseries=%s
56- AND packageupload.archive IN %s
57- """ % sqlvalues(
58- PackageUploadStatus.DONE,
59- self,
60- self.distribution.all_distro_archive_ids)
61+ # Without this CTE, PostgreSQL sometimes decides to scan the primary
62+ # key index on PackageUpload instead, which is very slow.
63+ RelevantUpload = Table("RelevantUpload")
64+ relevant_upload_cte = With(
65+ RelevantUpload.name,
66+ Select(
67+ PackageUpload.id,
68+ And(
69+ PackageUpload.status == PackageUploadStatus.DONE,
70+ PackageUpload.distroseries == self,
71+ PackageUpload.archiveID.is_in(
72+ self.distribution.all_distro_archive_ids))))
73+ clauses = [
74+ SourcePackageRelease.id ==
75+ PackageUploadSource.sourcepackagereleaseID,
76+ SourcePackageRelease.sourcepackagenameID == SourcePackageName.id,
77+ PackageUploadSource.packageuploadID == Column(
78+ "id", RelevantUpload),
79+ ]
80
81- last_uploads = SourcePackageRelease.select(
82- query, limit=5, prejoins=['sourcepackagename'],
83- clauseTables=['SourcePackageName', 'PackageUpload',
84- 'PackageUploadSource'],
85- orderBy=['-packageupload.id'])
86+ last_uploads = DecoratedResultSet(
87+ IStore(SourcePackageRelease).with_(relevant_upload_cte).find(
88+ (SourcePackageRelease, SourcePackageName),
89+ *clauses).order_by(Desc(Column("id", RelevantUpload)))[:5],
90+ result_decorator=itemgetter(0))
91
92 distro_sprs = [
93 self.distribution.getSourcePackageRelease(spr)

Subscribers

People subscribed via source and target branches

to status/vote changes: