Merge ~cjwatson/launchpad:optimize-person-sorting into launchpad:master

Proposed by Colin Watson
Status: Merged
Approved by: Colin Watson
Approved revision: d6b8875f0ad30d36e467bc4d6a5dd090c7a976d0
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: ~cjwatson/launchpad:optimize-person-sorting
Merge into: launchpad:master
Diff against target: 13 lines (+1/-1)
1 file modified
lib/lp/registry/model/person.py (+1/-1)
Reviewer Review Type Date Requested Status
Colin Watson (community) Approve
Review via email: mp+436194@code.launchpad.net

Commit message

Optimize short-circuit case in PersonSet.findPerson

Description of the change

Ordering this query by `Person._storm_sortingColumns` (a somewhat dubiously-named attribute that is in fact just `(Person.displayname, Person.name)` without the `person_sort_key` function) causes PostgreSQL to use a very slow plan that has to sort the whole `Person` table before applying the row limit:

   Limit (cost=798114.96..798115.15 rows=76 width=1049)
     -> Sort (cost=798114.96..806603.87 rows=3395564 width=1049)
           Sort Key: person.displayname, person.name
           -> Merge Join (cost=1.66..675060.95 rows=3395564 width=1049)
                 Merge Cond: (account.id = person.account)
                 -> Index Scan using account_pkey on account (cost=0.43..287109.00 rows=3416242 width=4)
                       Filter: (status <> ALL ('{5,30,40,50,60}'::integer[]))
                 -> Index Scan using person__account__key on person (cost=0.43..330359.42 rows=7441191 width=1049)
                       Filter: ((teamowner IS NULL) AND (merged IS NULL))

Using `Person.sortingColumns` instead is more in line with what this method does in the non-short-circuit case, and it produces a much more sensible plan:

   Limit (cost=0.99..111.69 rows=76 width=1081)
     -> Nested Loop (cost=0.99..4946116.34 rows=3395564 width=1081)
           -> Index Scan using person_sorting_idx on person (cost=0.56..505451.00 rows=7441191 width=1049)
                 Filter: ((teamowner IS NULL) AND (merged IS NULL))
           -> Index Scan using account_pkey on account (cost=0.43..0.48 rows=1 width=4)
                 Index Cond: (id = person.account)
                 Filter: (status <> ALL ('{5,30,40,50,60}'::integer[]))

To post a comment you must log in.
Revision history for this message
Colin Watson (cjwatson) wrote :

This is pretty simple (at least after digging through the SQL details), and there's been plenty of time for objections, so I'm going to go ahead with this.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/lib/lp/registry/model/person.py b/lib/lp/registry/model/person.py
2index 175ccfc..1eef067 100644
3--- a/lib/lp/registry/model/person.py
4+++ b/lib/lp/registry/model/person.py
5@@ -4417,7 +4417,7 @@ class PersonSet:
6 # Short circuit for returning all users in order
7 if not text:
8 results = store.find(Person, base_query)
9- return results.order_by(Person._storm_sortingColumns)
10+ return results.order_by(Person.sortingColumns)
11
12 # We use a UNION here because this makes things *a lot* faster
13 # than if we did a single SELECT with the two following clauses

Subscribers

People subscribed via source and target branches

to status/vote changes: