Merge lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 14355
Proposed branch: lp:~rvb/launchpad/activereviews-bug-867941-hugequery
Merge into: lp:launchpad
Diff against target: 194 lines (+78/-23)
1 file modified
lib/lp/code/model/branchcollection.py (+78/-23)
To merge this branch: bzr merge lp:~rvb/launchpad/activereviews-bug-867941-hugequery
Reviewer Review Type Date Requested Status
Abel Deuring (community) code Approve
Review via email: mp+82650@code.launchpad.net

Commit message

[incr] [r=adeuring][bug=867941] Refactor the huge query issued by activereviews to use a CTE.

Description of the change

= Summary =
This branch aims to improve the performance of code.lp.net/~user/+activereviews. Apart from the many small queries issued by this page (this should be fixed by lp:~rvb/launchpad/activereviews-bug-867941) the main problem with this page is the huge query that is issued when the number of public branches for a user is large. For instance, the page https://code.launchpad.net/~ubuntu-branches/+activereviews issues in production a 4.5s query (https://pastebin.canonical.com/55779/). The approach here is to refactor the query to use a CTE to materialize once and for all the list of the public and private-but-visible-for-the-user branch ids instead of repeating this expensive subquery (in the paste above, the expensive subquery is present lines 10-23, 24-37, 49-62 and 66-79). It would be an even better option to not even materialize this list but the code inside ./lib/lp/code/model/branchcollection.py makes this difficult to achieve that without a huge refactoring. I decided to do something simpler that should still improve things significantly.

= Details =
The main change here is the refactoring of _getBranchVisibilityExpression to use a CTE instead of a subquery. The new version of _getBranchVisibilityExpression populates self._with_dict which will be use to build the CTE. We use a dict that is then transformed into a proper "with" storm expression so that many calls to _addVisibleBranchesCTE will not create many declarations of the same CTE.
Another trick is that getMergeProposalsForPerson uses getMergeProposals and getMergeProposalsForReviewer and then uses an union of the results. To avoid having the CTE declared in both queries, I had to restructure the code to a) be able to get BMP ids — hence the introduction of the parameter return_ids) b) be able to specify at which level the CTE must be defined — hence the introduction of the parameter include_with (getMergeProposals or getMergeProposalsForReviewer can also be used directly by external code).

= Tests =
I admit I've not created any test for this even if I've tested it manually to make sure that the query issued is of the right form. I don't think a test to assert that the query is of the right form would be appropriate but maybe a reviewer will have an better idea.

= Q/A =
This branch should not change the behavior of the +activereview page but it's important to make sure that this really improves the performance before we release it.

Here are a few numbers on how the huge original query performs:
qastaging: [1860.0, 6737.0, 1846.0, 1953.0, 1860.0, 1677.0, 1734.0, 2161.0, 2087.0, 1948.0, 2007.0, 1750.0]
prod: [4517.0, 4786.0, 4786.0, 4517.0, 4502.0, 4502.0, 4502.0, 4350.0]

To post a comment you must log in.
Revision history for this message
Abel Deuring (adeuring) wrote :

very nice!

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/code/model/branchcollection.py'
2--- lib/lp/code/model/branchcollection.py 2011-11-16 17:23:42 +0000
3+++ lib/lp/code/model/branchcollection.py 2011-11-19 17:29:26 +0000
4@@ -128,6 +128,16 @@
5 if exclude_from_search is None:
6 exclude_from_search = []
7 self._exclude_from_search = exclude_from_search
8+ self._with_dict = {}
9+
10+ @property
11+ def store_with_with(self):
12+ store = self.store
13+ if len(self._with_dict) != 0:
14+ with_expr = [With(name, expr)
15+ for (name, expr) in self._with_dict.items()]
16+ store = store.with_(with_expr)
17+ return store
18
19 def count(self):
20 """See `IBranchCollection`."""
21@@ -265,7 +275,8 @@
22 self._tables.values() + self._asymmetric_tables.values())
23 tables = [Branch] + list(all_tables)
24 expressions = self._getBranchExpressions()
25- resultset = self.store.using(*tables).find(Branch, *expressions)
26+ resultset = self.store_with_with.using(
27+ *tables).find(Branch, *expressions)
28 if not eager_load:
29 return resultset
30
31@@ -286,6 +297,15 @@
32 target_branch=None, merged_revnos=None,
33 eager_load=False):
34 """See `IBranchCollection`."""
35+ return self._getMergeProposals(
36+ statuses=statuses, for_branches=for_branches,
37+ target_branch=target_branch, merged_revnos=merged_revnos,
38+ eager_load=eager_load)
39+
40+ def _getMergeProposals(self, statuses=None, for_branches=None,
41+ target_branch=None, merged_revnos=None,
42+ eager_load=False, return_ids=False,
43+ include_with=True):
44 if for_branches is not None and not for_branches:
45 # We have an empty branches list, so we can shortcut.
46 return EmptyResultSet()
47@@ -297,15 +317,19 @@
48 target_branch is not None or
49 merged_revnos is not None):
50 return self._naiveGetMergeProposals(statuses, for_branches,
51- target_branch, merged_revnos, eager_load)
52+ target_branch, merged_revnos, eager_load,
53+ return_ids=return_ids, include_with=include_with)
54 else:
55 # When examining merge proposals in a scope, this is a moderately
56 # effective set of constrained queries. It is not effective when
57 # unscoped or when tight constraints on branches are present.
58- return self._scopedGetMergeProposals(statuses)
59+ return self._scopedGetMergeProposals(
60+ statuses, return_ids=return_ids)
61
62 def _naiveGetMergeProposals(self, statuses=None, for_branches=None,
63- target_branch=None, merged_revnos=None, eager_load=False):
64+ target_branch=None, merged_revnos=None,
65+ eager_load=False, return_ids=False,
66+ include_with=True):
67
68 def do_eager_load(rows):
69 branch_ids = set()
70@@ -362,14 +386,20 @@
71 if statuses is not None:
72 expressions.append(
73 BranchMergeProposal.queue_status.is_in(statuses))
74- resultset = self.store.using(*tables).find(
75- BranchMergeProposal, *expressions)
76+ if include_with:
77+ store = self.store_with_with
78+ else:
79+ store = self.store
80+ returned_obj = (
81+ BranchMergeProposal.id if return_ids else BranchMergeProposal)
82+ resultset = store.using(*tables).find(
83+ returned_obj, *expressions)
84 if not eager_load:
85 return resultset
86 else:
87 return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)
88
89- def _scopedGetMergeProposals(self, statuses):
90+ def _scopedGetMergeProposals(self, statuses, return_ids=False):
91 scope_tables = [Branch] + self._tables.values()
92 scope_expressions = self._branch_filter_expressions
93 select = self.store.using(*scope_tables).find(
94@@ -392,21 +422,28 @@
95 if statuses is not None:
96 expressions.append(
97 BranchMergeProposal.queue_status.is_in(statuses))
98+ returned_obj = (
99+ BranchMergeProposal.id if return_ids else BranchMergeProposal)
100 return self.store.with_(with_expr).using(*tables).find(
101- BranchMergeProposal, *expressions)
102+ returned_obj, *expressions)
103
104 def getMergeProposalsForPerson(self, person, status=None):
105 """See `IBranchCollection`."""
106 # We want to limit the proposals to those where the source branch is
107 # limited by the defined collection.
108- owned = self.ownedBy(person).getMergeProposals(status)
109- reviewing = self.getMergeProposalsForReviewer(person, status)
110- resultset = owned.union(reviewing)
111+ owned = self.ownedBy(person)._getMergeProposals(
112+ status, include_with=False, return_ids=True)
113+ reviewing = self._getMergeProposalsForReviewer(
114+ person, status, include_with=False, return_ids=True)
115+ result = owned.union(reviewing)
116+ resultset = self.store_with_with.using([BranchMergeProposal]).find(
117+ BranchMergeProposal,
118+ In(BranchMergeProposal.id, result._get_select()))
119
120 def do_eager_load(rows):
121 source_branches = load_related(Branch, rows, ['source_branchID'])
122 # Cache person's data (registrants of the proposal and
123- # owners of the source branches).
124+ # owners of the source branches).
125 person_ids = set().union(
126 (proposal.registrantID for proposal in rows),
127 (branch.ownerID for branch in source_branches))
128@@ -422,6 +459,10 @@
129
130 def getMergeProposalsForReviewer(self, reviewer, status=None):
131 """See `IBranchCollection`."""
132+ return self._getMergeProposalsForReviewer(reviewer, status=status)
133+
134+ def _getMergeProposalsForReviewer(self, reviewer, status=None,
135+ include_with=True, return_ids=False):
136 tables = [
137 BranchMergeProposal,
138 Join(CodeReviewVoteReference,
139@@ -441,8 +482,14 @@
140 if status is not None:
141 expressions.append(
142 BranchMergeProposal.queue_status.is_in(status))
143- proposals = self.store.using(*tables).find(
144- BranchMergeProposal, *expressions)
145+ if include_with:
146+ store = self.store_with_with
147+ else:
148+ store = self.store
149+ returned_obj = (
150+ BranchMergeProposal.id if return_ids else BranchMergeProposal)
151+ proposals = store.using(*tables).find(
152+ returned_obj, *expressions)
153 # Apply sorting here as we can't do it in the browser code. We need
154 # to think carefully about the best places to do this, but not here
155 # nor now.
156@@ -844,21 +891,29 @@
157 Branch.transitively_private == True)))
158 return private_branches
159
160+ def _addVisibleBranchesCTE(self):
161+ """Add the 'visible_branches' CTE to self._with_dict.
162+
163+ """
164+ public_branches = Branch.transitively_private == False
165+ if self._private_branch_ids is not None:
166+ branches = Or(
167+ public_branches,
168+ Branch.id.is_in(self._private_branch_ids))
169+ else:
170+ branches = public_branches
171+ branches_select = self.store.using(
172+ [Branch]).find(Branch.id, branches)._get_select()
173+ self._with_dict.update({'visible_branches': branches_select})
174+
175 def _getBranchVisibilityExpression(self, branch_class=Branch):
176 """Return the where clauses for visibility.
177
178 :param branch_class: The Branch class to use - permits using
179 ClassAliases.
180 """
181- public_branches = branch_class.transitively_private == False
182- if self._private_branch_ids is None:
183- # Public only.
184- return [public_branches]
185- else:
186- public_or_private = Or(
187- public_branches,
188- branch_class.id.is_in(self._private_branch_ids))
189- return [public_or_private]
190+ self._addVisibleBranchesCTE()
191+ return [branch_class.id.is_in(SQL("SELECT id FROM visible_branches"))]
192
193 def _getCandidateBranchesWith(self):
194 """Return WITH clauses defining candidate branches.