Merge lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 14355
Proposed branch: lp:~rvb/launchpad/activereviews-bug-867941-hugequery
Merge into: lp:launchpad
Diff against target: 194 lines (+78/-23)
1 file modified
lib/lp/code/model/branchcollection.py (+78/-23)
To merge this branch: bzr merge lp:~rvb/launchpad/activereviews-bug-867941-hugequery
Reviewer Review Type Date Requested Status
Abel Deuring (community) code Approve
Review via email: mp+82650@code.launchpad.net

Commit message

[incr] [r=adeuring][bug=867941] Refactor the huge query issued by activereviews to use a CTE.

Description of the change

= Summary =
This branch aims to improve the performance of code.lp.net/~user/+activereviews. Apart from the many small queries issued by this page (this should be fixed by lp:~rvb/launchpad/activereviews-bug-867941) the main problem with this page is the huge query that is issued when the number of public branches for a user is large. For instance, the page https://code.launchpad.net/~ubuntu-branches/+activereviews issues in production a 4.5s query (https://pastebin.canonical.com/55779/). The approach here is to refactor the query to use a CTE to materialize once and for all the list of the public and private-but-visible-for-the-user branch ids instead of repeating this expensive subquery (in the paste above, the expensive subquery is present lines 10-23, 24-37, 49-62 and 66-79). It would be an even better option to not even materialize this list but the code inside ./lib/lp/code/model/branchcollection.py makes this difficult to achieve that without a huge refactoring. I decided to do something simpler that should still improve things significantly.

= Details =
The main change here is the refactoring of _getBranchVisibilityExpression to use a CTE instead of a subquery. The new version of _getBranchVisibilityExpression populates self._with_dict which will be use to build the CTE. We use a dict that is then transformed into a proper "with" storm expression so that many calls to _addVisibleBranchesCTE will not create many declarations of the same CTE.
Another trick is that getMergeProposalsForPerson uses getMergeProposals and getMergeProposalsForReviewer and then uses an union of the results. To avoid having the CTE declared in both queries, I had to restructure the code to a) be able to get BMP ids — hence the introduction of the parameter return_ids) b) be able to specify at which level the CTE must be defined — hence the introduction of the parameter include_with (getMergeProposals or getMergeProposalsForReviewer can also be used directly by external code).

= Tests =
I admit I've not created any test for this even if I've tested it manually to make sure that the query issued is of the right form. I don't think a test to assert that the query is of the right form would be appropriate but maybe a reviewer will have an better idea.

= Q/A =
This branch should not change the behavior of the +activereview page but it's important to make sure that this really improves the performance before we release it.

Here are a few numbers on how the huge original query performs:
qastaging: [1860.0, 6737.0, 1846.0, 1953.0, 1860.0, 1677.0, 1734.0, 2161.0, 2087.0, 1948.0, 2007.0, 1750.0]
prod: [4517.0, 4786.0, 4786.0, 4517.0, 4502.0, 4502.0, 4502.0, 4350.0]

To post a comment you must log in.
Revision history for this message
Abel Deuring (adeuring) wrote :

very nice!

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/code/model/branchcollection.py'
--- lib/lp/code/model/branchcollection.py 2011-11-16 17:23:42 +0000
+++ lib/lp/code/model/branchcollection.py 2011-11-19 17:29:26 +0000
@@ -128,6 +128,16 @@
128 if exclude_from_search is None:128 if exclude_from_search is None:
129 exclude_from_search = []129 exclude_from_search = []
130 self._exclude_from_search = exclude_from_search130 self._exclude_from_search = exclude_from_search
131 self._with_dict = {}
132
133 @property
134 def store_with_with(self):
135 store = self.store
136 if len(self._with_dict) != 0:
137 with_expr = [With(name, expr)
138 for (name, expr) in self._with_dict.items()]
139 store = store.with_(with_expr)
140 return store
131141
132 def count(self):142 def count(self):
133 """See `IBranchCollection`."""143 """See `IBranchCollection`."""
@@ -265,7 +275,8 @@
265 self._tables.values() + self._asymmetric_tables.values())275 self._tables.values() + self._asymmetric_tables.values())
266 tables = [Branch] + list(all_tables)276 tables = [Branch] + list(all_tables)
267 expressions = self._getBranchExpressions()277 expressions = self._getBranchExpressions()
268 resultset = self.store.using(*tables).find(Branch, *expressions)278 resultset = self.store_with_with.using(
279 *tables).find(Branch, *expressions)
269 if not eager_load:280 if not eager_load:
270 return resultset281 return resultset
271282
@@ -286,6 +297,15 @@
286 target_branch=None, merged_revnos=None,297 target_branch=None, merged_revnos=None,
287 eager_load=False):298 eager_load=False):
288 """See `IBranchCollection`."""299 """See `IBranchCollection`."""
300 return self._getMergeProposals(
301 statuses=statuses, for_branches=for_branches,
302 target_branch=target_branch, merged_revnos=merged_revnos,
303 eager_load=eager_load)
304
305 def _getMergeProposals(self, statuses=None, for_branches=None,
306 target_branch=None, merged_revnos=None,
307 eager_load=False, return_ids=False,
308 include_with=True):
289 if for_branches is not None and not for_branches:309 if for_branches is not None and not for_branches:
290 # We have an empty branches list, so we can shortcut.310 # We have an empty branches list, so we can shortcut.
291 return EmptyResultSet()311 return EmptyResultSet()
@@ -297,15 +317,19 @@
297 target_branch is not None or317 target_branch is not None or
298 merged_revnos is not None):318 merged_revnos is not None):
299 return self._naiveGetMergeProposals(statuses, for_branches,319 return self._naiveGetMergeProposals(statuses, for_branches,
300 target_branch, merged_revnos, eager_load)320 target_branch, merged_revnos, eager_load,
321 return_ids=return_ids, include_with=include_with)
301 else:322 else:
302 # When examining merge proposals in a scope, this is a moderately323 # When examining merge proposals in a scope, this is a moderately
303 # effective set of constrained queries. It is not effective when324 # effective set of constrained queries. It is not effective when
304 # unscoped or when tight constraints on branches are present.325 # unscoped or when tight constraints on branches are present.
305 return self._scopedGetMergeProposals(statuses)326 return self._scopedGetMergeProposals(
327 statuses, return_ids=return_ids)
306328
307 def _naiveGetMergeProposals(self, statuses=None, for_branches=None,329 def _naiveGetMergeProposals(self, statuses=None, for_branches=None,
308 target_branch=None, merged_revnos=None, eager_load=False):330 target_branch=None, merged_revnos=None,
331 eager_load=False, return_ids=False,
332 include_with=True):
309333
310 def do_eager_load(rows):334 def do_eager_load(rows):
311 branch_ids = set()335 branch_ids = set()
@@ -362,14 +386,20 @@
362 if statuses is not None:386 if statuses is not None:
363 expressions.append(387 expressions.append(
364 BranchMergeProposal.queue_status.is_in(statuses))388 BranchMergeProposal.queue_status.is_in(statuses))
365 resultset = self.store.using(*tables).find(389 if include_with:
366 BranchMergeProposal, *expressions)390 store = self.store_with_with
391 else:
392 store = self.store
393 returned_obj = (
394 BranchMergeProposal.id if return_ids else BranchMergeProposal)
395 resultset = store.using(*tables).find(
396 returned_obj, *expressions)
367 if not eager_load:397 if not eager_load:
368 return resultset398 return resultset
369 else:399 else:
370 return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)400 return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)
371401
372 def _scopedGetMergeProposals(self, statuses):402 def _scopedGetMergeProposals(self, statuses, return_ids=False):
373 scope_tables = [Branch] + self._tables.values()403 scope_tables = [Branch] + self._tables.values()
374 scope_expressions = self._branch_filter_expressions404 scope_expressions = self._branch_filter_expressions
375 select = self.store.using(*scope_tables).find(405 select = self.store.using(*scope_tables).find(
@@ -392,21 +422,28 @@
392 if statuses is not None:422 if statuses is not None:
393 expressions.append(423 expressions.append(
394 BranchMergeProposal.queue_status.is_in(statuses))424 BranchMergeProposal.queue_status.is_in(statuses))
425 returned_obj = (
426 BranchMergeProposal.id if return_ids else BranchMergeProposal)
395 return self.store.with_(with_expr).using(*tables).find(427 return self.store.with_(with_expr).using(*tables).find(
396 BranchMergeProposal, *expressions)428 returned_obj, *expressions)
397429
398 def getMergeProposalsForPerson(self, person, status=None):430 def getMergeProposalsForPerson(self, person, status=None):
399 """See `IBranchCollection`."""431 """See `IBranchCollection`."""
400 # We want to limit the proposals to those where the source branch is432 # We want to limit the proposals to those where the source branch is
401 # limited by the defined collection.433 # limited by the defined collection.
402 owned = self.ownedBy(person).getMergeProposals(status)434 owned = self.ownedBy(person)._getMergeProposals(
403 reviewing = self.getMergeProposalsForReviewer(person, status)435 status, include_with=False, return_ids=True)
404 resultset = owned.union(reviewing)436 reviewing = self._getMergeProposalsForReviewer(
437 person, status, include_with=False, return_ids=True)
438 result = owned.union(reviewing)
439 resultset = self.store_with_with.using([BranchMergeProposal]).find(
440 BranchMergeProposal,
441 In(BranchMergeProposal.id, result._get_select()))
405442
406 def do_eager_load(rows):443 def do_eager_load(rows):
407 source_branches = load_related(Branch, rows, ['source_branchID'])444 source_branches = load_related(Branch, rows, ['source_branchID'])
408 # Cache person's data (registrants of the proposal and445 # Cache person's data (registrants of the proposal and
409 # owners of the source branches).446 # owners of the source branches).
410 person_ids = set().union(447 person_ids = set().union(
411 (proposal.registrantID for proposal in rows),448 (proposal.registrantID for proposal in rows),
412 (branch.ownerID for branch in source_branches))449 (branch.ownerID for branch in source_branches))
@@ -422,6 +459,10 @@
422459
423 def getMergeProposalsForReviewer(self, reviewer, status=None):460 def getMergeProposalsForReviewer(self, reviewer, status=None):
424 """See `IBranchCollection`."""461 """See `IBranchCollection`."""
462 return self._getMergeProposalsForReviewer(reviewer, status=status)
463
464 def _getMergeProposalsForReviewer(self, reviewer, status=None,
465 include_with=True, return_ids=False):
425 tables = [466 tables = [
426 BranchMergeProposal,467 BranchMergeProposal,
427 Join(CodeReviewVoteReference,468 Join(CodeReviewVoteReference,
@@ -441,8 +482,14 @@
441 if status is not None:482 if status is not None:
442 expressions.append(483 expressions.append(
443 BranchMergeProposal.queue_status.is_in(status))484 BranchMergeProposal.queue_status.is_in(status))
444 proposals = self.store.using(*tables).find(485 if include_with:
445 BranchMergeProposal, *expressions)486 store = self.store_with_with
487 else:
488 store = self.store
489 returned_obj = (
490 BranchMergeProposal.id if return_ids else BranchMergeProposal)
491 proposals = store.using(*tables).find(
492 returned_obj, *expressions)
446 # Apply sorting here as we can't do it in the browser code. We need493 # Apply sorting here as we can't do it in the browser code. We need
447 # to think carefully about the best places to do this, but not here494 # to think carefully about the best places to do this, but not here
448 # nor now.495 # nor now.
@@ -844,21 +891,29 @@
844 Branch.transitively_private == True)))891 Branch.transitively_private == True)))
845 return private_branches892 return private_branches
846893
894 def _addVisibleBranchesCTE(self):
895 """Add the 'visible_branches' CTE to self._with_dict.
896
897 """
898 public_branches = Branch.transitively_private == False
899 if self._private_branch_ids is not None:
900 branches = Or(
901 public_branches,
902 Branch.id.is_in(self._private_branch_ids))
903 else:
904 branches = public_branches
905 branches_select = self.store.using(
906 [Branch]).find(Branch.id, branches)._get_select()
907 self._with_dict.update({'visible_branches': branches_select})
908
847 def _getBranchVisibilityExpression(self, branch_class=Branch):909 def _getBranchVisibilityExpression(self, branch_class=Branch):
848 """Return the where clauses for visibility.910 """Return the where clauses for visibility.
849911
850 :param branch_class: The Branch class to use - permits using912 :param branch_class: The Branch class to use - permits using
851 ClassAliases.913 ClassAliases.
852 """914 """
853 public_branches = branch_class.transitively_private == False915 self._addVisibleBranchesCTE()
854 if self._private_branch_ids is None:916 return [branch_class.id.is_in(SQL("SELECT id FROM visible_branches"))]
855 # Public only.
856 return [public_branches]
857 else:
858 public_or_private = Or(
859 public_branches,
860 branch_class.id.is_in(self._private_branch_ids))
861 return [public_or_private]
862917
863 def _getCandidateBranchesWith(self):918 def _getCandidateBranchesWith(self):
864 """Return WITH clauses defining candidate branches.919 """Return WITH clauses defining candidate branches.