Merge lp:~rvb/launchpad/activereviews-bug-867941 into lp:launchpad

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 14330
Proposed branch: lp:~rvb/launchpad/activereviews-bug-867941
Merge into: lp:launchpad
Diff against target: 124 lines (+79/-1)
2 files modified
lib/lp/code/browser/tests/test_branchmergeproposallisting.py (+60/-0)
lib/lp/code/model/branchcollection.py (+19/-1)
To merge this branch: bzr merge lp:~rvb/launchpad/activereviews-bug-867941
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+82375@code.launchpad.net

Commit message

[incr] [r=allenap][bug=867941] Eager load stuff to improve the performance of a person's activereviews page.

Description of the change

This branch addresses one of the problems of code.lp.net/~user/+activereviews: the huge number of queries issued by this page (1000+ for pages like https://code.launchpad.net/~ubuntu-branches/+activereviews). It does this by eager loading the relevant data (the related branches & users mainly).

= Tests =

./bin/test -vvc test_branchmergeproposallisting test_activereviews_query_count

= Q/A =

This page issues a lot of queries:
qastaging: [678+, 807+, 1068+, 939+, 1056+, 1140+]
prod: [1405, 849+, 777+, 1034+, 639+, 1382+]
(a '+' next to the number means that the whole page timed out and thus the number of queries is not accurate)

To post a comment you must log in.
Revision history for this message
Raphaël Badin (rvb) wrote :

Something weird is going on here; if I change:

recorder1 = self.createBMPsAndMeasureActiveReviewsPageRendering(3)
recorder2 = self.createBMPsAndMeasureActiveReviewsPageRendering(7)

to:

recorder1 = self.createBMPsAndMeasureActiveReviewsPageRendering(3)
recorder2 = self.createBMPsAndMeasureActiveReviewsPageRendering(70)

Then the number of queries does not match. The extra queries are the one needed to get branch.{_associatedProductSeries, _associatedSuiteSourcePackages} which is exactly what _preloadDataForBranches populates.

This needs investigation…

Revision history for this message
Gavin Panella (allenap) wrote :

Nice. However, I think [2] must be resolved, so Needs Fixing. [3] is
just a suggestion which you're welcome to ignore.

[1]

+ transaction.commit()

I don't think this is needed.

[2]

+ with StormStatementRecorder() as recorder:
+ view = create_initialized_view(
+ user, name='+activereviews', rootsite='code',
+ principal=user)
+ view.render()

I discovered (as discussed on IRC) that calling view.initialize()
twice causes the test to pass with number_of_bmps values of >= 14. I
did it by changing view.render() to view(). I have not a clue why this
makes a difference, but it seems like a promising avenue for discovery.
I think this issue must be resolved before landing this branch.

[3]

+ def test_activereviews_query_count(self):
+ recorder1 = self.createBMPsAndMeasureActiveReviewsPageRendering(3)
+ recorder2 = self.createBMPsAndMeasureActiveReviewsPageRendering(7)
+ self.assertThat(recorder2, HasQueryCount(Equals(recorder1.count)))

While trying to debug [2] I noticed that recorder1's statements are
not reported when it fails.

I added the following line to solve that:

    self.addDetail("r1tb", Content(UTF8_TEXT, lambda: [str(recorder1)]))

This also has the advantage that tracebacks recorded are also shown. I
think _MismatchedQueryCount do something similar. Its get_details()
method could be changed to:

    def get_details(self):
        return {
            'queries': Content(
                UTF8_TEXT, lambda: [
                    unicode(self.query_collector).encode("utf8")]),
            }

[4]

+ # Load the registrants' data.
+ load_related(Person, rows, ['registrantID'])
+ # Cache registrants' info (Person and ValidPersonCache).
+ list(self.store.find(
+ (Person, ValidPersonCache),
+ ValidPersonCache.id == Person.id,
+ Person.id.is_in([proposal.registrantID for proposal in rows]),
+ ))
[...]
+ load_related(Person, source_branches, ['ownerID'])

Consider using IPersonSet.getPrecachedPersonsFromIDs() instead here:

    person_ids = set().union(
        (proposal.registrantID for proposal in rows),
        (branch.ownerID for branch in source_branches))
    getUtility(IPersonSet).getPrecachedPersonsFromIDs(
        person_ids, need_validity=True, ...)

review: Needs Fixing
Revision history for this message
Raphaël Badin (rvb) wrote :

> [1]

Right.

> [2]
>
> + with StormStatementRecorder() as recorder:
> + view = create_initialized_view(
> + user, name='+activereviews', rootsite='code',
> + principal=user)
> + view.render()
>
> I discovered (as discussed on IRC) that calling view.initialize()
> twice causes the test to pass with number_of_bmps values of >= 14. I
> did it by changing view.render() to view(). I have not a clue why this
> makes a difference, but it seems like a promising avenue for discovery.
> I think this issue must be resolved before landing this branch.

Damn!

> [3]
>
> + def test_activereviews_query_count(self):
> + recorder1 = self.createBMPsAndMeasureActiveReviewsPageRendering(3)
> + recorder2 = self.createBMPsAndMeasureActiveReviewsPageRendering(7)
> + self.assertThat(recorder2, HasQueryCount(Equals(recorder1.count)))
>
> While trying to debug [2] I noticed that recorder1's statements are
> not reported when it fails.
>
> I added the following line to solve that:
>
> self.addDetail("r1tb", Content(UTF8_TEXT, lambda: [str(recorder1)]))

I've added that.

> [4]

One less query \o/.

Revision history for this message
Raphaël Badin (rvb) wrote :

= A description of the problem =

The cache for branch._associatedSuiteSourcePackages is populated by _preloadDataForBranches but the cache disappears somehow.

My only idea was to think that maybe there is a cache maximum set somewhere but I can't find one and Gavin says he does not know about such a maximum.

= Queries issued when separating source/target branches =

I tried separating the queries issued to populate the source and the target branches:
self._preloadDataForBranches(target_branches + source_branches) => self._preloadDataForBranches(source_branches) self._preloadDataForBranches(target_branches)

These are the queries for SeriesSourcePackageBranch issued in this case:
http://paste.ubuntu.com/740270/

We can see that all the data is pre fetched and then some of the data is fetched again one branch at a time as if the cache for _associatedSuiteSourcePackages has been cleared; debugging confirms this.

= Calling initialize twice =

Gavin found out that calling initialize twice "fixes" the problem.

These are the queries for SeriesSourcePackageBranch issued in this case:
http://paste.ubuntu.com/740275/

Revision history for this message
Raphaël Badin (rvb) wrote :

Ah ah, thanks to jtv's expertise I've found that the problem was indeed a cache size problem. This fixes the problem:

=== modified file 'configs/development/launchpad-lazr.conf'
--- configs/development/launchpad-lazr.conf 2011-10-18 09:25:12 +0000
+++ configs/development/launchpad-lazr.conf 2011-11-17 09:02:41 +0000
@@ -157,7 +157,7 @@
 geoip_database: /usr/share/GeoIP/GeoLiteCity.dat
 geonames_identity: lpdev
 storm_cache: generational
-storm_cache_size: 100
+storm_cache_size: 10000

So I'll keep the numbers of bmp created small to have the test pass even with the default storm_cache_size.

Revision history for this message
Gavin Panella (allenap) wrote :

Lovely :)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/code/browser/tests/test_branchmergeproposallisting.py'
2--- lib/lp/code/browser/tests/test_branchmergeproposallisting.py 2011-08-26 06:33:22 +0000
3+++ lib/lp/code/browser/tests/test_branchmergeproposallisting.py 2011-11-17 12:35:25 +0000
4@@ -8,9 +8,13 @@
5 from datetime import datetime
6
7 import pytz
8+from testtools.content import Content
9+from testtools.content_type import UTF8_TEXT
10+from testtools.matchers import Equals
11 import transaction
12 from zope.security.proxy import removeSecurityProxy
13
14+from canonical.database.sqlbase import flush_database_caches
15 from canonical.launchpad.webapp.servers import LaunchpadTestRequest
16 from canonical.testing.layers import DatabaseFunctionalLayer
17 from lp.code.browser.branchmergeproposallisting import (
18@@ -29,8 +33,10 @@
19 login,
20 login_person,
21 person_logged_in,
22+ StormStatementRecorder,
23 TestCaseWithFactory,
24 )
25+from lp.testing.matchers import HasQueryCount
26 from lp.testing.views import create_initialized_view
27
28
29@@ -377,3 +383,57 @@
30 view = create_initialized_view(
31 branch, name='+activereviews', rootsite='code')
32 self.assertEqual([mp], list(view.getProposals()))
33+
34+
35+class PersonActiveReviewsPerformance(TestCaseWithFactory):
36+ """Test the performance of the person's active reviews page."""
37+
38+ layer = DatabaseFunctionalLayer
39+
40+ def createBMP(self, reviewer=None, target_branch_owner=None):
41+ target_branch = None
42+ if target_branch_owner is not None:
43+ target_branch = self.factory.makeProductBranch(
44+ owner=target_branch_owner)
45+ bmp = self.factory.makeBranchMergeProposal(
46+ reviewer=reviewer,
47+ target_branch=target_branch)
48+ login_person(bmp.source_branch.owner)
49+ bmp.requestReview()
50+ return bmp
51+
52+ def createBMPsAndRecordQueries(self, number_of_bmps):
53+ # Create {number_of_bmps} branch merge proposals related to a
54+ # user, render the person's +activereviews page and return the
55+ # view and a recorder of the queries generated by this page
56+ # rendering.
57+ user = self.factory.makePerson()
58+ for i in xrange(number_of_bmps):
59+ # Create one of the two types of BMP which will be displayed
60+ # on a person's +activereviews page:
61+ # - A BMP for which the person is the reviewer.
62+ # - A BMP for which the person is the owner of the target
63+ # branch.
64+ if i % 2 == 0:
65+ self.createBMP(target_branch_owner=user)
66+ else:
67+ self.createBMP(reviewer=user)
68+ login_person(user)
69+ flush_database_caches()
70+ with StormStatementRecorder() as recorder:
71+ view = create_initialized_view(
72+ user, name='+activereviews', rootsite='code',
73+ principal=user)
74+ view.render()
75+ return recorder, view
76+
77+ def test_activereviews_query_count(self):
78+ # Note that we keep the number of bmps created small (3 and 7)
79+ # so that the all the per-cached objects will fit into the cache
80+ # used in tests (storm_cache_size: 100).
81+ recorder1, view1 = self.createBMPsAndRecordQueries(3)
82+ self.assertEqual(3, view1.proposal_count)
83+ self.addDetail("r1tb", Content(UTF8_TEXT, lambda: [str(recorder1)]))
84+ recorder2, view2 = self.createBMPsAndRecordQueries(7)
85+ self.assertEqual(7, view2.proposal_count)
86+ self.assertThat(recorder2, HasQueryCount(Equals(recorder1.count)))
87
88=== modified file 'lib/lp/code/model/branchcollection.py'
89--- lib/lp/code/model/branchcollection.py 2011-11-01 08:17:57 +0000
90+++ lib/lp/code/model/branchcollection.py 2011-11-17 12:35:25 +0000
91@@ -69,6 +69,7 @@
92 PreviewDiff,
93 )
94 from lp.code.model.seriessourcepackagebranch import SeriesSourcePackageBranch
95+from lp.registry.interfaces.person import IPersonSet
96 from lp.registry.model.distribution import Distribution
97 from lp.registry.model.distroseries import DistroSeries
98 from lp.registry.model.person import (
99@@ -400,7 +401,24 @@
100 # limited by the defined collection.
101 owned = self.ownedBy(person).getMergeProposals(status)
102 reviewing = self.getMergeProposalsForReviewer(person, status)
103- return owned.union(reviewing)
104+ resultset = owned.union(reviewing)
105+
106+ def do_eager_load(rows):
107+ source_branches = load_related(Branch, rows, ['source_branchID'])
108+ # Cache person's data (registrants of the proposal and
109+ # owners of the source branches).
110+ person_ids = set().union(
111+ (proposal.registrantID for proposal in rows),
112+ (branch.ownerID for branch in source_branches))
113+ list(getUtility(IPersonSet).getPrecachedPersonsFromIDs(
114+ person_ids, need_validity=True))
115+ # Load the source/target branches and preload the data for
116+ # these branches.
117+ target_branches = load_related(Branch, rows, ['target_branchID'])
118+ self._preloadDataForBranches(target_branches + source_branches)
119+ load_related(Product, target_branches, ['productID'])
120+
121+ return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)
122
123 def getMergeProposalsForReviewer(self, reviewer, status=None):
124 """See `IBranchCollection`."""