Merge lp:~wallyworld/launchpad/ppa-packages-timeout-1071581 into lp:launchpad

Proposed by Ian Booth on 2012-10-31
Status: Superseded
Proposed branch: lp:~wallyworld/launchpad/ppa-packages-timeout-1071581
Merge into: lp:launchpad
Diff against target: 448 lines (+228/-55)
7 files modified
database/schema/security.cfg (+3/-0)
lib/lp/registry/browser/person.py (+8/-8)
lib/lp/registry/model/person.py (+27/-47)
lib/lp/scripts/garbo.py (+108/-0)
lib/lp/soyuz/configure.zcml (+6/-0)
lib/lp/soyuz/interfaces/reporting.py (+25/-0)
lib/lp/soyuz/model/reporting.py (+51/-0)
To merge this branch: bzr merge lp:~wallyworld/launchpad/ppa-packages-timeout-1071581
Reviewer Review Type Date Requested Status
Curtis Hovey (community) code 2012-10-31 Needs Information on 2012-10-31
Review via email: mp+132236@code.launchpad.net

This proposal has been superseded by a proposal from 2012-11-02.

Commit Message

Improve the query used to find person-related software

Description of the Change

== Pre Implementation ==

Discussed with wgrant.

== Implementation ==

Postgres cannot efficiently execute the DISTINCT ON query needed to find only the latest published packages, so we simplify the query and perform the grouping in Python. The functionality is split into 2 parts:
1. Perform a query with the required WHERE conditions, iterate over the result in batches, and gather the required SourcePackageRelease ids.
2. Set up a result set using SourcePackageRelease.id IN (...) and return that. Client batching, pre-iteration, and eager loading of only the batched records' information then work as before.
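The two steps above can be sketched in plain Python. This is a minimal illustration of the pattern only, using lists of dicts as stand-ins for Storm result sets; the function and key names are hypothetical, not Launchpad APIs.

```python
def gather_latest_ids(rows, batch_size=100):
    """Phase 1: walk the candidate rows in batches, keeping only the
    latest release per (distroseries, packagename, archive) group."""
    latest = {}
    for start in range(0, len(rows), batch_size):
        for row in rows[start:start + batch_size]:
            key = (row['distroseries'], row['name'], row['archive'])
            # Keep the row with the newest upload date for each group.
            if key not in latest or row['uploaded'] > latest[key]['uploaded']:
                latest[key] = row
    return {row['id'] for row in latest.values()}


def releases_in(all_releases, ids):
    """Phase 2: the moral equivalent of SourcePackageRelease.id IN (...),
    a simple result set the client can batch and eager-load as before."""
    return [r for r in all_releases if r['id'] in ids]
```

The point of the split is that phase 1 never asks the database to do the grouping, and phase 2 is a trivially indexable id lookup.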

New indexes on SourcePackageRelease are required; these can be added live, so they are included in this mp. The first 2 are for the queries that find the spr ids to include, the last is for the query that loads the records.

== Tests ==

Internal change - use existing tests.

== Lint ==

Checking for conflicts and issues in changed files.

Linting changed files:
  database/schema/patch-2209-38-0.sql
  lib/lp/registry/browser/person.py
  lib/lp/registry/model/person.py

Curtis Hovey (sinzui) wrote :

Line 123 looks inconsistent because the storm column does not use ID
    SourcePackagePublishingHistory.sourcepackagereleaseID

The while-loop looks odd. I don't like the use of the done variable, but I see you wanted something separate from max_results. I don't understand what happens when max_results is None or 0 -- they seem to lead to contradictory behaviour. I suppose 0 is really a value error for max_results. What happens when max_results is None and the last batch returns fewer items than the batch size? eg. rs has only 15 more items; it iterates over them, then goes back to the start of the loop to get the next batch using a slice that exceeds everything in the rs.

review: Needs Information (code)
Ian Booth (wallyworld) wrote :

I am going to scrap this work as-is - it turns out there will still be
performance issues for some users, those who have many 1000s of
records to be displayed. I was under the assumption that the number of
records would normally be much smaller than that.

So the plan now is to introduce a new denormalised table, designed for
reporting, and maintain it with a frequent garbo job, since the data
does not have to be immediately current.
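The resumable garbo job described here can be sketched as a batch loop that persists a watermark between runs. This is an illustrative sketch only: the in-memory `state_store` dict stands in for the GarboJobState table, and all names are hypothetical.

```python
import json


def run_batch(source_ids, state_store, job_name, chunk_size):
    """Process one chunk of ids above the saved watermark, then persist
    the new watermark so a later run resumes where this one stopped."""
    state = json.loads(state_store.get(job_name, '{"next_id": 0}'))
    next_id = state['next_id']
    chunk = [i for i in sorted(source_ids) if i > next_id][:chunk_size]
    for i in chunk:
        pass  # populate the denormalised cache row for id i here
    if chunk:
        # Save progress; a crash before this point just redoes the chunk.
        state_store[job_name] = json.dumps({'next_id': chunk[-1]})
    return chunk
```

Because the cache only needs to be eventually current, a frequent small-chunk run of this shape is enough; there is no need to rebuild the table in one transaction.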


Preview Diff

1=== modified file 'database/schema/security.cfg'
2--- database/schema/security.cfg 2012-11-01 16:14:00 +0000
3+++ database/schema/security.cfg 2012-11-02 04:32:20 +0000
4@@ -217,6 +217,7 @@
5 public.karmatotalcache = SELECT, DELETE, UPDATE
6 public.language = SELECT
7 public.languagepack = SELECT, INSERT, UPDATE
8+public.latestpersonsourcepackagereleasecache = SELECT
9 public.launchpadstatistic = SELECT
10 public.libraryfilealias = SELECT, INSERT, UPDATE, DELETE
11 public.libraryfiledownloadcount = SELECT, INSERT, UPDATE
12@@ -2255,8 +2256,10 @@
13 public.codeimportresult = SELECT, DELETE
14 public.commercialsubscription = SELECT, UPDATE
15 public.emailaddress = SELECT, UPDATE, DELETE
16+public.garbojobstate = SELECT, INSERT, UPDATE, DELETE
17 public.hwsubmission = SELECT, UPDATE
18 public.job = SELECT, INSERT, DELETE
19+public.latestpersonsourcepackagereleasecache = SELECT, INSERT, UPDATE
20 public.logintoken = SELECT, DELETE
21 public.mailinglistsubscription = SELECT, DELETE
22 public.milestonetag = SELECT
23
24=== modified file 'lib/lp/registry/browser/person.py'
25--- lib/lp/registry/browser/person.py 2012-10-26 07:15:49 +0000
26+++ lib/lp/registry/browser/person.py 2012-11-02 04:32:20 +0000
27@@ -3654,17 +3654,17 @@
28 builds_by_package = {}
29 needs_build_by_package = {}
30 for package in package_releases:
31- builds_by_package[package] = []
32- needs_build_by_package[package] = False
33+ builds_by_package[package.id] = []
34+ needs_build_by_package[package.id] = False
35 for build in all_builds:
36 if build.status == BuildStatus.FAILEDTOBUILD:
37- builds_by_package[build.source_package_release].append(build)
38+ builds_by_package[build.source_package_release.id].append(build)
39 needs_build = build.status in [
40 BuildStatus.NEEDSBUILD,
41 BuildStatus.MANUALDEPWAIT,
42 BuildStatus.CHROOTWAIT,
43 ]
44- needs_build_by_package[build.source_package_release] = needs_build
45+ needs_build_by_package[build.source_package_release.id] = needs_build
46
47 return (builds_by_package, needs_build_by_package)
48
49@@ -3675,8 +3675,8 @@
50
51 return [
52 SourcePackageReleaseWithStats(
53- package, builds_by_package[package],
54- needs_build_by_package[package])
55+ package, builds_by_package[package.id],
56+ needs_build_by_package[package.id])
57 for package in package_releases]
58
59 def _addStatsToPublishings(self, publishings):
60@@ -3689,8 +3689,8 @@
61
62 return [
63 SourcePackagePublishingHistoryWithStats(
64- spph, builds_by_package[spph.sourcepackagerelease],
65- needs_build_by_package[spph.sourcepackagerelease])
66+ spph, builds_by_package[spph.sourcepackagerelease.id],
67+ needs_build_by_package[spph.sourcepackagerelease.id])
68 for spph in filtered_spphs]
69
70 def setUpBatch(self, packages):
71
72=== modified file 'lib/lp/registry/model/person.py'
73--- lib/lp/registry/model/person.py 2012-11-01 20:33:27 +0000
74+++ lib/lp/registry/model/person.py 2012-11-02 04:32:20 +0000
75@@ -327,6 +327,7 @@
76 Archive,
77 validate_ppa,
78 )
79+from lp.soyuz.model.reporting import LatestPersonSourcepackageReleaseCache
80 from lp.soyuz.model.publishing import SourcePackagePublishingHistory
81 from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
82 from lp.translations.model.hastranslationimports import (
83@@ -2810,8 +2811,8 @@
84 return self._latestReleasesQuery(uploader_only=True, ppa_only=True)
85
86 def _releasesQueryFilter(self, uploader_only=False, ppa_only=False):
87- """Return the filter used to find sourcepackagereleases (SPRs)
88- related to this person.
89+ """Return the filter used to find latest published source package
90+ releases (SPRs) related to this person.
91
92 :param uploader_only: controls if we are interested in SPRs where
93 the person in question is only the uploader (creator) and not the
94@@ -2827,24 +2828,27 @@
95 'uploader_only' because there shouldn't be any sense of maintainership
96 for packages uploaded to PPAs by someone else than the user himself.
97 """
98- clauses = [SourcePackageRelease.upload_archive == Archive.id]
99-
100+ clauses = []
101 if uploader_only:
102- clauses.append(SourcePackageRelease.creator == self)
103-
104+ clauses.append(
105+ LatestPersonSourcepackageReleaseCache.creator == self)
106 if ppa_only:
107 # Source maintainer is irrelevant for PPA uploads.
108 pass
109 elif uploader_only:
110- clauses.append(SourcePackageRelease.maintainer != self)
111+ clauses.append(
112+ LatestPersonSourcepackageReleaseCache.maintainer != self)
113 else:
114- clauses.append(SourcePackageRelease.maintainer == self)
115-
116+ clauses.append(
117+ LatestPersonSourcepackageReleaseCache.maintainer == self)
118 if ppa_only:
119- clauses.append(Archive.purpose == ArchivePurpose.PPA)
120+ clauses.append(
121+ LatestPersonSourcepackageReleaseCache.archive_purpose ==
122+ ArchivePurpose.PPA)
123 else:
124- clauses.append(Archive.purpose != ArchivePurpose.PPA)
125-
126+ clauses.append(
127+ LatestPersonSourcepackageReleaseCache.archive_purpose !=
128+ ArchivePurpose.PPA)
129 return clauses
130
131 def _hasReleasesQuery(self, uploader_only=False, ppa_only=False):
132@@ -2852,50 +2856,26 @@
133 See `_releasesQueryFilter` for details on the criteria used.
134 """
135 clauses = self._releasesQueryFilter(uploader_only, ppa_only)
136- spph = ClassAlias(SourcePackagePublishingHistory, "spph")
137- tables = (
138- SourcePackageRelease,
139- Join(
140- spph, spph.sourcepackagereleaseID == SourcePackageRelease.id),
141- Join(Archive, Archive.id == spph.archiveID))
142- rs = Store.of(self).using(*tables).find(
143- SourcePackageRelease.id, clauses)
144+ rs = Store.of(self).using(LatestPersonSourcepackageReleaseCache).find(
145+ LatestPersonSourcepackageReleaseCache.publication_id, clauses)
146 return not rs.is_empty()
147
148 def _latestReleasesQuery(self, uploader_only=False, ppa_only=False):
149- """Return the sourcepackagereleases (SPRs) related to this person.
150+ """Return the sourcepackagereleases records related to this person.
151 See `_releasesQueryFilter` for details on the criteria used."""
152 clauses = self._releasesQueryFilter(uploader_only, ppa_only)
153- spph = ClassAlias(SourcePackagePublishingHistory, "spph")
154 rs = Store.of(self).find(
155- SourcePackageRelease,
156- SourcePackageRelease.id.is_in(
157- Select(
158- SourcePackageRelease.id,
159- tables=[
160- SourcePackageRelease,
161- Join(
162- spph,
163- spph.sourcepackagereleaseID ==
164- SourcePackageRelease.id),
165- Join(Archive, Archive.id == spph.archiveID)],
166- where=And(*clauses),
167- order_by=[SourcePackageRelease.upload_distroseriesID,
168- SourcePackageRelease.sourcepackagenameID,
169- SourcePackageRelease.upload_archiveID,
170- Desc(SourcePackageRelease.dateuploaded)],
171- distinct=(
172- SourcePackageRelease.upload_distroseriesID,
173- SourcePackageRelease.sourcepackagenameID,
174- SourcePackageRelease.upload_archiveID)))
175- ).order_by(
176- Desc(SourcePackageRelease.dateuploaded), SourcePackageRelease.id)
177+ LatestPersonSourcepackageReleaseCache, *clauses).order_by(
178+ Desc(LatestPersonSourcepackageReleaseCache.dateuploaded))
179
180 def load_related_objects(rows):
181 list(getUtility(IPersonSet).getPrecachedPersonsFromIDs(
182- set(map(attrgetter("maintainerID"), rows))))
183- bulk.load_related(SourcePackageName, rows, ['sourcepackagenameID'])
184- bulk.load_related(Archive, rows, ['upload_archiveID'])
185+ set(map(attrgetter("maintainer_id"), rows))))
186+ bulk.load_related(
187+ SourcePackageName, rows, ['sourcepackagename_id'])
188+ bulk.load_related(
189+ SourcePackageRelease, rows, ['sourcepackagerelease_id'])
190+ bulk.load_related(Archive, rows, ['upload_archive_id'])
191
192 return DecoratedResultSet(rs, pre_iter_hook=load_related_objects)
193
194
195=== modified file 'lib/lp/scripts/garbo.py'
196--- lib/lp/scripts/garbo.py 2012-10-22 02:30:44 +0000
197+++ lib/lp/scripts/garbo.py 2012-11-02 04:32:20 +0000
198@@ -17,6 +17,7 @@
199 import logging
200 import multiprocessing
201 import os
202+import simplejson
203 import threading
204 import time
205
206@@ -28,11 +29,16 @@
207 from psycopg2 import IntegrityError
208 import pytz
209 from storm.expr import (
210+ Alias,
211+ And,
212+ Desc,
213 In,
214 Like,
215 Select,
216 Update,
217+ Join,
218 )
219+from storm.info import ClassAlias
220 from storm.locals import (
221 Max,
222 Min,
223@@ -105,6 +111,10 @@
224 )
225 from lp.services.session.model import SessionData
226 from lp.services.verification.model.logintoken import LoginToken
227+from lp.soyuz.model.archive import Archive
228+from lp.soyuz.model.publishing import SourcePackagePublishingHistory
229+from lp.soyuz.model.reporting import LatestPersonSourcepackageReleaseCache
230+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
231 from lp.translations.interfaces.potemplate import IPOTemplateSet
232 from lp.translations.model.potmsgset import POTMsgSet
233 from lp.translations.model.potranslation import POTranslation
234@@ -423,6 +433,103 @@
235 transaction.commit()
236
237
238+class PopulateLatestPersonSourcepackageReleaseCache(TunableLoop):
239+ """Populate the LatestPersonSourcepackageReleaseCache table."""
240+ maximum_chunk_size = 5000
241+
242+ def __init__(self, log, abort_time=None):
243+ super_cl = super(PopulateLatestPersonSourcepackageReleaseCache, self)
244+ super_cl.__init__(log, abort_time)
245+ self.store = getUtility(IStoreSelector).get(MAIN_STORE, MASTER_FLAVOR)
246+ self.next_id = 0
247+ self.job_name = self.__class__.__name__
248+ job_data = self.store.execute(
249+ "SELECT json_data FROM GarboJobState WHERE name = '%s'"
250+ % self.job_name).get_one()
251+ if job_data:
252+ json_data = simplejson.loads(job_data[0])
253+ self.next_id = json_data['next_id']
254+ else:
255+ json_data = simplejson.dumps({'next_id': 0})
256+ self.store.execute(
257+ "INSERT INTO GarboJobState(name, json_data) "
258+ "VALUES ('%s', '%s')" % (self.job_name, json_data))
259+
260+ def getPendingUpdates(self):
261+ spph = ClassAlias(SourcePackagePublishingHistory, "spph")
262+ origin = [
263+ SourcePackageRelease,
264+ Join(
265+ spph,
266+ And(spph.sourcepackagereleaseID == SourcePackageRelease.id,
267+ spph.archiveID == SourcePackageRelease.upload_archiveID))]
268+ spr_select = self.store.using(*origin).find(
269+ (SourcePackageRelease.id, Alias(spph.id, 'spph_id')),
270+ SourcePackageRelease.upload_archiveID == spph.archiveID,
271+ SourcePackageRelease.id > self.next_id
272+ ).order_by(
273+ SourcePackageRelease.upload_distroseriesID,
274+ SourcePackageRelease.sourcepackagenameID,
275+ SourcePackageRelease.upload_archiveID,
276+ Desc(SourcePackageRelease.dateuploaded),
277+ SourcePackageRelease.id
278+ ).config(distinct=(
279+ SourcePackageRelease.upload_distroseriesID,
280+ SourcePackageRelease.sourcepackagenameID,
281+ SourcePackageRelease.upload_archiveID))._get_select()
282+
283+ spr = Alias(spr_select, 'spr')
284+ origin = [
285+ SourcePackageRelease,
286+ Join(spr, SQL('spr.id') == SourcePackageRelease.id),
287+ Join(Archive, Archive.id == SourcePackageRelease.upload_archiveID)]
288+ rs = self.store.using(*origin).find(
289+ (SourcePackageRelease.id,
290+ SourcePackageRelease.creatorID, SourcePackageRelease.maintainerID,
291+ SourcePackageRelease.upload_archiveID,
292+ Archive.purpose,
293+ SourcePackageRelease.sourcepackagenameID,
294+ SourcePackageRelease.upload_distroseriesID,
295+ SourcePackageRelease.dateuploaded, SQL('spph_id'))
296+ ).order_by(SourcePackageRelease.id)
297+ return rs
298+
299+ def isDone(self):
300+ return self.getPendingUpdates().count() == 0
301+
302+ def update_cache(self, spr_id, creator_id, maintainer_id, archive_id,
303+ purpose, spn_id, distroseries_id, dateuploaded, spph_id):
304+ lpr = LatestPersonSourcepackageReleaseCache()
305+ lpr.publication = spph_id
306+ lpr.upload_distroseries_id = distroseries_id
307+ lpr.archive_purpose = purpose
308+ lpr.creator_id = creator_id
309+ lpr.maintainer_id = maintainer_id
310+ lpr.sourcepackagename_id = spn_id
311+ lpr.sourcepackagerelease = spr_id
312+ lpr.upload_archive = archive_id
313+ lpr.dateuploaded = dateuploaded
314+ self.store.add(lpr)
315+
316+ def __call__(self, chunk_size):
317+ max_id = 0
318+ for (spr_id, creator_id, maintainer_id, archive_id, purpose,
319+ distroseries_id, spn_id, dateuploaded, spph_id) in (
320+ self.getPendingUpdates()[:chunk_size]):
321+ self.update_cache(
322+ spr_id, creator_id, maintainer_id, archive_id, purpose,
323+ distroseries_id, spn_id, dateuploaded, spph_id)
324+ max_id = spr_id
325+
326+ self.next_id = max_id
327+ self.store.flush()
328+ json_data = simplejson.dumps({'next_id': max_id})
329+ self.store.execute(
330+ "UPDATE GarboJobState SET json_data = '%s' WHERE name = '%s'"
331+ % (json_data, self.job_name))
332+ transaction.commit()
333+
334+
335 class OpenIDConsumerNoncePruner(TunableLoop):
336 """An ITunableLoop to prune old OpenIDConsumerNonce records.
337
338@@ -1339,6 +1446,7 @@
339 OpenIDConsumerAssociationPruner,
340 AntiqueSessionPruner,
341 VoucherRedeemer,
342+ PopulateLatestPersonSourcepackageReleaseCache
343 ]
344 experimental_tunable_loops = []
345
346
347=== modified file 'lib/lp/soyuz/configure.zcml'
348--- lib/lp/soyuz/configure.zcml 2012-09-28 14:48:20 +0000
349+++ lib/lp/soyuz/configure.zcml 2012-11-02 04:32:20 +0000
350@@ -1002,6 +1002,12 @@
351 <allow interface="lp.soyuz.adapters.overrides.IOverridePolicy" />
352 </class>
353
354+ <class
355+ class="lp.soyuz.model.reporting.LatestPersonSourcepackageReleaseCache">
356+ <allow
357+ interface="lp.soyuz.interfaces.reporting.ILatestPersonSourcepackageReleaseCache"/>
358+ </class>
359+
360 <!-- ProcessAcceptedBugsJobSource -->
361 <securedutility
362 component=".model.processacceptedbugsjob.ProcessAcceptedBugsJob"
363
364=== added file 'lib/lp/soyuz/interfaces/reporting.py'
365--- lib/lp/soyuz/interfaces/reporting.py 1970-01-01 00:00:00 +0000
366+++ lib/lp/soyuz/interfaces/reporting.py 2012-11-02 04:32:20 +0000
367@@ -0,0 +1,25 @@
368+# Copyright 2012 Canonical Ltd. This software is licensed under the
369+# GNU Affero General Public License version 3 (see the file LICENSE).
370+
371+__metaclass__ = type
372+__all__ = [
373+ 'ILatestPersonSourcepackageReleaseCache',
374+ ]
375+
376+
377+from zope.interface import Attribute
378+from lp.soyuz.interfaces.sourcepackagerelease import ISourcePackageRelease
379+
380+
381+class ILatestPersonSourcepackageReleaseCache(ISourcePackageRelease):
382+ """Published source package release information for a person.
383+
384+ The records represented by this object are the latest published source
385+ package releases for a given sourcepackage, distroseries, archive.
386+ """
387+
388+ id = Attribute("The id of the associated SourcePackageRelease.")
389+ sourcepackagerelease = Attribute(
390+ "The SourcePackageRelease which this object represents.")
391+ publication = Attribute(
392+ "The publication record for the associated SourcePackageRelease.")
393
394=== added file 'lib/lp/soyuz/model/reporting.py'
395--- lib/lp/soyuz/model/reporting.py 1970-01-01 00:00:00 +0000
396+++ lib/lp/soyuz/model/reporting.py 2012-11-02 04:32:20 +0000
397@@ -0,0 +1,51 @@
398+# Copyright 2012 Canonical Ltd. This software is licensed under the
399+# GNU Affero General Public License version 3 (see the file LICENSE).
400+
401+__metaclass__ = type
402+__all__ = [
403+ 'LatestPersonSourcepackageReleaseCache',
404+ ]
405+
406+from lazr.delegates import delegates
407+from storm.base import Storm
408+from storm.locals import (
409+ Int,
410+ Reference,
411+ )
412+from storm.properties import DateTime
413+from zope.interface import implements
414+
415+from lp.services.database.enumcol import EnumCol
416+from lp.soyuz.enums import ArchivePurpose
417+from lp.soyuz.interfaces.reporting import (
418+ ILatestPersonSourcepackageReleaseCache,
419+ )
420+from lp.soyuz.interfaces.sourcepackagerelease import ISourcePackageRelease
421+
422+
423+class LatestPersonSourcepackageReleaseCache(Storm):
424+ """See `LatestPersonSourcepackageReleaseCache`."""
425+ implements(ILatestPersonSourcepackageReleaseCache)
426+ delegates(ISourcePackageRelease, context='sourcepackagerelease')
427+
428+ __storm_table__ = 'LatestPersonSourcepackageReleaseCache'
429+
430+ id = Int(name='id', primary=True)
431+ publication_id = Int(name='publication')
432+ publication = Reference(
433+ publication_id, 'SourcePackagePublishingHistory.id')
434+ dateuploaded = DateTime(name='date_uploaded')
435+ creator_id = Int(name='creator')
436+ creator = Reference(creator_id, 'Person.id')
437+ maintainer_id = Int(name='maintainer')
438+ maintainer = Reference(maintainer_id, 'Person.id')
439+ upload_archive_id = Int(name='upload_archive')
440+ upload_archive = Reference(upload_archive_id, 'Archive.id')
441+ archive_purpose = EnumCol(schema=ArchivePurpose)
442+ upload_distroseries_id = Int(name='upload_distroseries')
443+ upload_distroseries = Reference(upload_distroseries_id, 'DistroSeries.id')
444+ sourcepackagename_id = Int(name='sourcepackagename')
445+ sourcepackagename = Reference(sourcepackagename_id, 'SourcePackageName.id')
446+ sourcepackagerelease_id = Int(name='sourcepackagerelease')
447+ sourcepackagerelease = Reference(
448+ sourcepackagerelease_id, 'SourcePackageRelease.id')