Merge lp:~wgrant/launchpad/package-search-timeouts into lp:launchpad

Proposed by William Grant
Status: Merged
Approved by: William Grant
Approved revision: no longer in the source branch.
Merged at revision: 15588
Proposed branch: lp:~wgrant/launchpad/package-search-timeouts
Merge into: lp:launchpad
Diff against target: 302 lines (+72/-68)
5 files modified
lib/lp/registry/doc/distribution.txt (+7/-7)
lib/lp/registry/model/distribution.py (+45/-41)
lib/lp/soyuz/browser/tests/distribution-views.txt (+1/-1)
lib/lp/soyuz/doc/package-cache-script.txt (+8/-8)
lib/lp/soyuz/doc/package-cache.txt (+11/-11)
To merge this branch: bzr merge lp:~wgrant/launchpad/package-search-timeouts
Reviewer Review Type Date Requested Status
Curtis Hovey (community) code Approve
Review via email: mp+109997@code.launchpad.net

Commit message

Optimise Distribution.searchSourcePackageCaches.

Description of the change

This branch reworks Distribution.searchSourcePackageCaches to be sub-200ms on DF. It relies on the indices in lp:~wgrant/launchpad/dspc-trgm-indices for optimal speed.

The name ILIKE '%foo%' changes to a LIKE, since DSPC.name is always lowercase -- this is indexable by the new GIN trigram index. SPPH.archive is now constrained, to be more correct and to let it use the new index. SPPH.sourcepackagename now also joins directly onto DSPC.sourcepackagename, rather than indirecting through SPR and SPN.

To post a comment you must log in.
Revision history for this message
Curtis Hovey (sinzui) wrote :

Thank you.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/registry/doc/distribution.txt'
2--- lib/lp/registry/doc/distribution.txt 2012-04-10 14:01:17 +0000
3+++ lib/lp/registry/doc/distribution.txt 2012-07-09 23:04:32 +0000
4@@ -182,7 +182,7 @@
5 The distribution also allows you to look for source packages that match
6 a certain string through the magic of fti. For instance:
7
8- >>> packages = ubuntu.searchSourcePackageCaches("mozilla")
9+ >>> packages = ubuntu.searchSourcePackageCaches(u"mozilla")
10 >>> for distro_source_package_cache, source_name, rank in packages:
11 ... print "%-17s rank:%s" % (
12 ... distro_source_package_cache.name,
13@@ -192,10 +192,10 @@
14 The search also matches on exact package names which fti doesn't like,
15 and even on substrings:
16
17- >>> packages = ubuntu.searchSourcePackageCaches("linux-source-2.6.15")
18+ >>> packages = ubuntu.searchSourcePackageCaches(u"linux-source-2.6.15")
19 >>> print packages.count()
20 1
21- >>> packages = ubuntu.searchSourcePackageCaches('a')
22+ >>> packages = ubuntu.searchSourcePackageCaches(u'a')
23 >>> for distro_source_package_cache, source_name, rank in packages:
24 ... print "%s: %-17s rank:%s" % (
25 ... distro_source_package_cache.__class__.__name__,
26@@ -210,7 +210,7 @@
27 The searchSourcePackages() method just returns a decorated version
28 of the results from searchSourcePackageCaches():
29
30- >>> packages = ubuntu.searchSourcePackages('a')
31+ >>> packages = ubuntu.searchSourcePackages(u'a')
32 >>> for dsp in packages:
33 ... print "%s: %s" % (dsp.__class__.__name__, dsp.name)
34 DistributionSourcePackage: alsa-utils
35@@ -224,13 +224,13 @@
36 the results based on whether the source package has an entry
37 in the Packaging table linking it to an upstream project.
38
39- >>> packages = ubuntu.searchSourcePackages('a', has_packaging=True)
40+ >>> packages = ubuntu.searchSourcePackages(u'a', has_packaging=True)
41 >>> for dsp in packages:
42 ... print "%s: %s" % (dsp.__class__.__name__, dsp.name)
43 DistributionSourcePackage: alsa-utils
44 DistributionSourcePackage: mozilla-firefox
45 DistributionSourcePackage: netapplet
46- >>> packages = ubuntu.searchSourcePackages('a', has_packaging=False)
47+ >>> packages = ubuntu.searchSourcePackages(u'a', has_packaging=False)
48 >>> for dsp in packages:
49 ... print "%s: %s" % (dsp.__class__.__name__, dsp.name)
50 DistributionSourcePackage: commercialpackage
51@@ -242,7 +242,7 @@
52 SourcePackagePublishingHistory table for the given distroseries.
53
54 >>> packages = ubuntu.searchSourcePackages(
55- ... 'a', publishing_distroseries=ubuntu.currentseries)
56+ ... u'a', publishing_distroseries=ubuntu.currentseries)
57 >>> for dsp in packages:
58 ... print "%s: %s" % (dsp.__class__.__name__, dsp.name)
59 DistributionSourcePackage: alsa-utils
60
61=== modified file 'lib/lp/registry/model/distribution.py'
62--- lib/lp/registry/model/distribution.py 2012-07-05 08:52:03 +0000
63+++ lib/lp/registry/model/distribution.py 2012-07-09 23:04:32 +0000
64@@ -23,9 +23,12 @@
65 from storm.expr import (
66 And,
67 Desc,
68+ Exists,
69 Join,
70 Max,
71+ Not,
72 Or,
73+ Select,
74 SQL,
75 )
76 from storm.info import ClassAlias
77@@ -1082,6 +1085,7 @@
78 def searchSourcePackageCaches(
79 self, text, has_packaging=None, publishing_distroseries=None):
80 """See `IDistribution`."""
81+ from lp.registry.model.packaging import Packaging
82 from lp.soyuz.model.distributionsourcepackagecache import (
83 DistributionSourcePackageCache,
84 )
85@@ -1093,58 +1097,58 @@
86 find_spec = (
87 DistributionSourcePackageCache,
88 SourcePackageName,
89- SQL('rank(fti, ftq(%s)) AS rank' % sqlvalues(text)),
90+ SQL('rank(fti, ftq(?)) AS rank', params=(text,)),
91 )
92 origin = [
93 DistributionSourcePackageCache,
94 Join(
95 SourcePackageName,
96 DistributionSourcePackageCache.sourcepackagename ==
97- SourcePackageName.id,
98+ SourcePackageName.id),
99+ ]
100+
101+ # quote_like SQL-escapes the string in addition to LIKE-escaping
102+ # it, so we can't use params=. So we need to double-escape the %
103+ # on either side of the string: once to survive the formatting
104+ # here, and once to survive Storm's formatting during
105+ # compilation. Storm should really %-escape literal SQL strings,
106+ # but it doesn't.
107+ conditions = [
108+ DistributionSourcePackageCache.distribution == self,
109+ DistributionSourcePackageCache.archiveID.is_in(
110+ self.all_distro_archive_ids),
111+ Or(
112+ SQL("DistributionSourcePackageCache.fti @@ ftq(?)",
113+ params=(text,)),
114+ SQL("DistributionSourcePackageCache.name "
115+ "LIKE '%%%%' || %s || '%%%%'" % quote_like(text.lower())),
116 ),
117 ]
118
119- publishing_condition = ''
120+ if has_packaging is not None:
121+ packaging_query = Exists(Select(
122+ 1, tables=[Packaging],
123+ where=(Packaging.sourcepackagenameID == SourcePackageName.id)))
124+ if has_packaging is False:
125+ packaging_query = Not(packaging_query)
126+ conditions.append(packaging_query)
127+
128 if publishing_distroseries is not None:
129- origin += [
130- Join(SourcePackageRelease,
131- SourcePackageRelease.sourcepackagename ==
132- SourcePackageName.id),
133- Join(SourcePackagePublishingHistory,
134- SourcePackagePublishingHistory.sourcepackagerelease ==
135- SourcePackageRelease.id),
136- ]
137- publishing_condition = (
138- "AND SourcePackagePublishingHistory.distroseries = %d"
139- % publishing_distroseries.id)
140-
141- packaging_query = """
142- SELECT 1
143- FROM Packaging
144- WHERE Packaging.sourcepackagename = SourcePackageName.id
145- """
146- has_packaging_condition = ''
147- if has_packaging is True:
148- has_packaging_condition = 'AND EXISTS (%s)' % packaging_query
149- elif has_packaging is False:
150- has_packaging_condition = 'AND NOT EXISTS (%s)' % packaging_query
151-
152- # Note: When attempting to convert the query below into straight
153- # Storm expressions, a 'tuple index out-of-range' error was always
154- # raised.
155- condition = """
156- DistributionSourcePackageCache.distribution = %s AND
157- DistributionSourcePackageCache.archive IN %s AND
158- (DistributionSourcePackageCache.fti @@ ftq(%s) OR
159- DistributionSourcePackageCache.name ILIKE '%%' || %s || '%%')
160- %s
161- %s
162- """ % (quote(self), quote(self.all_distro_archive_ids),
163- quote(text), quote_like(text), has_packaging_condition,
164- publishing_condition)
165+ origin.append(
166+ Join(
167+ SourcePackagePublishingHistory,
168+ SourcePackagePublishingHistory.sourcepackagenameID ==
169+ DistributionSourcePackageCache.sourcepackagenameID))
170+ conditions.extend([
171+ SourcePackagePublishingHistory.distroseries ==
172+ publishing_distroseries,
173+ SourcePackagePublishingHistory.archiveID.is_in(
174+ self.all_distro_archive_ids),
175+ ])
176+
177 dsp_caches_with_ranks = store.using(*origin).find(
178- find_spec, condition).order_by(
179- 'rank DESC, DistributionSourcePackageCache.name')
180+ find_spec, *conditions).order_by(
181+ Desc(SQL('rank')), DistributionSourcePackageCache.name)
182 dsp_caches_with_ranks.config(distinct=True)
183 return dsp_caches_with_ranks
184
185
186=== modified file 'lib/lp/soyuz/browser/tests/distribution-views.txt'
187--- lib/lp/soyuz/browser/tests/distribution-views.txt 2012-06-05 11:42:04 +0000
188+++ lib/lp/soyuz/browser/tests/distribution-views.txt 2012-07-09 23:04:32 +0000
189@@ -87,7 +87,7 @@
190
191 >>> distro_pkg_search_view = create_initialized_view(
192 ... ubuntu, name="+search", form={
193- ... 'text': 'a',
194+ ... 'text': u'a',
195 ... 'search_type': 'source'
196 ... })
197 >>> distro_pkg_search_view.search_by_binary_name
198
199=== modified file 'lib/lp/soyuz/doc/package-cache-script.txt'
200--- lib/lp/soyuz/doc/package-cache-script.txt 2012-05-03 20:26:29 +0000
201+++ lib/lp/soyuz/doc/package-cache-script.txt 2012-07-09 23:04:32 +0000
202@@ -19,16 +19,16 @@
203 'cdrkit' source and binary are published but it's not present in
204 cache:
205
206- >>> ubuntu.searchSourcePackages('cdrkit').count()
207+ >>> ubuntu.searchSourcePackages(u'cdrkit').count()
208 0
209- >>> warty.searchPackages('cdrkit').count()
210+ >>> warty.searchPackages(u'cdrkit').count()
211 0
212
213 'foobar' source and binary are removed but still present in cache:
214
215- >>> ubuntu.searchSourcePackages('foobar').count()
216+ >>> ubuntu.searchSourcePackages(u'foobar').count()
217 1
218- >>> warty.searchPackages('foobar').count()
219+ >>> warty.searchPackages(u'foobar').count()
220 1
221
222 Normal operation produces INFO level information about the
223@@ -62,14 +62,14 @@
224
225 Now, search results are up to date:
226
227- >>> ubuntu.searchSourcePackages('cdrkit').count()
228+ >>> ubuntu.searchSourcePackages(u'cdrkit').count()
229 1
230- >>> warty.searchPackages('cdrkit').count()
231+ >>> warty.searchPackages(u'cdrkit').count()
232 1
233
234- >>> ubuntu.searchSourcePackages('foobar').count()
235+ >>> ubuntu.searchSourcePackages(u'foobar').count()
236 0
237- >>> warty.searchPackages('foobar').count()
238+ >>> warty.searchPackages(u'foobar').count()
239 0
240
241 Explicitly mark the database as dirty so that it is cleaned (see bug 994158).
242
243=== modified file 'lib/lp/soyuz/doc/package-cache.txt'
244--- lib/lp/soyuz/doc/package-cache.txt 2012-01-24 12:36:15 +0000
245+++ lib/lp/soyuz/doc/package-cache.txt 2012-07-09 23:04:32 +0000
246@@ -81,13 +81,13 @@
247 Building these caches we can reach good performance on full and partial
248 term searching.
249
250- >>> ubuntu.searchSourcePackages('mozilla').count()
251- 1
252-
253- >>> ubuntu.searchSourcePackages('moz').count()
254- 1
255-
256- >>> ubuntu.searchSourcePackages('biscoito').count()
257+ >>> ubuntu.searchSourcePackages(u'mozilla').count()
258+ 1
259+
260+ >>> ubuntu.searchSourcePackages(u'moz').count()
261+ 1
262+
263+ >>> ubuntu.searchSourcePackages(u'biscoito').count()
264 0
265
266 The cache update procedure is done by cronscripts/update-pkgcache.py,
267@@ -109,7 +109,7 @@
268 >>> foobar_pub.status.name
269 'DELETED'
270
271- >>> ubuntu.searchSourcePackages('foobar').count()
272+ >>> ubuntu.searchSourcePackages(u'foobar').count()
273 1
274
275 >>> foobar_cache = DistributionSourcePackageCache.selectOneBy(
276@@ -131,7 +131,7 @@
277 >>> import transaction
278 >>> transaction.commit()
279
280- >>> ubuntu.searchSourcePackages('foobar').count()
281+ >>> ubuntu.searchSourcePackages(u'foobar').count()
282 0
283
284 >>> foobar_cache = DistributionSourcePackageCache.selectOneBy(
285@@ -149,7 +149,7 @@
286 >>> cdrkit_pub.status.name
287 'PUBLISHED'
288
289- >>> ubuntu.searchSourcePackages('cdrkit').count()
290+ >>> ubuntu.searchSourcePackages(u'cdrkit').count()
291 0
292
293 >>> cdrkit_cache = DistributionSourcePackageCache.selectOneBy(
294@@ -176,7 +176,7 @@
295 Now we see that the 'cdrkit' source is part of the caches and can be
296 reached via searches:
297
298- >>> ubuntu.searchSourcePackages('cdrkit').count()
299+ >>> ubuntu.searchSourcePackages(u'cdrkit').count()
300 1
301
302 >>> cdrkit_cache = DistributionSourcePackageCache.selectOneBy(