Merge lp:~edwin-grubbs/launchpad/bug-490659-series-timeout into lp:launchpad/db-devel
| Status: | Merged |
|---|---|
| Approved by: | Stuart Bishop on 2010-08-23 |
| Approved revision: | no longer in the source branch. |
| Merged at revision: | 9696 |
| Proposed branch: | lp:~edwin-grubbs/launchpad/bug-490659-series-timeout |
| Merge into: | lp:launchpad/db-devel |
| Diff against target: |
429 lines (+135/-62) 17 files modified
database/schema/patch-2208-01-0.sql (+9/-0) database/schema/security.cfg (+1/-0) database/schema/trusted.sql (+27/-0) lib/canonical/launchpad/webapp/sorting.py (+3/-2) lib/lp/registry/browser/__init__.py (+1/-1) lib/lp/registry/browser/configure.zcml (+7/-0) lib/lp/registry/browser/productseries.py (+14/-0) lib/lp/registry/browser/tests/productseries-views.txt (+1/-1) lib/lp/registry/browser/tests/test_milestone.py (+1/-1) lib/lp/registry/interfaces/milestone.py (+2/-0) lib/lp/registry/model/milestone.py (+9/-2) lib/lp/registry/model/projectgroup.py (+17/-5) lib/lp/registry/stories/productseries/xx-productseries-series.txt (+2/-2) lib/lp/registry/templates/object-milestones.pt (+2/-2) lib/lp/registry/templates/productseries-detailed-display.pt (+38/-0) lib/lp/registry/templates/productseries-macros.pt (+0/-42) lib/lp/registry/templates/productseries-status.pt (+1/-4) |
| To merge this branch: | bzr merge lp:~edwin-grubbs/launchpad/bug-490659-series-timeout |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Stuart Bishop | db | 2010-08-13 | Approve on 2010-08-23 |
| Robert Collins (community) | 2010-08-13 | Approve on 2010-08-20 | |
| Brad Crittenden (community) | code | 2010-08-20 | Approve on 2010-08-20 |
|
Review via email:
|
|||
Description of the Change
Summary
-------
Bug 490659 reports a timeout caused by a project with over 700 releases.
The ProductSeries' milestones and releases attributes both used a python
function to expand the version numbers in the milestone name so that
milestone version numbers such as 1.1, 1.10, and 1.100 would sort
correctly. To avoid loading all 700 releases from the database just to
sort them in python to find the most recent ones, I added the
milestone_
function, so that the sorting can take place in the db. I also added an
index to the Milestone table using this function.
I have never submitted a db function for review before, so I'm sure that
I'm doing something wrong. For some reason, the configuration I added to
security.cfg doesn't seem to have any effect, so I have to run the
following sql on launchpad_
GRANT EXECUTE ON FUNCTION milestone_
Implementation details
-------
Added milestone_
database/
database/
Added __len__ to DecoratedResultSet, so that code expecting the model to
return a list won't blow up on the result set.
lib/
lib/
Changed the productseries detailed-display macro into its own page since
it makes it cleaner to limit the number of milestones and releases in
the view attributes.
lib/
lib/
lib/
lib/
lib/
Converted HasMilestones' milestones and releases attributes to return a
DecoratedResultSet instead of alist.
lib/
lib/
Tests
-----
./bin/test -vv -t 'xx-productseri
Demo and Q/A
------------
Create a bunch of sample milestones and releases on launchpad.dev:
INSERT INTO Milestone (name, product, productseries)
SELECT 'a' || i, (select id from product where name = 'bzr'),
(select id from productseries where name='trunk' and product = (
FROM generate_series(1, 100) as i;
INSERT INTO Milestone (name, product, productseries, active)
SELECT
'rel-' || i,
(select id from product where name = 'bzr'),
(select id from productseries where name='trunk' and product = (
FALSE
FROM generate_series(1, 100) as i;
INSERT INTO ProductRelease (owner, milestone, datereleased)
SELECT 1, id, now()
FROM Milestone
WHERE name ~ 'rel-.*';
* Open http://
* Only 12 milestones and 12 releases should be listed in the gray box
for trunk.
| Stuart Bishop (stub) wrote : | # |
The function and a database comment should be added directly to trusted.sql. You will still need the database patch to add the index. I'll allocate a number shortly.
The function should return """ '%s %s' % (str(date_
You are importing plpy but no longer using it.
I agree with Robert on __len__ - it is deliberately left out to avoid it accidentally being called, as it can be very expensive. In particular, Python invoked __len__ if it exists when casting something to a list, doubling the amount of database time materializing a result set (we reported this as a Python bug - not sure if it is still open).
| Stuart Bishop (stub) wrote : | # |
patch-2208-01-0.sql
| Edwin Grubbs (edwin-grubbs) wrote : | # |
I got rid of __len__ on DecoratedResultSet. Instead, I added IHasMilestones.
I made the changes to to the db function, and I discovered why my configuration in security.cfg wasn't granting execute to public. It expects me to specify "timestamp without time zone" instead of just "timestamp". A GRANT statement with just "timestamp" works, so it must just be the introspection that security.py does to figure out whether it is dealing with a function or a table.
I have included a new full diff below, since it is less confusing than the incremental diff and not much bigger.
-Edwin
=== added file 'database/
--- database/
+++ database/
@@ -0,0 +1,9 @@
+-- Copyright 2010 Canonical Ltd. This software is licensed under the
+-- GNU Affero General Public License version 3 (see the file LICENSE).
+SET client_
+
+CREATE INDEX milestone_
+ON Milestone
+USING btree (milestone_
+
+INSERT INTO LaunchpadDataba
=== modified file 'database/
--- database/
+++ database/
@@ -17,6 +17,7 @@
public.
public.
public.
+public.
public.
public.
public.
=== modified file 'database/
--- database/
+++ database/
@@ -1705,3 +1705,30 @@
return int(total_heat)
$$;
+
+-- This function is not STRICT, since it needs to handle
+-- dateexpected when it is NULL.
+CREATE OR REPLACE FUNCTION milestone_sort_key(
+ dateexpected timestamp, name text)
+ RETURNS text
+AS $_$
+ # If this method is altered, then any functional indexes using it
+ # need to be rebuilt.
+ import re
+ import datetime
+
+ date_expected, name = args
+
+ def substitude_
+ return match.group(
+
+ name = re.sub(u'\d+', substitude_
+ if date_expected is None:
+ # NULL dates are considered to be in the future.
+ date_expected = datetime.
+ return '%s %s' % (date_expected, name)
+$_$
+LANGUAGE plpythonu IMMUTABLE;
+
+COMMENT ON FUNCTION milestone_
+'Sort by the Milestone dateexpected and name. If the dateexpected is NULL, then it is converted to a date far in the future, so it will be sorted as a milestone in the future.';
=== modified file 'lib/lp/
--- lib/lp/
| Brad Crittenden (bac) wrote : | # |
Hi Edwin,
Interesting branch!
typo: substitude -> substitute
Should has_milestones be exported?
Any reason you cannot use ISlaveStore for the queries?
| Edwin Grubbs (edwin-grubbs) wrote : | # |
> Hi Edwin,
>
> Interesting branch!
>
> typo: substitude -> substitute
Fixed both the db function and the code that I stole that from.
> Should has_milestones be exported?
I think it would be better not to slow down every query for the pillars and series by adding another attribute that is hardly ever used. The other milestones attributes are references to collections, so you don't actually invoke the extra query unless you follow that link.
> Any reason you cannot use ISlaveStore for the queries?
IStore() chooses the slave or master based on whether the user has modified anything lately. If I forced it to use ISlaveStore, it would not show the user any changes they made until they propagated to the slave servers.
| Robert Collins (lifeless) wrote : | # |
69- def substitude_
70+
71+ def substitute_
That new VWS breaks up a small function - closures like that are more readable when they look like one conceptual thing rather than two IMO; please consider removing the new VWS.
I think the index is new? Needs a new stamp from stuart if so.
Lastly, it seems to me that LBYL isn't needed here: surely doing *neither* a .count() nor a .any() is appropriate: rather just iterate the latest_milestones, and if the iterator outputs no rows don't show the table? Perhaps we don't have a construct for doing that; if thats the case I'm happy with this approach, but suggest that you file a bug saying we should have such a construct - it will be the least work of all and thus fastest.
| Edwin Grubbs (edwin-grubbs) wrote : | # |
> 69- def substitude_
> 70+
> 71+ def substitute_
Brad also pointed that out, and it has been fixed.
> That new VWS breaks up a small function - closures like that are more readable
> when they look like one conceptual thing rather than two IMO; please consider
> removing the new VWS.
I actually disagree that it makes it less readable. Secondly, pocketlint complains when you don't have a blank line above a function definition.
> I think the index is new? Needs a new stamp from stuart if so.
The index isn't new. Since I included a full diff after the first set of changes instead of an incremental diff, it may have appeared new, but it was in the patch file below the db function before the db function was moved to trusted.sql.
> Lastly, it seems to me that LBYL isn't needed here: surely doing *neither* a
> .count() nor a .any() is appropriate: rather just iterate the
> latest_milestones, and if the iterator outputs no rows don't show the table?
The problem with not checking .any() before iterating is that a storm ResultSet object will query the database every time you iterate over it; therefore, you would have to create a new cached property on the view that would hold a list in order to eliminate the query. We can't have the model just cache the property as a list for us, since the whole point of this refactoring was to allow the view code to slice the model's attribute without the model first querying a really large number of rows.
> Perhaps we don't have a construct for doing that; if thats the case I'm happy
> with this approach, but suggest that you file a bug saying we should have such
> a construct - it will be the least work of all and thus fastest.
I definitely think there is a better universal solution to this, but it will need more discussion on the mailing list, since either template/view coding practices or storm's ResultSet will have to change significantly. Also, to put it into perspective, we are talking about eliminating one or two single-row queries, compared to the 700-rows of results that this branch eliminates, so improving this might not be as urgent as other optimizations.

I don't like the __len__ introduction as it blurs the line between lists and resultsets more : and we suffer because the line is blurry already. I realise you probably have a stack of stuff to put on top of this that is doing lens : however I'm worried about the knock on effects of the change : we should do it in IResultSet if we're doing it at all, not piecemeal on DecoratedResultSet. You could use a lazr.delegates to add __len__ just to your specific case (or, and perhaps better?) work up the stack replacing listified stuff with resultset aware code.
Other than that I think this is conceptually fine and an appropriate solution.
I'm a tad rusty on my plpython gotchas, so I'll let Stuart review that for you.