Merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection into lp:launchpad
| Status: | Merged |
|---|---|
| Approved by: | Gavin Panella on 2010-11-11 |
| Approved revision: | no longer in the source branch. |
| Merged at revision: | 11911 |
| Proposed branch: | lp:~danilo/launchpad/bug-662552-fast-pofile-selection |
| Merge into: | lp:launchpad |
| Diff against target: |
88 lines (+47/-13) 2 files modified
lib/lp/translations/model/translationmessage.py (+17/-13) lib/lp/translations/tests/test_translationmessage.py (+30/-0) |
| To merge this branch: | bzr merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Gavin Panella (community) | 2010-11-11 | Approve on 2010-11-11 | |
|
Review via email:
|
|||
Commit Message
[r=allenap]
Description of the Change
= Bug 662552 =
After 8.4 migration, a specific query has become much slower and that's causing a big number of timeouts on POFile:+translate page (the most important page in Translations). Proper solution would be to refactor the entire page and views to not emit hundreds of queries, but that's more work that we'll have to do over time.
For now, we first worked around the problem by disabling global suggestions, one of very useful features in Translations. That bought us some time to fix the problem (before users start complaining about global suggestions not showing up).
And this branch is now, hopefully, fix for the problem. We'll see how much it helps when it gets rolled out to production (I'll cowboy it on staging as soon as a LOSA shows up to test it out), but it does a few what might seem to be nasty things.
1. It uses a subselect for what would otherwise be a very straightforward join query
2. It uses LIMIT in the subselect, so we can't use Storm syntax for the subselect (afaik)
3. It's not equivalent query to the previous one, but my knowledge of the data model allows me to optimize it better (i.e. restrict more before the join)
getOnePOFile does not return a predictable result. The point is that it needs to return one POFile where TranslationMessage is used, and it doesn't really matter which one it is.
I've also added a few unit tests since this method was not tested so far.
= Tests =
bin/test -cvvt getOnePOFile
= Launchpad lint =
Checking for conflicts and issues in changed files.
Linting changed files:
lib/lp/
lib/lp/
| Данило Шеган (danilo) wrote : | # |
Hi Gavin,
Thanks for the review and the suggestion :)
У чет, 11. 11 2010. у 14:47 +0000, Gavin Panella пише:
> If you want to use Storm syntax, it may work just as well to use an IN
> query:
The whole point was to do the restriction in a smarter way: i.e. shorten
the set returned from a query on TranslationTemp
rows) without asking Postgres to get a Cartesian product of that table
with the POFile table (~1M rows) :)
> SELECT POFile.*
> FROM POFile
> WHERE POFile.potemplate IN (
> SELECT potemplate
> FROM TranslationTemp
> WHERE potmsgset = 1
> AND sequence > 0)
> LIMIT 1;
So, it's not surprising that this is by far the worst option. Original
query ran in ~100ms on staging, my improved version runs in ~1ms, and
this one takes ~250ms (with warm caches). I've already gotten timings
for my query on production DB as well: it runs in 0.8ms compared to 60ms
for the original version.
Also, note that I am (ab)using more knowledge of the data model to make
a slightly different query: if matching POFile does not exist for only
one selected POTMsgSet in my new version, no POFile will be returned
even though another POTMsgSet that I excluded through LIMIT 1 in the
inner query might have a matching POFile. However, that can't happen in
our model (yet can't be expressed as a DB constraint), so that's where I
am smarter than Postgres :)
Cheers,
Danilo
| Gavin Panella (allenap) wrote : | # |
> > SELECT POFile.*
> > FROM POFile
> > WHERE POFile.potemplate IN (
> > SELECT potemplate
> > FROM TranslationTemp
> > WHERE potmsgset = 1
> > AND sequence > 0)
> > LIMIT 1;
>
> So, it's not surprising that this is by far the worst option. Original
> query ran in ~100ms on staging, my improved version runs in ~1ms, and
> this one takes ~250ms (with warm caches). I've already gotten timings
> for my query on production DB as well: it runs in 0.8ms compared to 60ms
> for the original version.
That's interesting. I had hand-wavy ideas of Postgres being able to
optimize more than it can, i.e. realizing that it did not need to
materialize the IN clause fully. Thanks for trying it out, I have
learnt something :)
> Also, note that I am (ab)using more knowledge of the data model to make
> a slightly different query: if matching POFile does not exist for only
> one selected POTMsgSet in my new version, no POFile will be returned
> even though another POTMsgSet that I excluded through LIMIT 1 in the
> inner query might have a matching POFile. However, that can't happen in
> our model (yet can't be expressed as a DB constraint), so that's where I
> am smarter than Postgres :)
Yeah, I meant to mention - just to make sure that it was intended -
that the original query filtered on POFile.language but not on
TranslationTemp
what you're referring to.
| Robert Collins (lifeless) wrote : | # |
On Fri, Nov 12, 2010 at 4:11 AM, Gavin Panella
<email address hidden> wrote:
>> this one takes ~250ms (with warm caches). I've already gotten timings
>> for my query on production DB as well: it runs in 0.8ms compared to 60ms
>> for the original version.
>
> That's interesting. I had hand-wavy ideas of Postgres being able to
> optimize more than it can, i.e. realizing that it did not need to
> materialize the IN clause fully. Thanks for trying it out, I have
> learnt something :)
pg 8.4 seems to have substantially worse query plan logic than 8.3 had
*for our use cases*. I suspect they've made other things faster.
Nevertheless - we should always express what we need as narrowly as
possible to postgresql - that will let its statistics provide the most
relevant guidance, prevent manipulating more data than needed and
generally be a win.
-Rob

Looks good, especially if it's 100 times faster!
If you want to use Storm syntax, it may work just as well to use an IN
query:
SELECT POFile.* lateItem
FROM POFile
WHERE POFile.potemplate IN (
SELECT potemplate
FROM TranslationTemp
WHERE potmsgset = 1
AND sequence > 0)
LIMIT 1;
If PostgreSQL is sensible it will optimize that. But, to be honest, if
what you've got works, use it and ignore me :)