Merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection into lp:launchpad

Proposed by Данило Шеган on 2010-11-11
Status: Merged
Approved by: Gavin Panella on 2010-11-11
Approved revision: no longer in the source branch.
Merged at revision: 11911
Proposed branch: lp:~danilo/launchpad/bug-662552-fast-pofile-selection
Merge into: lp:launchpad
Diff against target: 88 lines (+47/-13)
2 files modified
lib/lp/translations/model/translationmessage.py (+17/-13)
lib/lp/translations/tests/test_translationmessage.py (+30/-0)
To merge this branch: bzr merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection
Reviewer Review Type Date Requested Status
Gavin Panella (community) 2010-11-11 Approve on 2010-11-11
Review via email: mp+40615@code.launchpad.net

Commit message

[r=allenap][ui=none][bug=662552] Improve query for finding a POFile for a given TranslationMessage.

Description of the change

= Bug 662552 =

After 8.4 migration, a specific query has become much slower and that's causing a big number of timeouts on POFile:+translate page (the most important page in Translations). Proper solution would be to refactor the entire page and views to not emit hundreds of queries, but that's more work that we'll have to do over time.

For now, we first worked around the problem by disabling global suggestions, one of very useful features in Translations. That bought us some time to fix the problem (before users start complaining about global suggestions not showing up).

And this branch is now, hopefully, fix for the problem. We'll see how much it helps when it gets rolled out to production (I'll cowboy it on staging as soon as a LOSA shows up to test it out), but it does a few what might seem to be nasty things.

1. It uses a subselect for what would otherwise be a very straightforward join query
2. It uses LIMIT in the subselect, so we can't use Storm syntax for the subselect (afaik)
3. It's not equivalent query to the previous one, but my knowledge of the data model allows me to optimize it better (i.e. restrict more before the join)

getOnePOFile does not return a predictable result. The point is that it needs to return one POFile where TranslationMessage is used, and it doesn't really matter which one it is.

I've also added a few unit tests since this method was not tested so far.

= Tests =

bin/test -cvvt getOnePOFile

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/translations/model/translationmessage.py
  lib/lp/translations/tests/test_translationmessage.py

To post a comment you must log in.
Gavin Panella (allenap) wrote :

Looks good, especially if it's 100 times faster!

If you want to use Storm syntax, it may work just as well to use an IN
query:

   SELECT POFile.*
     FROM POFile
    WHERE POFile.potemplate IN (
            SELECT potemplate
              FROM TranslationTemplateItem
             WHERE potmsgset = 1
               AND sequence > 0)
    LIMIT 1;

If PostgreSQL is sensible it will optimize that. But, to be honest, if
what you've got works, use it and ignore me :)

review: Approve
Данило Шеган (danilo) wrote :

Hi Gavin,

Thanks for the review and the suggestion :)

У чет, 11. 11 2010. у 14:47 +0000, Gavin Panella пише:
> If you want to use Storm syntax, it may work just as well to use an IN
> query:

The whole point was to do the restriction in a smarter way: i.e. shorten
the set returned from a query on TranslationTemplateItem table (~10M
rows) without asking Postgres to get a Cartesian product of that table
with the POFile table (~1M rows) :)

> SELECT POFile.*
> FROM POFile
> WHERE POFile.potemplate IN (
> SELECT potemplate
> FROM TranslationTemplateItem
> WHERE potmsgset = 1
> AND sequence > 0)
> LIMIT 1;

So, it's not surprising that this is by far the worst option. Original
query ran in ~100ms on staging, my improved version runs in ~1ms, and
this one takes ~250ms (with warm caches). I've already gotten timings
for my query on production DB as well: it runs in 0.8ms compared to 60ms
for the original version.

Also, note that I am (ab)using more knowledge of the data model to make
a slightly different query: if matching POFile does not exist for only
one selected POTMsgSet in my new version, no POFile will be returned
even though another POTMsgSet that I excluded through LIMIT 1 in the
inner query might have a matching POFile. However, that can't happen in
our model (yet can't be expressed as a DB constraint), so that's where I
am smarter than Postgres :)

Cheers,
Danilo

Gavin Panella (allenap) wrote :

> > SELECT POFile.*
> > FROM POFile
> > WHERE POFile.potemplate IN (
> > SELECT potemplate
> > FROM TranslationTemplateItem
> > WHERE potmsgset = 1
> > AND sequence > 0)
> > LIMIT 1;
>
> So, it's not surprising that this is by far the worst option. Original
> query ran in ~100ms on staging, my improved version runs in ~1ms, and
> this one takes ~250ms (with warm caches). I've already gotten timings
> for my query on production DB as well: it runs in 0.8ms compared to 60ms
> for the original version.

That's interesting. I had hand-wavy ideas of Postgres being able to
optimize more than it can, i.e. realizing that it did not need to
materialize the IN clause fully. Thanks for trying it out, I have
learnt something :)

> Also, note that I am (ab)using more knowledge of the data model to make
> a slightly different query: if matching POFile does not exist for only
> one selected POTMsgSet in my new version, no POFile will be returned
> even though another POTMsgSet that I excluded through LIMIT 1 in the
> inner query might have a matching POFile. However, that can't happen in
> our model (yet can't be expressed as a DB constraint), so that's where I
> am smarter than Postgres :)

Yeah, I meant to mention - just to make sure that it was intended -
that the original query filtered on POFile.language but not on
TranslationTemplateItem.sequence, as the latter does. I assume that's
what you're referring to.

Robert Collins (lifeless) wrote :

On Fri, Nov 12, 2010 at 4:11 AM, Gavin Panella
<email address hidden> wrote:
>> this one takes ~250ms (with warm caches).  I've already gotten timings
>> for my query on production DB as well: it runs in 0.8ms compared to 60ms
>> for the original version.
>
> That's interesting. I had hand-wavy ideas of Postgres being able to
> optimize more than it can, i.e. realizing that it did not need to
> materialize the IN clause fully. Thanks for trying it out, I have
> learnt something :)

pg 8.4 seems to have substantially worse query plan logic than 8.3 had
*for our use cases*. I suspect they've made other things faster.

Nevertheless - we should always express what we need as narrowly as
possible to postgresql - that will let its statistics provide the most
relevant guidance, prevent manipulating more data than needed and
generally be a win.

-Rob

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/translations/model/translationmessage.py'
2--- lib/lp/translations/model/translationmessage.py 2010-10-27 18:30:31 +0000
3+++ lib/lp/translations/model/translationmessage.py 2010-11-11 15:49:49 +0000
4@@ -35,6 +35,7 @@
5 SQLBase,
6 sqlvalues,
7 )
8+from canonical.launchpad.interfaces.lpstorm import IStore
9 from lp.registry.interfaces.person import validate_public_person
10 from lp.services.propertycache import cachedproperty
11 from lp.translations.interfaces.translationmessage import (
12@@ -360,20 +361,23 @@
13 def getOnePOFile(self):
14 """See `ITranslationMessage`."""
15 from lp.translations.model.pofile import POFile
16- clauses = [
17- "POFile.potemplate = TranslationTemplateItem.potemplate",
18- "TranslationTemplateItem.potmsgset = %s" % (
19- sqlvalues(self.potmsgset)),
20- "POFile.language = %s" % sqlvalues(self.language),
21- ]
22
23- pofiles = POFile.select(' AND '.join(clauses),
24- clauseTables=['TranslationTemplateItem'])
25- pofile = list(pofiles[:1])
26- if len(pofile) > 0:
27- return pofile[0]
28- else:
29- return None
30+ # Get any POFile where this translation exists.
31+ # Because we can't create a subselect with "LIMIT" using Storm,
32+ # we directly embed a subselect using raw SQL instead.
33+ # We can do this because our message sharing code ensures a POFile
34+ # exists for any of the sharing templates.
35+ # This approach gives us roughly a 100x performance improvement
36+ # compared to straightforward join as of 2010-11-11. - danilo
37+ pofile = IStore(self).find(
38+ POFile,
39+ POFile.potemplateID == SQL(
40+ """(SELECT potemplate
41+ FROM TranslationTemplateItem
42+ WHERE potmsgset = %s AND sequence > 0
43+ LIMIT 1)""" % sqlvalues(self.potmsgset)),
44+ POFile.language == self.language).one()
45+ return pofile
46
47 def ensureBrowserPOFile(self):
48 """See `ITranslationMessage`."""
49
50=== modified file 'lib/lp/translations/tests/test_translationmessage.py'
51--- lib/lp/translations/tests/test_translationmessage.py 2010-11-11 09:41:28 +0000
52+++ lib/lp/translations/tests/test_translationmessage.py 2010-11-11 15:49:49 +0000
53@@ -16,6 +16,36 @@
54 from lp.translations.model.potranslation import POTranslation
55
56
57+class TestTranslationMessage(TestCaseWithFactory):
58+ """Basic unit tests for TranslationMessage class.
59+ """
60+ layer = ZopelessDatabaseLayer
61+
62+ def test_getOnePOFile(self):
63+ language = self.factory.makeLanguage('sr@test')
64+ pofile = self.factory.makePOFile(language.code)
65+ tm = self.factory.makeTranslationMessage(pofile=pofile)
66+ self.assertEquals(pofile, tm.getOnePOFile())
67+
68+ def test_getOnePOFile_shared(self):
69+ language = self.factory.makeLanguage('sr@test')
70+ pofile1 = self.factory.makePOFile(language.code)
71+ pofile2 = self.factory.makePOFile(language.code)
72+ tm = self.factory.makeTranslationMessage(pofile=pofile1)
73+ # Share this POTMsgSet with the other POTemplate (and POFile).
74+ tm.potmsgset.setSequence(pofile2.potemplate, 1)
75+ self.assertTrue(tm.getOnePOFile() in [pofile1, pofile2])
76+
77+ def test_getOnePOFile_no_pofile(self):
78+ # When POTMsgSet is obsolete (sequence=0), no matching POFile
79+ # is returned.
80+ language = self.factory.makeLanguage('sr@test')
81+ pofile = self.factory.makePOFile(language.code)
82+ tm = self.factory.makeTranslationMessage(pofile=pofile)
83+ tm.potmsgset.setSequence(pofile.potemplate, 0)
84+ self.assertEquals(None, tm.getOnePOFile())
85+
86+
87 class TestTranslationMessageFindIdenticalMessage(TestCaseWithFactory):
88 """Tests for `TranslationMessage.findIdenticalMessage`."""
89