Merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection into lp:launchpad

Proposed by Данило Шеган
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 11911
Proposed branch: lp:~danilo/launchpad/bug-662552-fast-pofile-selection
Merge into: lp:launchpad
Diff against target: 88 lines (+47/-13)
2 files modified
lib/lp/translations/model/translationmessage.py (+17/-13)
lib/lp/translations/tests/test_translationmessage.py (+30/-0)
To merge this branch: bzr merge lp:~danilo/launchpad/bug-662552-fast-pofile-selection
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+40615@code.launchpad.net

Commit message

[r=allenap][ui=none][bug=662552] Improve query for finding a POFile for a given TranslationMessage.

Description of the change

= Bug 662552 =

After 8.4 migration, a specific query has become much slower and that's causing a big number of timeouts on POFile:+translate page (the most important page in Translations). Proper solution would be to refactor the entire page and views to not emit hundreds of queries, but that's more work that we'll have to do over time.

For now, we first worked around the problem by disabling global suggestions, one of very useful features in Translations. That bought us some time to fix the problem (before users start complaining about global suggestions not showing up).

And this branch is now, hopefully, fix for the problem. We'll see how much it helps when it gets rolled out to production (I'll cowboy it on staging as soon as a LOSA shows up to test it out), but it does a few what might seem to be nasty things.

1. It uses a subselect for what would otherwise be a very straightforward join query
2. It uses LIMIT in the subselect, so we can't use Storm syntax for the subselect (afaik)
3. It's not equivalent query to the previous one, but my knowledge of the data model allows me to optimize it better (i.e. restrict more before the join)

getOnePOFile does not return a predictable result. The point is that it needs to return one POFile where TranslationMessage is used, and it doesn't really matter which one it is.

I've also added a few unit tests since this method was not tested so far.

= Tests =

bin/test -cvvt getOnePOFile

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/translations/model/translationmessage.py
  lib/lp/translations/tests/test_translationmessage.py

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good, especially if it's 100 times faster!

If you want to use Storm syntax, it may work just as well to use an IN
query:

   SELECT POFile.*
     FROM POFile
    WHERE POFile.potemplate IN (
            SELECT potemplate
              FROM TranslationTemplateItem
             WHERE potmsgset = 1
               AND sequence > 0)
    LIMIT 1;

If PostgreSQL is sensible it will optimize that. But, to be honest, if
what you've got works, use it and ignore me :)

review: Approve
Revision history for this message
Данило Шеган (danilo) wrote :

Hi Gavin,

Thanks for the review and the suggestion :)

У чет, 11. 11 2010. у 14:47 +0000, Gavin Panella пише:
> If you want to use Storm syntax, it may work just as well to use an IN
> query:

The whole point was to do the restriction in a smarter way: i.e. shorten
the set returned from a query on TranslationTemplateItem table (~10M
rows) without asking Postgres to get a Cartesian product of that table
with the POFile table (~1M rows) :)

> SELECT POFile.*
> FROM POFile
> WHERE POFile.potemplate IN (
> SELECT potemplate
> FROM TranslationTemplateItem
> WHERE potmsgset = 1
> AND sequence > 0)
> LIMIT 1;

So, it's not surprising that this is by far the worst option. Original
query ran in ~100ms on staging, my improved version runs in ~1ms, and
this one takes ~250ms (with warm caches). I've already gotten timings
for my query on production DB as well: it runs in 0.8ms compared to 60ms
for the original version.

Also, note that I am (ab)using more knowledge of the data model to make
a slightly different query: if matching POFile does not exist for only
one selected POTMsgSet in my new version, no POFile will be returned
even though another POTMsgSet that I excluded through LIMIT 1 in the
inner query might have a matching POFile. However, that can't happen in
our model (yet can't be expressed as a DB constraint), so that's where I
am smarter than Postgres :)

Cheers,
Danilo

Revision history for this message
Gavin Panella (allenap) wrote :

> > SELECT POFile.*
> > FROM POFile
> > WHERE POFile.potemplate IN (
> > SELECT potemplate
> > FROM TranslationTemplateItem
> > WHERE potmsgset = 1
> > AND sequence > 0)
> > LIMIT 1;
>
> So, it's not surprising that this is by far the worst option. Original
> query ran in ~100ms on staging, my improved version runs in ~1ms, and
> this one takes ~250ms (with warm caches). I've already gotten timings
> for my query on production DB as well: it runs in 0.8ms compared to 60ms
> for the original version.

That's interesting. I had hand-wavy ideas of Postgres being able to
optimize more than it can, i.e. realizing that it did not need to
materialize the IN clause fully. Thanks for trying it out, I have
learnt something :)

> Also, note that I am (ab)using more knowledge of the data model to make
> a slightly different query: if matching POFile does not exist for only
> one selected POTMsgSet in my new version, no POFile will be returned
> even though another POTMsgSet that I excluded through LIMIT 1 in the
> inner query might have a matching POFile. However, that can't happen in
> our model (yet can't be expressed as a DB constraint), so that's where I
> am smarter than Postgres :)

Yeah, I meant to mention - just to make sure that it was intended -
that the original query filtered on POFile.language but not on
TranslationTemplateItem.sequence, as the latter does. I assume that's
what you're referring to.

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Nov 12, 2010 at 4:11 AM, Gavin Panella
<email address hidden> wrote:
>> this one takes ~250ms (with warm caches).  I've already gotten timings
>> for my query on production DB as well: it runs in 0.8ms compared to 60ms
>> for the original version.
>
> That's interesting. I had hand-wavy ideas of Postgres being able to
> optimize more than it can, i.e. realizing that it did not need to
> materialize the IN clause fully. Thanks for trying it out, I have
> learnt something :)

pg 8.4 seems to have substantially worse query plan logic than 8.3 had
*for our use cases*. I suspect they've made other things faster.

Nevertheless - we should always express what we need as narrowly as
possible to postgresql - that will let its statistics provide the most
relevant guidance, prevent manipulating more data than needed and
generally be a win.

-Rob

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/translations/model/translationmessage.py'
--- lib/lp/translations/model/translationmessage.py 2010-10-27 18:30:31 +0000
+++ lib/lp/translations/model/translationmessage.py 2010-11-11 15:49:49 +0000
@@ -35,6 +35,7 @@
35 SQLBase,35 SQLBase,
36 sqlvalues,36 sqlvalues,
37 )37 )
38from canonical.launchpad.interfaces.lpstorm import IStore
38from lp.registry.interfaces.person import validate_public_person39from lp.registry.interfaces.person import validate_public_person
39from lp.services.propertycache import cachedproperty40from lp.services.propertycache import cachedproperty
40from lp.translations.interfaces.translationmessage import (41from lp.translations.interfaces.translationmessage import (
@@ -360,20 +361,23 @@
360 def getOnePOFile(self):361 def getOnePOFile(self):
361 """See `ITranslationMessage`."""362 """See `ITranslationMessage`."""
362 from lp.translations.model.pofile import POFile363 from lp.translations.model.pofile import POFile
363 clauses = [
364 "POFile.potemplate = TranslationTemplateItem.potemplate",
365 "TranslationTemplateItem.potmsgset = %s" % (
366 sqlvalues(self.potmsgset)),
367 "POFile.language = %s" % sqlvalues(self.language),
368 ]
369364
370 pofiles = POFile.select(' AND '.join(clauses),365 # Get any POFile where this translation exists.
371 clauseTables=['TranslationTemplateItem'])366 # Because we can't create a subselect with "LIMIT" using Storm,
372 pofile = list(pofiles[:1])367 # we directly embed a subselect using raw SQL instead.
373 if len(pofile) > 0:368 # We can do this because our message sharing code ensures a POFile
374 return pofile[0]369 # exists for any of the sharing templates.
375 else:370 # This approach gives us roughly a 100x performance improvement
376 return None371 # compared to straightforward join as of 2010-11-11. - danilo
372 pofile = IStore(self).find(
373 POFile,
374 POFile.potemplateID == SQL(
375 """(SELECT potemplate
376 FROM TranslationTemplateItem
377 WHERE potmsgset = %s AND sequence > 0
378 LIMIT 1)""" % sqlvalues(self.potmsgset)),
379 POFile.language == self.language).one()
380 return pofile
377381
378 def ensureBrowserPOFile(self):382 def ensureBrowserPOFile(self):
379 """See `ITranslationMessage`."""383 """See `ITranslationMessage`."""
380384
=== modified file 'lib/lp/translations/tests/test_translationmessage.py'
--- lib/lp/translations/tests/test_translationmessage.py 2010-11-11 09:41:28 +0000
+++ lib/lp/translations/tests/test_translationmessage.py 2010-11-11 15:49:49 +0000
@@ -16,6 +16,36 @@
16from lp.translations.model.potranslation import POTranslation16from lp.translations.model.potranslation import POTranslation
1717
1818
19class TestTranslationMessage(TestCaseWithFactory):
20 """Basic unit tests for TranslationMessage class.
21 """
22 layer = ZopelessDatabaseLayer
23
24 def test_getOnePOFile(self):
25 language = self.factory.makeLanguage('sr@test')
26 pofile = self.factory.makePOFile(language.code)
27 tm = self.factory.makeTranslationMessage(pofile=pofile)
28 self.assertEquals(pofile, tm.getOnePOFile())
29
30 def test_getOnePOFile_shared(self):
31 language = self.factory.makeLanguage('sr@test')
32 pofile1 = self.factory.makePOFile(language.code)
33 pofile2 = self.factory.makePOFile(language.code)
34 tm = self.factory.makeTranslationMessage(pofile=pofile1)
35 # Share this POTMsgSet with the other POTemplate (and POFile).
36 tm.potmsgset.setSequence(pofile2.potemplate, 1)
37 self.assertTrue(tm.getOnePOFile() in [pofile1, pofile2])
38
39 def test_getOnePOFile_no_pofile(self):
40 # When POTMsgSet is obsolete (sequence=0), no matching POFile
41 # is returned.
42 language = self.factory.makeLanguage('sr@test')
43 pofile = self.factory.makePOFile(language.code)
44 tm = self.factory.makeTranslationMessage(pofile=pofile)
45 tm.potmsgset.setSequence(pofile.potemplate, 0)
46 self.assertEquals(None, tm.getOnePOFile())
47
48
19class TestTranslationMessageFindIdenticalMessage(TestCaseWithFactory):49class TestTranslationMessageFindIdenticalMessage(TestCaseWithFactory):
20 """Tests for `TranslationMessage.findIdenticalMessage`."""50 """Tests for `TranslationMessage.findIdenticalMessage`."""
2151