У чет, 11. 11 2010. у 14:47 +0000, Gavin Panella пише:
> If you want to use Storm syntax, it may work just as well to use an IN
> query:
The whole point was to do the restriction in a smarter way: i.e. shorten
the set returned from a query on TranslationTemplateItem table (~10M
rows) without asking Postgres to get a Cartesian product of that table
with the POFile table (~1M rows) :)
> SELECT POFile.*
> FROM POFile
> WHERE POFile.potemplate IN (
> SELECT potemplate
> FROM TranslationTemplateItem
> WHERE potmsgset = 1
> AND sequence > 0)
> LIMIT 1;
So, it's not surprising that this is by far the worst option. Original
query ran in ~100ms on staging, my improved version runs in ~1ms, and
this one takes ~250ms (with warm caches). I've already gotten timings
for my query on production DB as well: it runs in 0.8ms compared to 60ms
for the original version.
Also, note that I am (ab)using more knowledge of the data model to make
a slightly different query: if matching POFile does not exist for only
one selected POTMsgSet in my new version, no POFile will be returned
even though another POTMsgSet that I excluded through LIMIT 1 in the
inner query might have a matching POFile. However, that can't happen in
our model (yet can't be expressed as a DB constraint), so that's where I
am smarter than Postgres :)
Hi Gavin,
Thanks for the review and the suggestion :)
У чет, 11. 11 2010. у 14:47 +0000, Gavin Panella пише:
> If you want to use Storm syntax, it may work just as well to use an IN
> query:
The whole point was to do the restriction in a smarter way: i.e. shorten lateItem table (~10M
the set returned from a query on TranslationTemp
rows) without asking Postgres to get a Cartesian product of that table
with the POFile table (~1M rows) :)
> SELECT POFile.* lateItem
> FROM POFile
> WHERE POFile.potemplate IN (
> SELECT potemplate
> FROM TranslationTemp
> WHERE potmsgset = 1
> AND sequence > 0)
> LIMIT 1;
So, it's not surprising that this is by far the worst option. Original
query ran in ~100ms on staging, my improved version runs in ~1ms, and
this one takes ~250ms (with warm caches). I've already gotten timings
for my query on production DB as well: it runs in 0.8ms compared to 60ms
for the original version.
Also, note that I am (ab)using more knowledge of the data model to make
a slightly different query: if matching POFile does not exist for only
one selected POTMsgSet in my new version, no POFile will be returned
even though another POTMsgSet that I excluded through LIMIT 1 in the
inner query might have a matching POFile. However, that can't happen in
our model (yet can't be expressed as a DB constraint), so that's where I
am smarter than Postgres :)
Cheers,
Danilo