Merge ~cjwatson/launchpad:simplify-external-translation-messages into launchpad:master

Proposed by Colin Watson
Status: Needs review
Proposed branch: ~cjwatson/launchpad:simplify-external-translation-messages
Merge into: launchpad:master
Diff against target: 31 lines (+1/-12)
1 file modified
lib/lp/translations/model/potmsgset.py (+1/-12)
Reviewer Review Type Date Requested Status
William Grant code Needs Information
Review via email: mp+381540@code.launchpad.net

Commit message

Simplify POTMsgSet._getExternalTranslationMessages

Description of the change

There's no obvious point to sorting the results of a subquery and then passing it straight to the IN operator; this seems to be a vestige of earlier code that fetched the result of the subquery into Python. This doesn't seem to make a massive difference (though it's a bit hard to tell due to caching), but it at least simplifies the query plan.

To post a comment you must log in.
Revision history for this message
William Grant (wgrant) wrote :

It's trying to get the most recent TranslationMessage with each distinct set of translations, not sort the output as such, so this change is likely to cause the result of the subquery and parent query to become orders of magnitude larger. Have you considered that?

review: Needs Information (code)
Revision history for this message
Colin Watson (cjwatson) wrote :

A realisation of the query in question (from https://oops.canonical.com/oops/?oopsid=OOPS-6ce2e41d4a1fc3371ecbbfa31fafcc74) is:

WITH msgsets AS
  ( SELECT POTMsgSet.id
   FROM POTMsgSet
   JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
   JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
   WHERE POTMsgSet.msgid_singular = 10087
     AND POTMsgSet.id <> 29306498 )
SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0,
                                                                TranslationMessage.msgstr1,
                                                                TranslationMessage.msgstr2,
                                                                TranslationMessage.msgstr3,
                                                                TranslationMessage.msgstr4,
                                                                TranslationMessage.msgstr5,
                                                                TranslationMessage.origin,
                                                                TranslationMessage.potemplate,
                                                                TranslationMessage.potmsgset,
                                                                TranslationMessage.reviewer,
                                                                TranslationMessage.submitter,
                                                                TranslationMessage.validation_status,
                                                                TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    ( SELECT DISTINCT ON (COALESCE(msgstr0, -1),
                          COALESCE(msgstr1, -1),
                          COALESCE(msgstr2, -1),
                          COALESCE(msgstr3, -1),
                          COALESCE(msgstr4, -1),
                          COALESCE(msgstr5, -1)) TranslationMessage.id
     FROM TranslationMessage
     JOIN msgsets ON msgsets.id = TranslationMessage.potmsgset
     WHERE (TranslationMessage.LANGUAGE IN (129))
     ORDER BY COALESCE(msgstr0, -1),
              COALESCE(msgstr1, -1),
              COALESCE(msgstr2, -1),
              COALESCE(msgstr3, -1),
              COALESCE(msgstr4, -1),
              COALESCE(msgstr5, -1),
              date_created DESC ) LIMIT 2001;

I'm struggling to see how that fetches the most recent TranslationMessage; as far as I can see the parent query just does a set membership check on a subquery that returns all the associated TranslationMessages with some sorting that doesn't seem as though it can change the end result. What am I missing?

Revision history for this message
William Grant (wgrant) wrote :

The subquery's DISTINCT ON combined with its ORDER BY mean only some of the TranslationMessages will be returned, don't they?

Unmerged commits

846d40c... by Colin Watson

Simplify POTMsgSet._getExternalTranslationMessages

There's no obvious point to sorting the results of a subquery and then
passing it straight to the IN operator; this seems to be a vestige of
earlier code that fetched the result of the subquery into Python. This
doesn't seem to make a massive difference (though it's a bit hard to
tell due to caching), but it at least simplifies the query plan.

LP: #736005

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/lib/lp/translations/model/potmsgset.py b/lib/lp/translations/model/potmsgset.py
index 12f9f05..4895840 100644
--- a/lib/lp/translations/model/potmsgset.py
+++ b/lib/lp/translations/model/potmsgset.py
@@ -419,25 +419,14 @@ class POTMsgSet(SQLBase):
419419
420 # Subquery to find the ids of TranslationMessages that are420 # Subquery to find the ids of TranslationMessages that are
421 # matching suggestions.421 # matching suggestions.
422 # We're going to get a lot of duplicates, sometimes resulting in
423 # thousands of suggestions. Weed out most of that duplication by
424 # excluding older messages that are identical to newer ones in
425 # all translated forms. The Python code can later sort out the
426 # distinct translations per form.
427 msgstrs = ', '.join([
428 'COALESCE(msgstr%d, -1)' % form
429 for form in range(TranslationConstants.MAX_PLURAL_FORMS)])
430 ids_query_params = {422 ids_query_params = {
431 'msgstrs': msgstrs,
432 'where': '(' + ' OR '.join(lang_used) + ')',423 'where': '(' + ' OR '.join(lang_used) + ')',
433 }424 }
434 ids_query = '''425 ids_query = '''
435 SELECT DISTINCT ON (%(msgstrs)s)426 SELECT TranslationMessage.id
436 TranslationMessage.id
437 FROM TranslationMessage427 FROM TranslationMessage
438 JOIN msgsets ON msgsets.id = TranslationMessage.potmsgset428 JOIN msgsets ON msgsets.id = TranslationMessage.potmsgset
439 WHERE %(where)s429 WHERE %(where)s
440 ORDER BY %(msgstrs)s, date_created DESC
441 ''' % ids_query_params430 ''' % ids_query_params
442431
443 result = IStore(TranslationMessage).with_(msgsets).find(432 result = IStore(TranslationMessage).with_(msgsets).find(

Subscribers

People subscribed via source and target branches

to status/vote changes: