Merge ~cjwatson/launchpad:simplify-external-translation-messages into launchpad:master

Proposed by Colin Watson
Status: Needs review
Proposed branch: ~cjwatson/launchpad:simplify-external-translation-messages
Merge into: launchpad:master
Diff against target: 31 lines (+1/-12)
1 file modified
lib/lp/translations/model/potmsgset.py (+1/-12)
Reviewer Review Type Date Requested Status
William Grant code Needs Information
Review via email: mp+381540@code.launchpad.net

Commit message

Simplify POTMsgSet._getExternalTranslationMessages

Description of the change

There's no obvious point to sorting the results of a subquery and then passing it straight to the IN operator; this seems to be a vestige of earlier code that fetched the result of the subquery into Python. This doesn't seem to make a massive difference (though it's a bit hard to tell due to caching), but it at least simplifies the query plan.

To post a comment you must log in.
Revision history for this message
William Grant (wgrant) wrote :

It's trying to get the most recent TranslationMessage with each distinct set of translations, not sort the output as such, so this change is likely to cause the result of the subquery and parent query to become orders of magnitude larger. Have you considered that?

review: Needs Information (code)
Revision history for this message
Colin Watson (cjwatson) wrote :

A realisation of the query in question (from https://oops.canonical.com/oops/?oopsid=OOPS-6ce2e41d4a1fc3371ecbbfa31fafcc74) is:

WITH msgsets AS
  ( SELECT POTMsgSet.id
   FROM POTMsgSet
   JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
   JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
   WHERE POTMsgSet.msgid_singular = 10087
     AND POTMsgSet.id <> 29306498 )
SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0,
                                                                TranslationMessage.msgstr1,
                                                                TranslationMessage.msgstr2,
                                                                TranslationMessage.msgstr3,
                                                                TranslationMessage.msgstr4,
                                                                TranslationMessage.msgstr5,
                                                                TranslationMessage.origin,
                                                                TranslationMessage.potemplate,
                                                                TranslationMessage.potmsgset,
                                                                TranslationMessage.reviewer,
                                                                TranslationMessage.submitter,
                                                                TranslationMessage.validation_status,
                                                                TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    ( SELECT DISTINCT ON (COALESCE(msgstr0, -1),
                          COALESCE(msgstr1, -1),
                          COALESCE(msgstr2, -1),
                          COALESCE(msgstr3, -1),
                          COALESCE(msgstr4, -1),
                          COALESCE(msgstr5, -1)) TranslationMessage.id
     FROM TranslationMessage
     JOIN msgsets ON msgsets.id = TranslationMessage.potmsgset
     WHERE (TranslationMessage.LANGUAGE IN (129))
     ORDER BY COALESCE(msgstr0, -1),
              COALESCE(msgstr1, -1),
              COALESCE(msgstr2, -1),
              COALESCE(msgstr3, -1),
              COALESCE(msgstr4, -1),
              COALESCE(msgstr5, -1),
              date_created DESC ) LIMIT 2001;

I'm struggling to see how that fetches the most recent TranslationMessage; as far as I can see the parent query just does a set membership check on a subquery that returns all the associated TranslationMessages with some sorting that doesn't seem as though it can change the end result. What am I missing?

Revision history for this message
William Grant (wgrant) wrote :

The subquery's DISTINCT ON combined with its ORDER BY mean only some of the TranslationMessages will be returned, don't they?

Unmerged commits

846d40c... by Colin Watson on 2020-04-01

Simplify POTMsgSet._getExternalTranslationMessages

There's no obvious point to sorting the results of a subquery and then
passing it straight to the IN operator; this seems to be a vestige of
earlier code that fetched the result of the subquery into Python. This
doesn't seem to make a massive difference (though it's a bit hard to
tell due to caching), but it at least simplifies the query plan.

LP: #736005

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/lib/lp/translations/model/potmsgset.py b/lib/lp/translations/model/potmsgset.py
2index 12f9f05..4895840 100644
3--- a/lib/lp/translations/model/potmsgset.py
4+++ b/lib/lp/translations/model/potmsgset.py
5@@ -419,25 +419,14 @@ class POTMsgSet(SQLBase):
6
7 # Subquery to find the ids of TranslationMessages that are
8 # matching suggestions.
9- # We're going to get a lot of duplicates, sometimes resulting in
10- # thousands of suggestions. Weed out most of that duplication by
11- # excluding older messages that are identical to newer ones in
12- # all translated forms. The Python code can later sort out the
13- # distinct translations per form.
14- msgstrs = ', '.join([
15- 'COALESCE(msgstr%d, -1)' % form
16- for form in range(TranslationConstants.MAX_PLURAL_FORMS)])
17 ids_query_params = {
18- 'msgstrs': msgstrs,
19 'where': '(' + ' OR '.join(lang_used) + ')',
20 }
21 ids_query = '''
22- SELECT DISTINCT ON (%(msgstrs)s)
23- TranslationMessage.id
24+ SELECT TranslationMessage.id
25 FROM TranslationMessage
26 JOIN msgsets ON msgsets.id = TranslationMessage.potmsgset
27 WHERE %(where)s
28- ORDER BY %(msgstrs)s, date_created DESC
29 ''' % ids_query_params
30
31 result = IStore(TranslationMessage).with_(msgsets).find(

Subscribers

People subscribed via source and target branches

to status/vote changes: