timeout on code.launchpad.net/~ubuntu-branches

Bug #1031764 reported by John A Meinel
24
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
William Grant

Bug Description

OOPS-5f256222ff8c758238414f93587ad1eb

This is exceeding the 5s default cap on page timeouts. It is spending >5s in SQL, with these primary culprits:

1978.0 1 SQL-main-slave

SELECT BranchMergeProposal.commit_message,
       BranchMergeProposal.date_created,
       BranchMergeProposal.date_merged,
       BranchMergeProposal.date_queued,
       BranchMergeProposal.date_review_requested,
       BranchMergeProposal.date_reviewed,
       BranchMergeProposal.description,
       BranchMergeProposal.id,
       BranchMergeProposal.merge_reporter,
       BranchMergeProposal.merged_revno,
       BranchMergeProposal.dependent_branch,
       BranchMergeProposal.merge_diff,
       BranchMergeProposal.queue_position,
       BranchMergeProposal.queue_status,
       BranchMergeProposal.queued_revision_id,
       BranchMergeProposal.queuer,
       BranchMergeProposal.registrant,
       BranchMergeProposal.reviewed_revision_id,
       BranchMergeProposal.reviewer,
       BranchMergeProposal.root_message_id,
       BranchMergeProposal.source_branch,
       BranchMergeProposal.superseded_by,
       BranchMergeProposal.target_branch,
       BranchMergeProposal.whiteboard
FROM Branch
JOIN BranchMergeProposal ON Branch.id = BranchMergeProposal.source_branch
AND Branch.OWNER = $INT
JOIN Branch AS target ON target.id = BranchMergeProposal.target_branch
WHERE (Branch.information_type IN ($INT, $INT)
  OR COALESCE((Branch.access_grants)&&
    (SELECT ARRAY_AGG(TeamParticipation.team)
     FROM TeamParticipation
     WHERE TeamParticipation.person = $INT), FALSE)
  OR Branch.access_policy IN
    (SELECT AccessPolicyGrant.policy
     FROM AccessPolicyGrant
     JOIN TeamParticipation ON TeamParticipation.team = AccessPolicyGrant.grantee
     WHERE TeamParticipation.person = $INT))
  AND (target.information_type IN ($INT, $INT)
  OR COALESCE((target.access_grants)&&
    (SELECT ARRAY_AGG(TeamParticipation.team)
     FROM TeamParticipation
     WHERE TeamParticipation.person = $INT), FALSE)
  OR target.access_policy IN
    (SELECT AccessPolicyGrant.policy
     FROM AccessPolicyGrant
     JOIN TeamParticipation ON TeamParticipation.team = AccessPolicyGrant.grantee
     WHERE TeamParticipation.person = $INT))
  AND BranchMergeProposal.source_branch IN ($INT ... $INT)
ORDER BY BranchMergeProposal.date_created DESC, BranchMergeProposal.id

^- This appears to be trying to see if there are any merge proposals for the branches being shown.

There are only 100 branch_ids in the "BranchMergeProposal.source_branch in ..." section. Which doesn't seem terrible. Though the question still remains why the individual query is that slow.

1217.0 2 SQL-main-slave

SELECT Branch.access_policy,
       Branch.branch_format,
       Branch.branch_type,
       Branch.metadir_format,
       Branch.date_created,
       Branch.date_last_modified,
       Branch.summary,
       Branch.distroseries,
       Branch.id,
       Branch.information_type,
       Branch.last_mirror_attempt,
       Branch.last_mirrored,
       Branch.last_mirrored_id,
       Branch.last_scanned,
       Branch.last_scanned_id,
       Branch.lifecycle_status,
       Branch.merge_queue_config,
       Branch.merge_queue,
       Branch.mirror_failures,
       Branch.mirror_status_message,
       Branch.name,
       Branch.next_mirror_time,
       Branch.OWNER, Branch.owner_name,
                     Branch.product,
                     Branch.registrant,
                     Branch.repository_format,
                     Branch.reviewer,
                     Branch.revision_count,
                     Branch.sourcepackagename,
                     Branch.stacked_on,
                     Branch.target_suffix,
                     Branch.unique_name,
                     Branch.url,
                     Branch.whiteboard
FROM Branch
WHERE Branch.OWNER = $INT
  AND Branch.lifecycle_status IN ($INT ... $INT)
  AND (Branch.information_type IN ($INT, $INT)
  OR COALESCE((Branch.access_grants)&&
    (SELECT ARRAY_AGG(TeamParticipation.team)
     FROM TeamParticipation
     WHERE TeamParticipation.person = $INT), FALSE)
  OR Branch.access_policy IN
    (SELECT AccessPolicyGrant.policy
     FROM AccessPolicyGrant
     JOIN TeamParticipation ON TeamParticipation.team = AccessPolicyGrant.grantee
     WHERE TeamParticipation.person = $INT))
ORDER BY Branch.date_last_modified DESC, Branch.target_suffix ASC, Branch.lifecycle_status DESC, Branch.owner_name ASC, Branch.name ASC LIMIT $INT
OFFSET $INT

^- One key issue of this request, is that it is being requested 2 times. (The average is 1217s, but both times are close to that.)

Looking at the actual queries, the content is, indeed, identical, so we should be caching the results of the first query, and not issuing a second one.

Note that it does appear to be a data-dependent issue, as https://code.launchpad.net/~jameinel takes 2.81s (with 100/142 possible branches shown, and /850 if you select 'any status').

I wonder if there is an index that could be put for the ORDER BY so that it can return the LIMIT + OFFSET without having to read all 100k+ branches for ~ubuntu-branches.
It could be that this is a regression because of the changes for privacy.

Related branches

John A Meinel (jameinel)
description: updated
description: updated
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (7.0 KiB)

On staging, for the first query, it returns 0 rows, but still takes 1200ms to execute. It would seem we should be filtering in a different order. EXPLAIN ANALYZE gives:
 Sort (cost=95124.34..95124.52 rows=71 width=785) (actual time=1555.036..1555.036 rows=0 loops=1)
   Sort Key: branchmergeproposal.date_created, branchmergeproposal.id
   Sort Method: quicksort Memory: 25kB
   InitPlan 1 (returns $0)
     -> Aggregate (cost=59.47..59.48 rows=1 width=4) (never executed)
           -> Index Scan using teamparticipation_person_idx on teamparticipation (cost=0.00..59.34 rows=51 width=4) (never executed)
                 Index Cond: (person = 10583)
   InitPlan 3 (returns $2)
     -> Aggregate (cost=59.47..59.48 rows=1 width=4) (actual time=0.360..0.361 rows=1 loops=1)
           -> Index Scan using teamparticipation_person_idx on teamparticipation (cost=0.00..59.34 rows=51 width=4) (actual time=0.029..0.283 rows=120 loops=1)
                 Index Cond: (person = 10583)
   -> Hash Join (cost=91145.54..95003.19 rows=71 width=785) (actual time=1555.024..1555.024 rows=0 loops=1)
         Hash Cond: (branchmergeproposal.target_branch = target.id)
         -> Hash Join (cost=44836.77..46503.17 rows=72 width=785) (actual time=0.296..0.296 rows=0 loops=1)
               Hash Cond: (branchmergeproposal.source_branch = branch.id)
               -> Bitmap Heap Scan on branchmergeproposal (cost=178.91..377.49 rows=110 width=785) (actual time=0.293..0.293 rows=0 loops=1)
                     Recheck Cond: (source_branch = ANY ('{642702,594960,642701,642699,642698,642697,626259,627638,612893,608781,642661,605468,591171,642609,642611,642610,642607,642606,642605,642601,642600,642599,567823,642598,642597,642596,642595,642594,642593,642591,642590,642589,642588,642587,642586,642585,642584,615804,642583,618044,618397,642580,642579, 614466,294897,631531,597979,145586,626091,641863,631530,184187,379944,350369,205826,642550,642547,612682,110462,305784,612846,412005,598022,642525,614325,642287,597972,611905,120004,642506,542770,456310,363093,613279,637525,614713,455168,637532,602879,607321,597781,610894,631529,615237,605684,615280,598021,105569,130730,605092,614295,615277,615274,621997,291172,642394,642392,642391,599195,352385}'::integer[]))
                     -> Bitmap Index Scan on branchmergeproposal__source_branch__idx (cost=0.00..178.88 rows=110 width=0) (actual time=0.289..0.289 rows=0 loops=1)
                           Index Cond: (source_branch = ANY ('{642702,594960,642701,642699,642698,642697,626259,627638,612893,608781,642661,605468,591171,642609,642611,642610,642607,642606,642605,642601,642600,642599,567823,642598,642597,642596,642595,642594,642593,642591,642590,642589,642588,642587,642586,642585,642584,615804,642583,618044,618397,642580,642579,614466,294897,631531,597979,145586,626091,641863,631530,184187,379944,350369,205826,642550,642547,612682,110462,305784,612846,412005,598022,642525,614325,642287,597972,611905,120004,642506,542770,456310,363093,613279,637525,614713,455168,637532,602879,607321,597781,610894,631529,615237,605684,615280,598021,105569,130730,605092,614295,615277,615274,621997,291172,642394,642392,642391,599195,352385}'...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

For the second query, it takes ~1200ms on staging as well. The query planner says:
 Limit (cost=50118.35..50118.60 rows=101 width=830) (actual time=1304.710..1304.751 rows=101 loops=1)
   InitPlan 1 (returns $0)
     -> Aggregate (cost=59.47..59.48 rows=1 width=4) (never executed)
           -> Index Scan using teamparticipation_person_idx on teamparticipation (cost=0.00..59.34 rows=51 width=4) (never executed)
                 Index Cond: (person = 10583)
   -> Sort (cost=50058.86..50669.06 rows=244079 width=830) (actual time=1304.707..1304.725 rows=101 loops=1)
         Sort Key: branch.date_last_modified, branch.target_suffix, branch.lifecycle_status, branch.owner_name, branch.name
         Sort Method: top-N heapsort Memory: 77kB
         -> Seq Scan on branch (cost=3245.05..40712.82 rows=244079 width=830) (actual time=0.027..782.749 rows=373887 loops=1)
               Filter: ((owner = 2866082) AND (lifecycle_status = ANY ('{10,30,50}'::integer[])) AND ((information_type = ANY ('{1,2}'::integer[])) OR COALESCE((access_grants && $0), false) OR (hashed SubPlan 2)))
               SubPlan 2
                 -> Hash Join (cost=2167.74..3245.04 rows=6 width=4) (never executed)
                       Hash Cond: (public.teamparticipation.team = accesspolicygrant.grantee)
                       -> Index Scan using teamparticipation_person_idx on teamparticipation (cost=0.00..59.34 rows=51width=4) (never executed)
                             Index Cond: (person = 10583)
                       -> Hash (cost=1229.55..1229.55 rows=75055 width=8) (never executed)
                             -> Seq Scan on accesspolicygrant (cost=0.00..1229.55 rows=75055 width=8) (never executed)
 Total runtime: 1304.974 ms
(18 rows)

That also shows the expensive part being:
         -> Seq Scan on branch (cost=3245.05..40712.82 rows=244079 width=830) (actual time=0.027..782.749 rows=373887 loops=1)

So it is starting with a sequential scan of all 370k branches, it then creates a 100 node bucket, which as it finds nodes 'newer' than what is in the bucket, it puts them in there, caveat the following hash join (which never executes for some reason).

Now, maybe this is actually reasonable for ~ubuntu-branches. By my count, there are 562,000 branches, and according to this query, 373,887 of them are part of ~ubuntu-branches. That is far more than the standard >10% => full-table-scan.

Note that part of the query:
launchpad_staging=> select count(*) from branch where lifecycle_status in (10,30,50) and (information_type in (1,2));
 count
--------
 455713

Since it does need so much information, but it is filtering from 300k rows down to 100. It feels like having an index on those exact fields would help a lot here.

I can't create an actual index to test it, but something like:
CREATE INDEX CONCURRENTLY branch_recent_branches_idx ON Branch(date_last_modified DESC, target_suffix ASC, lifecycle_status DESC, owner_name ASC, name ASC);

Revision history for this message
Daniel Holbach (dholbach) wrote :

<wgrant> dholbach: Bug #1029642 is probably the one you actually care about

Curtis Hovey (sinzui)
tags: removed: oops
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

The hard timeout on this page was increased to 9s to alleviate this issue.

William Grant (wgrant)
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
status: Triaged → In Progress
Revision history for this message
William Grant (wgrant) wrote :

Apart from the core branch queries, the query to find MPs for the current batch of branches is slow for the same reason as bug #1007140, and the same fix works: https://pastebin.canonical.com/74194/ https://pastebin.canonical.com/74196/

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
William Grant (wgrant)
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
William Grant (wgrant) wrote :

It's still not good, but it's now reliably ~4s.

Changed in launchpad:
assignee: William Grant (wgrant) → nobody
status: Fix Committed → In Progress
assignee: nobody → William Grant (wgrant)
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.