DistroSeries:+queue Timeout accepting packages (bug structural subscriptions)

Bug #745799 reported by Scott Kitterman
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Colin Watson

Bug Description

https://launchpad.net/ubuntu/natty/+queue
Referrer: https://launchpad.net/ubuntu/natty/+queue?queue_state=1

OOPS-1932AU494

SQL time: 7701 ms
Non-sql time: 3425 ms
Total time: 11126 ms
Statement Count: 1209

1200 statements longest 225ms
96 bugsubscription lookups (130ms)
110 batched person-with-account lookups (420ms)
56 archive lookups (80ms)
56 distroarchseries lookups(60ms)
43 individual person-with-account lookups (100ms)
43 packageuploadcustom lookups (70ms)
42 packageuploadbuild lookups (60ms)
37 LFA lookups (50ms)
37 DistributionSourcePackage lookups (50ms)
30 assignee lookups (122ms)
30 bugsubscription lookups (70ms)
27 team membership lookups (120ms)
24 structuralsubscription lookups (2300ms)
3 SourcePackageFilePublishing (700ms)

So while its terrible, the proximate cause is structural bug subscriptions combined with potato programming - querying each bug individually or something like that.

Tags: qa-ok timeout

Related branches

Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: timeout
description: updated
description: updated
summary: - Timeout accepting packages
+ DistroSeries:+queue Timeout accepting packages
Revision history for this message
Scott Kitterman (kitterman) wrote : Re: DistroSeries:+queue Timeout accepting packages

More oops:

(Error ID: OOPS-1916T11)
(Error ID: OOPS-1916F20)

These were accepting two packages (second was a retry of the first).

Doing them one at a time worked for one of them.

(Error ID: OOPS-1916R19)
(Error ID: OOPS-1916V18)
(Error ID: OOPS-1916Q18)

Gave up after three tries on the second.

Revision history for this message
Robert Collins (lifeless) wrote :

568. 7956 21ms SQL-launchpad-main-master
SELECT 1
FROM
  (SELECT PackageDiff.date_fulfilled, PackageDiff.date_requested, PackageDiff.diff_content, PackageDiff.from_source, PackageDiff.id, PackageDiff.requester, PackageDiff.status, PackageDiff.to_source
   FROM PackageDiff
   WHERE PackageDiff.to_source = 952513LIMIT 1) AS "_tmp" LIMIT 1
569. 7979 2797ms SQL-launchpad-main-master
SELECT PackageDiff.date_fulfilled,
       PackageDiff.date_requested,
       PackageDiff.diff_content,
       PackageDiff.from_source,
       PackageDiff.id,
       PackageDiff.requester,
       PackageDiff.status,
       PackageDiff.to_source
FROM PackageDiff
WHERE PackageDiff.to_source = 952513
ORDER BY PackageDiff.date_requested DESC

Revision history for this message
Robert Collins (lifeless) wrote :

SELECT 1
FROM
  (SELECT PackageDiff.date_fulfilled, PackageDiff.date_requested, PackageDiff.diff_content, PackageDiff.from_source, PackageDiff.id, PackageDiff.requester, PackageDiff.status, PackageDiff.to_source
   FROM PackageDiff
   WHERE PackageDiff.to_source = 952513LIMIT 1) AS "_tmp" LIMIT 1
explain analyze SELECT PackageDiff.date_fulfilled,
       PackageDiff.date_requested,
       PackageDiff.diff_content,
       PackageDiff.from_source,
       PackageDiff.id,
       PackageDiff.requester,
       PackageDiff.status,
       PackageDiff.to_source
FROM PackageDiff
WHERE PackageDiff.to_source = 952513
ORDER BY PackageDiff.date_requested DESC
SELECT PackageDiff.date_fulfilled,
       PackageDiff.date_requested,
       PackageDiff.diff_content,
       PackageDiff.from_source,
       PackageDiff.id,
       PackageDiff.requester,
       PackageDiff.status,
       PackageDiff.to_source
FROM PackageDiff
WHERE PackageDiff.to_source = 952513
ORDER BY PackageDiff.date_requested DESC

Revision history for this message
Scott Kitterman (kitterman) wrote :

Here's some more oops:

(Error ID: OOPS-1931CG54)
(Error ID: OOPS-1931I568)

Those were for trying to accept 4 packages. Accepting them individually worked.

Revision history for this message
Scott Kitterman (kitterman) wrote :

Seems to have gotten worse again.

(Error ID: OOPS-1932BC487)

That's with only two packages and last night I was unable to accept chromium-browser by itself on three tries (sorry, didn't save the OOPS numbers as I was in a hurry).

Revision history for this message
Scott Kitterman (kitterman) wrote :

Looks like the problem child is audacity.

Error ID: OOPS-1932CM427
Error ID: OOPS-1932CO462
Error ID: OOPS-1932AU494

Revision history for this message
Robert Collins (lifeless) wrote :

1200 statements longest 225ms
96 bugsubscription lookups (130ms)
110 batched person-with-account lookups (420ms)
56 archive lookups (80ms)
56 distroarchseries lookups(60ms)
43 individual person-with-account lookups (100ms)
43 packageuploadcustom lookups (70ms)
42 packageuploadbuild lookups (60ms)
37 LFA lookups (50ms)
37 DistributionSourcePackage lookups (50ms)
30 assignee lookups (122ms)
30 bugsubscription lookups (70ms)
27 team membership lookups (120ms)
24 structuralsubscription lookups (2300ms)
3 SourcePackageFilePublishing (700ms)

So while its terrible, the proximate cause is structural bug subscriptions combined with potato programming.

summary: - DistroSeries:+queue Timeout accepting packages
+ DistroSeries:+queue Timeout accepting packages (bug structural
+ subscriptions)
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

(by terrible I mean the page does >1000 queries, so its very noisy etc)

Revision history for this message
William Grant (wgrant) wrote :

This affects copies too, eg. OOPS-1956CH112

Revision history for this message
Scott Kitterman (kitterman) wrote :

It's back. (Error ID: OOPS-2004CQ13)

Revision history for this message
Scott Kitterman (kitterman) wrote :

Not sure which bug I'm supposed to be adding these too, but got a timeout again today accepting 4 binaries (worked on the second try):

(Error ID: OOPS-2062CC88)

Revision history for this message
Scott Kitterman (kitterman) wrote :

(Error ID: OOPS-2084E90)

Failed with ten packages at once. Two sets of five worked.

Revision history for this message
Paul Sladen (sladen) wrote :

I've added the:

  https://launchpad.net/ubuntu/oneiric/+queue?queue_state=3&queue_text=&direction=backwards&start=65340

URL to the top of this bug (from my dup), which consistently shows the issue.

description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 745799] Re: DistroSeries:+queue Timeout accepting packages (bug structural subscriptions)

Paul, thats a totally different issue. Please don't mangle bugs like that.

description: updated
Revision history for this message
Paul Sladen (sladen) wrote :

Roberts: Thanks for unduping it (I was not the one that duped it).

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Sep 16, 2011 at 1:05 PM, Paul Sladen <email address hidden> wrote:
> Roberts: Thanks for unduping it (I was not the one that duped it).

No problem - sorry for being harsh in my initial reply, that was
before I had noticed the source of the confusion.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Friday 16 September 2011 02:05:42 you wrote:
> Roberts: Thanks for unduping it (I was not the one that duped it).

It was me.

One of the reasons I did it is because that page needs throwing away and re-
doing. It's an abomination of the highest order and is full of hacks and
workarounds to mitigate the original code that was even more worse at timing
out. Any fixes made to the existing page is just turd polishing.

Revision history for this message
Kate Stewart (kate.stewart) wrote :

Encountering this again as we're processing unapproved queue for precise.

Revision history for this message
Colin Watson (cjwatson) wrote :

Also affected the PackageUpload.acceptFromQueue API today: OOPS-010486e682dfae5a2594aa81edcedc98

Revision history for this message
Colin Watson (cjwatson) wrote :

As previously mentioned, the main area this all goes horribly wrong is in closing bugs for the upload being accepted. OK, in the latest OOPS I posted there are 172 SQL statements and 1.4 seconds before that, which isn't great, but it would be tolerable on its own.

I spent a bit of time trying to make things generally faster, but that requires a much smarter Launchpad developer than I'm capable of playing on TV. The next obvious approach appears to be to add a new ProcessAcceptedBugsJob, which can run as part of the FREQUENT group (contingent on my fix for bug 839659) and which would basically do all the bug.setStatus and bug.newMessage stuff from close_bugs_for_sourcepackagerelease. This would therefore amount to moving the guts of that latter function into a new job, adding job wrapping around it, and changing a few other functions to get bug IDs rather than IBugs (since bug IDs are a bit easier to pass around in job metadata, and the code that got bug objects was unnecessary potato programming anyway). This actually seems quite straightforward. It would require a DB patch as follows to start with:

CREATE TABLE ProcessAcceptedBugsJob (
    id serial PRIMARY KEY,
    job integer REFERENCES Job ON DELETE CASCADE UNIQUE NOT NULL,
    distroseries integer REFERENCES DistroSeries,
    sourcepackagerelease integer REFERENCES SourcePackageRelease NOT NULL,
    json_data text
);

Does that look reasonable in principle?

Colin Watson (cjwatson)
Changed in launchpad:
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Revision history for this message
Colin Watson (cjwatson) wrote :

2012-08-22 07:08:28 INFO 2209-27-2 applied just now in 0.1 seconds

tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
Colin Watson (cjwatson)
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Colin Watson (cjwatson) wrote :

The cron script and feature flag are now enabled on production, and all looks well so far. We'll monitor this through the current 12.10 beta-2 freeze, and if all is well then I'll close this bug by removing the feature flag.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
Changed in launchpad:
status: In Progress → Fix Committed
Colin Watson (cjwatson)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.