Merge lp:~gmb/launchpad/bugjob-indices-bug-539382 into lp:launchpad/db-devel

Proposed by Graham Binns
Status: Merged
Approved by: Graham Binns
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~gmb/launchpad/bugjob-indices-bug-539382
Merge into: lp:launchpad/db-devel
Diff against target: 15 lines (+11/-0)
1 file modified
database/schema/patch-2207-53-0.sql (+11/-0)
To merge this branch: bzr merge lp:~gmb/launchpad/bugjob-indices-bug-539382
Reviewer Review Type Date Requested Status
Björn Tillenius (community) db Approve
Stuart Bishop (community) db Approve
Review via email: mp+24239@code.launchpad.net

Commit message

Indices have been added to BugJob and Job to speed up the queries that search for ready CalculateBugHeatJobs.

Description of the change

This branch adds indices to BugJob and Job on the fields that are used when looking for existing CalculateBugHeatJobs in CalculateBugHeatJob.replace(). The aim is to prevent the timeouts in bug 539382 (and other places)

It may be that we ultimately need to not do the check for existing jobs in the create() method, and instead ensure that only one job is run at a time, but either way these indices should improve performance of several BugJob and Job related queries.

To post a comment you must log in.
Revision history for this message
Stuart Bishop (stub) wrote :

I can't recall why I let this table get created without indexes. Possibly because I didn't think it should get as big as it has (2.5 million jobs, with nearly 1 million of them completed over a month ago that nobody has bothered to garbage collect).

Should BugJob.job be UNIQUE?

Should BugJob.bug be UNIQUE?

Should (BugJob.bug, BugJob.job) be UNIQUE?

To know if the indexes are going to be used, I'll need some more information about the queries being made. In particular, if we search for 'jobs for bug NNN with type ZZZ' we should create an index on (bug, job_type) rather than separate indexes on bug and job_type. If we are querying for 'All bug jobs of type ZZZ' then we should separate the indexes.

Allocated patch number is patch-2207-53-0.sql

Indexes should be named with a __idx suffix, as per the following. If we are creating UNIQUE constraints instead, we will create them with a __key suffix.

SET client_min_messages=ERROR;

-- Indices for BugJob
CREATE INDEX bugjob__job__idx ON BugJob(job);
CREATE INDEX bugjob__bug__idx ON BugJob(bug);
CREATE INDEX bugjob__job_type__idx ON BugJob(job_type);

-- Indices for Job
CREATE INDEX job__scheduled_start__idx ON Job(scheduled_start);
CREATE INDEX job__lease_expires__idx ON Job(lease_expires);

INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);

review: Needs Information
Revision history for this message
Graham Binns (gmb) wrote :

On 28 April 2010 05:51, Stuart Bishop <email address hidden>
wrote:
> Review: Needs Information I can't recall why I let this table get
> created without indexes. Possibly because I didn't think it should get
> as big as it has (2.5 million jobs, with nearly 1 million of them
> completed over a month ago that nobody has bothered to garbage
> collect).

There are 2 million rows in BugJob? That's suboptimal; I'll file a bug
to add a task to garbo-daily to expunge old jobs (we don't need them
once they've run, so we can probably remove all but the last 24 hours'
worth (if we even need those)).

> Should BugJob.job be UNIQUE?

Yes.

> Should BugJob.bug be UNIQUE?

No.

> Should (BugJob.bug, BugJob.job) be UNIQUE?

Yes.

> To know if the indexes are going to be used, I'll need some more
> information about the queries being made. In particular, if we search
> for 'jobs for bug NNN with type ZZZ' we should create an index on
> (bug, job_type) rather than separate indexes on bug and job_type. If
> we are querying for 'All bug jobs of type ZZZ' then we should separate
> the indexes.

I think (bug, job_type) is probably a better index to have. The query
that's been causing us some problems is this one:

    SELECT BugJob.*
      FROM BugJob, Job
     WHERE BugJob.bug = 1
       AND BugJob.job_type = 1
       AND BugJob.job = Job.id
       AND Job.id IN (
           SELECT Job.id
             FROM Job WHERE Job.status = 1
              AND (
                  Job.lease_expires IS NULL
                  OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
              AND (
                  Job.scheduled_start IS NULL
                  OR Job.scheduled_start <= (
                      CURRENT_TIMESTAMP AT TIME ZONE 'UTC')))
     LIMIT 1

Which gets run when a new CalculateBugHeatJob gets created because we
only allow one ready CalculateBugHeatJob to exist per bug at once (I'd
love to be able to make (BugJob.job, BugJob.job_type) UNIQUE, but we
can't since there may be several jobs of a given type generated for a
Bug before they get gc'd).

> Allocated patch number is patch-2207-53-0.sql
>
> Indexes should be named with a __idx suffix, as per the following. If
> we are creating UNIQUE constraints instead, we will create them with a
> __key suffix.
>
> SET client_min_messages=ERROR;
>
> -- Indices for BugJob
> CREATE INDEX bugjob__job__idx ON BugJob(job);
> CREATE INDEX bugjob__bug__idx ON BugJob(bug); CREATE INDEX
> bugjob__job_type__idx ON BugJob(job_type);
>
> -- Indices for Job
> CREATE INDEX job__scheduled_start__idx ON
> Job(scheduled_start); CREATE INDEX job__lease_expires__idx ON
> Job(lease_expires);
>
> INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);
>

Okay, thanks. I'll update the patch appropriately.

Revision history for this message
Stuart Bishop (stub) wrote :

The following version is approved. Although there is no real functional difference, a little more metadata ends up stored in the database using the ALTER TABLE syntax for defining uniqueness.

SET client_min_messages=ERROR;

-- Indices for BugJob
ALTER TABLE BugJob ADD CONSTRAINT bugjob__job__key UNIQUE (job);
CREATE INDEX bugjob__bug__job_type__idx ON BugJob(bug, job_type);

-- Indices for Job
CREATE INDEX job__scheduled_start__idx ON Job(scheduled_start);
CREATE INDEX job__lease_expires__idx ON Job(lease_expires);

INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);

Your example query is better written as following BTW:

SELECT BugJob.*
FROM BugJob, Job
WHERE
    BugJob.bug = 1
    AND BugJob.job_type = 1
    AND BugJob.job = Job.id
    AND Job.status = 1
    AND (
        Job.lease_expires IS NULL
        OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
    AND (
        Job.scheduled_start IS NULL
        OR Job.scheduled_start <= (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
LIMIT 1;

review: Approve (db)
Revision history for this message
Graham Binns (gmb) wrote :

On 28 April 2010 10:53, Stuart Bishop <email address hidden> wrote:
> Review: Approve db
> The following version is approved. Although there is no real functional difference, a little more metadata ends up stored in the database using the ALTER TABLE syntax for defining uniqueness.
>
> SET client_min_messages=ERROR;
>
> -- Indices for BugJob
> ALTER TABLE BugJob ADD CONSTRAINT bugjob__job__key UNIQUE (job);
> CREATE INDEX bugjob__bug__job_type__idx ON BugJob(bug, job_type);
>
> -- Indices for Job
> CREATE INDEX job__scheduled_start__idx ON Job(scheduled_start);
> CREATE INDEX job__lease_expires__idx ON Job(lease_expires);
>
> INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);

Thanks. I'll update the patch accordingly.

>
> Your example query is better written as following BTW:
>
> SELECT BugJob.*
> FROM BugJob, Job
> WHERE
>    BugJob.bug = 1
>    AND BugJob.job_type = 1
>    AND BugJob.job = Job.id
>    AND Job.status = 1
>    AND (
>        Job.lease_expires IS NULL
>        OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
>    AND (
>        Job.scheduled_start IS NULL
>        OR Job.scheduled_start <= (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
> LIMIT 1;
>

Noted. Although what I pasted was the Storm-generated query that came
from an OOPS report.

Revision history for this message
Björn Tillenius (bjornt) :
review: Approve (db)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'database/schema/patch-2207-53-0.sql'
2--- database/schema/patch-2207-53-0.sql 1970-01-01 00:00:00 +0000
3+++ database/schema/patch-2207-53-0.sql 2010-04-28 10:24:43 +0000
4@@ -0,0 +1,11 @@
5+SET client_min_messages=ERROR;
6+
7+-- Indices for BugJob
8+ALTER TABLE BugJob ADD CONSTRAINT bugjob__job__key UNIQUE (job);
9+CREATE INDEX bugjob__bug__job_type__idx ON BugJob(bug, job_type);
10+
11+-- Indices for Job
12+CREATE INDEX job__scheduled_start__idx ON Job(scheduled_start);
13+CREATE INDEX job__lease_expires__idx ON Job(lease_expires);
14+
15+INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);

Subscribers

People subscribed via source and target branches

to status/vote changes: