Launchpad itself

Code review comment for lp:~gmb/launchpad/bugjob-indices-bug-539382

bugjob-indices-bug-539382
Merge into db-devel

Revision history for this message

Graham Binns (gmb) wrote on 2010-04-28:

On 28 April 2010 05:51, Stuart Bishop <email address hidden>
wrote:
> Review: Needs Information I can't recall why I let this table get
> created without indexes. Possibly because I didn't think it should get
> as big as it has (2.5 million jobs, with nearly 1 million of them
> completed over a month ago that nobody has bothered to garbage
> collect).

There are 2 million rows in BugJob? That's suboptimal; I'll file a bug
to add a task to garbo-daily to expunge old jobs (we don't need them
once they've run, so we can probably remove all but the last 24 hours'
worth (if we even need those)).

> Should BugJob.job be UNIQUE?

Yes.

> Should BugJob.bug be UNIQUE?

No.

> Should (BugJob.bug, BugJob.job) be UNIQUE?

Yes.

> To know if the indexes are going to be used, I'll need some more
> information about the queries being made. In particular, if we search
> for 'jobs for bug NNN with type ZZZ' we should create an index on
> (bug, job_type) rather than separate indexes on bug and job_type. If
> we are querying for 'All bug jobs of type ZZZ' then we should separate
> the indexes.

I think (bug, job_type) is probably a better index to have. The query
that's been causing us some problems is this one:

    SELECT BugJob.*
      FROM BugJob, Job
     WHERE BugJob.bug = 1
       AND BugJob.job_type = 1
       AND BugJob.job = Job.id
       AND Job.id IN (
           SELECT Job.id
             FROM Job WHERE Job.status = 1
              AND (
                  Job.lease_expires IS NULL
                  OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
              AND (
                  Job.scheduled_start IS NULL
                  OR Job.scheduled_start <= (
                      CURRENT_TIMESTAMP AT TIME ZONE 'UTC')))
     LIMIT 1

Which gets run when a new CalculateBugHeatJob gets created because we
only allow one ready CalculateBugHeatJob to exist per bug at once (I'd
love to be able to make (BugJob.job, BugJob.job_type) UNIQUE, but we
can't since there may be several jobs of a given type generated for a
Bug before they get gc'd).

> Allocated patch number is patch-2207-53-0.sql
>
> Indexes should be named with a __idx suffix, as per the following. If
> we are creating UNIQUE constraints instead, we will create them with a
> __key suffix.
>
> SET client_min_messages=ERROR;
>
> -- Indices for BugJob
> CREATE INDEX bugjob__job__idx ON BugJob(job);
> CREATE INDEX bugjob__bug__idx ON BugJob(bug); CREATE INDEX
> bugjob__job_type__idx ON BugJob(job_type);
>
> -- Indices for Job
> CREATE INDEX job__scheduled_start__idx ON
> Job(scheduled_start); CREATE INDEX job__lease_expires__idx ON
> Job(lease_expires);
>
> INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);
>

Okay, thanks. I'll update the patch appropriately.

On 28 April 2010 05:51, Stuart Bishop <stuart.bishop@canonical.com>
wrote:
> Review: Needs Information I can't recall why I let this table get
> created without indexes. Possibly because I didn't think it should get
> as big as it has (2.5 million jobs, with nearly 1 million of them
> completed over a month ago that nobody has bothered to garbage
> collect).

> Should BugJob.job be UNIQUE?

Yes.

> Should BugJob.bug be UNIQUE?

No.

> Should (BugJob.bug, BugJob.job) be UNIQUE?

Yes.

I think (bug, job_type) is probably a better index to have. The query
that's been causing us some problems is this one:

Okay, thanks. I'll update the patch appropriately.

« Back to merge proposal