On 28 April 2010 05:51, Stuart Bishop <email address hidden>
wrote:
> Review: Needs Information I can't recall why I let this table get
> created without indexes. Possibly because I didn't think it should get
> as big as it has (2.5 million jobs, with nearly 1 million of them
> completed over a month ago that nobody has bothered to garbage
> collect).
There are 2 million rows in BugJob? That's suboptimal; I'll file a bug
to add a task to garbo-daily to expunge old jobs (we don't need them
once they've run, so we can probably remove all but the last 24 hours'
worth (if we even need those)).
> Should BugJob.job be UNIQUE?
Yes.
> Should BugJob.bug be UNIQUE?
No.
> Should (BugJob.bug, BugJob.job) be UNIQUE?
Yes.
> To know if the indexes are going to be used, I'll need some more
> information about the queries being made. In particular, if we search
> for 'jobs for bug NNN with type ZZZ' we should create an index on
> (bug, job_type) rather than separate indexes on bug and job_type. If
> we are querying for 'All bug jobs of type ZZZ' then we should separate
> the indexes.
I think (bug, job_type) is probably a better index to have. The query
that's been causing us some problems is this one:
SELECT BugJob.*
FROM BugJob, Job
WHERE BugJob.bug = 1
AND BugJob.job_type = 1
AND BugJob.job = Job.id
AND Job.id IN (
SELECT Job.id
FROM Job WHERE Job.status = 1
AND ( Job.lease_expires IS NULL OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
AND ( Job.scheduled_start IS NULL OR Job.scheduled_start <= ( CURRENT_TIMESTAMP AT TIME ZONE 'UTC')))
LIMIT 1
Which gets run when a new CalculateBugHeatJob gets created because we
only allow one ready CalculateBugHeatJob to exist per bug at once (I'd
love to be able to make (BugJob.job, BugJob.job_type) UNIQUE, but we
can't since there may be several jobs of a given type generated for a
Bug before they get gc'd).
> Allocated patch number is patch-2207-53-0.sql
>
> Indexes should be named with a __idx suffix, as per the following. If
> we are creating UNIQUE constraints instead, we will create them with a
> __key suffix.
>
> SET client_min_messages=ERROR;
>
> -- Indices for BugJob
> CREATE INDEX bugjob__job__idx ON BugJob(job);
> CREATE INDEX bugjob__bug__idx ON BugJob(bug); CREATE INDEX
> bugjob__job_type__idx ON BugJob(job_type);
>
> -- Indices for Job
> CREATE INDEX job__scheduled_start__idx ON
> Job(scheduled_start); CREATE INDEX job__lease_expires__idx ON
> Job(lease_expires);
>
> INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 53, 0);
>
Okay, thanks. I'll update the patch appropriately.
On 28 April 2010 05:51, Stuart Bishop <email address hidden>
wrote:
> Review: Needs Information I can't recall why I let this table get
> created without indexes. Possibly because I didn't think it should get
> as big as it has (2.5 million jobs, with nearly 1 million of them
> completed over a month ago that nobody has bothered to garbage
> collect).
There are 2 million rows in BugJob? That's suboptimal; I'll file a bug
to add a task to garbo-daily to expunge old jobs (we don't need them
once they've run, so we can probably remove all but the last 24 hours'
worth (if we even need those)).
> Should BugJob.job be UNIQUE?
Yes.
> Should BugJob.bug be UNIQUE?
No.
> Should (BugJob.bug, BugJob.job) be UNIQUE?
Yes.
> To know if the indexes are going to be used, I'll need some more
> information about the queries being made. In particular, if we search
> for 'jobs for bug NNN with type ZZZ' we should create an index on
> (bug, job_type) rather than separate indexes on bug and job_type. If
> we are querying for 'All bug jobs of type ZZZ' then we should separate
> the indexes.
I think (bug, job_type) is probably a better index to have. The query
that's been causing us some problems is this one:
SELECT BugJob.*
Job. lease_expires IS NULL
OR Job.lease_expires < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'))
Job. scheduled_ start IS NULL
OR Job.scheduled_start <= (
CURRENT_ TIMESTAMP AT TIME ZONE 'UTC')))
FROM BugJob, Job
WHERE BugJob.bug = 1
AND BugJob.job_type = 1
AND BugJob.job = Job.id
AND Job.id IN (
SELECT Job.id
FROM Job WHERE Job.status = 1
AND (
AND (
LIMIT 1
Which gets run when a new CalculateBugHeatJob gets created because we
only allow one ready CalculateBugHeatJob to exist per bug at once (I'd
love to be able to make (BugJob.job, BugJob.job_type) UNIQUE, but we
can't since there may be several jobs of a given type generated for a
Bug before they get gc'd).
> Allocated patch number is patch-2207-53-0.sql min_messages= ERROR; _job_type_ _idx ON BugJob(job_type); start__ idx ON start); CREATE INDEX job__lease_ expires_ _idx ON seRevision VALUES (2207, 53, 0);
>
> Indexes should be named with a __idx suffix, as per the following. If
> we are creating UNIQUE constraints instead, we will create them with a
> __key suffix.
>
> SET client_
>
> -- Indices for BugJob
> CREATE INDEX bugjob__job__idx ON BugJob(job);
> CREATE INDEX bugjob__bug__idx ON BugJob(bug); CREATE INDEX
> bugjob_
>
> -- Indices for Job
> CREATE INDEX job__scheduled_
> Job(scheduled_
> Job(lease_expires);
>
> INSERT INTO LaunchpadDataba
>
Okay, thanks. I'll update the patch appropriately.