Code review comment for ~ilasc/launchpad:add-get-build-by-store-revision-to-snap

Revision history for this message
Colin Watson (cjwatson) wrote :

My apologies for blithely saying this was just a simple query; I'd forgotten about the way in which store revisions are currently stored. This may be a bit more work than I initially thought. On the other hand: useful database engineering exercise, I guess! As penance, I've walked through my thought process in detail here.

You'll definitely need to find an alternative approach, as walking through all the builds for the snap recipe in Python will be much too slow for snap recipes with more than a few builds.

First, consider doing this in a single PostgreSQL query. This will be a bit fiddly because conceptually what you want is to join with `SnapBuildJob` but only consider those cases from the join where the `SnapBuildJob` is the one with the highest job ID for each snap recipe, and then look at the union of `SnapBuildJob.metadata` and `SnapBuild.store_upload_metadata` and extract the `store_revision` field from that: this emulates the behaviour of the `SnapBuild.store_upload_revision` property (drilling down into `SnapBuild.last_store_upload_job`, `SnapStoreUploadJob.store_revision`, and `SnapStoreUploadJob.store_metadata`). The tools you'd need for this would be PostgreSQL window functions (https://www.postgresql.org/docs/10/tutorial-window.html) and JSON functions (https://www.postgresql.org/docs/10/functions-json.html); a relevant example of using window functions is in `lp.scripts.garbo.SnapBuildJobPruner`.

Looking at this, though, I'm a bit concerned that it may not be possible to implement it efficiently with the current database schema, because PostgreSQL is going to have to walk through all the builds and jobs internally and extract all their JSON metadata fields, then cast them to `jsonb` (since these columns predate us being able to use PostgreSQL's `jsonb` type and are thus just `text`), concatenate `SnapBuildJob.metadata` and `SnapBuild.store_upload_metadata`, and extract the appropriate field from the result. I'm not convinced that it will be possible to construct an index that allows PostgreSQL to do that efficiently for snap recipes with large numbers of builds. It's probably worth having a go at constructing the appropriate query and throwing it at `EXPLAIN ANALYZE` on a database with a respectable number of builds for the relevant snap recipe to see if it can be indexed usefully, if nothing else because window functions are sometimes handy and it's a good opportunity to learn about them. However, there seems a good chance that this won't work out, so I'd advise timeboxing that exercise: no more than half a day or so.

So, the fallback option: let's consider denormalizing `SnapBuild.store_upload_revision` (that is, adding an explicit database column for it on `SnapBuild` that's maintained by either PostgreSQL trigger functions or more likely by Python code called from anywhere that might alter the result of the current `SnapBuild.store_upload_revision` property). We could then add an index on `SnapBuild (snap, store_upload_revision)` that would efficiently serve this query. Note that we'll need to backfill this column using a garbo job before it can be used. This approach seems much more likely to work, since it keeps the actual query run by `Snap.getBuildByStoreRevision` simple.

review: Needs Fixing

« Back to merge proposal