Code review comment for lp:~lifeless/storm/with

Revision history for this message
Robert Collins (lifeless) wrote :

@jamesh
>2. How do we create a WithTable? It would need the following information:
> * an alias for this temporary table.
> * a list of the column names. Do we also want the column types?
> * a query object that generates the temporary table.

there are two ways we've used WITH so far:
 * to create a constraint - e.g. we calculate all the source package names published in a distroseries in ~300ms, and then we can use that as a constraint - with joining or doing a subquery on it. (this is the bugtask use of with). The performance win here is that we know a-priori that determining package names ad-hoc is slow - *most* packagenames have been published in ubuntu always, so cheaper to do a bulk lookup.

 * to get an actual table - see the (complex) use in branchmergeproposal: we have two aliases to BranchRevision, and are doing a DAG difference between the two aliases to find new revisions in the source; the 'source' with clause *is* the origin for the BranchRevision objects we want to return. the key points here, for me is that we want to populate BranchRevision objects in the cache, not some temporary class.

I don't know if this impacts what is needed to create a WithTable.

I haven't had cause to define column names for the temporary tables yet - one will want to minimise the data in the temp table anyway - a fat temp table isn't going to be fast.

> As far as the actual syntax goes, I'd think it common for users to use different variations on > the query on each use (Robert can probably verify whether this is correct), so your second
> suggestion of tying the query to a class defined using standard class syntax seems a bit
> verbose. Something where you could link the query to an existing template class would seem
> more natural.

If I understand this correctly, then yes - see the branchmergeproposal case above.

> One other question Robert might have some input on: is it actually beneficial to use a full
> table class to represent the "with" table? They may not have a well defined primary key, and
> the table will be gone once the query completes, so it isn't clear you'd want to clutter the
> object cache with rows from the table. And are there many cases where you'd want to return
> objects from a "with" table from a Store.find() call, or are they just for use in the WHERE
> clause?

We've only 4 data points so far, but I think we'll want the objects from the with table maybe 10-25% of the time: a significant but not overwhelming amount. Most of the time it will be just a constraint or graph traversal that is best done in a very narrow fashion.

@gustavo
> Please note that I wasn't suggesting using aliases. We might simply not get into the
> business of defining the query in detail, in a way similar to how we don't detail
> how the CREATE TABLE is run with its column names.

It would be nice to use query objects to define a query in a with - I really like the sound of that.

I don't think we'd want the example you have of a full Person table defined on top of a WITH clause: unless it loaded into the storm cache in the same place our regular Person table does (which IIRC would raise errors?).

@all
I'm still at a loss on how to code this up - it sounds nice but I need pointers: I got as far as I did with this on blind luck following the path of least resistance. I'm not asking for the code to be written for me, but I need to know where existing similar tests are that I can copy; how the compiler works so I can tell how to change it - or at least some hints pointing me in the appropriate directions...

« Back to merge proposal