Code review comment for lp:~cbehrens/nova/servers-search

Jay Pipes (jaypipes) wrote :

> > I recognize (and agree with your decision) not to do regexp matching via
> > the database. Not only is it not portable, it's not any more efficient to
> > do that at the database level (still requires a scan of all pre-restricted
> > rows anyway...).
>
> Regular expressions are more expensive than LIKE matches (which in their own
> right, are pretty expensive).

Actually, this is incorrect. LIKE '%something%' and column REGEXP 'someregexp' will produce identical query execution plans: neither can use an index, so both scan every pre-restricted row. The complexity of the regexp then determines whether a simple string match such as '%something%' is computationally more or less expensive to execute per row than a compiled regexp match.
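
To illustrate the query plan point, here is a rough sketch using SQLite, chosen only because it ships with Python. The `instances` table, its columns, and the `plan` helper are made up for the example, and SQLite has no built-in REGEXP implementation, so one is registered by hand. Other engines may plan differently, so treat this as an illustration rather than proof:

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical schema, with an index on the column being searched.
conn.execute(
    "CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT, host TEXT)"
)
conn.execute("CREATE INDEX instances_name_idx ON instances (name)")


def plan(where_clause):
    """Return the human-readable plan steps for a filtered SELECT."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM instances WHERE " + where_clause
    ).fetchall()
    # The plan description is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in rows]


like_plan = plan("name LIKE '%web%'")
regexp_plan = plan("name REGEXP 'web[0-9]+'")

print(like_plan)
print(regexp_plan)
```

Both plans should come out as the same table scan, even though an index exists on `name`: a leading wildcard and an arbitrary regexp both rule out an index lookup.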

> Do we really want operators doing complex
> regexs? At that point we should be putting our data into a purpose-built
> search indexing solution like Lucene/Solr/ElasticSearch/Sphinx because that's
> what they're good at.

Lucene/Solr/ElasticSearch/Sphinx are fulltext indexing technologies. What's happening here is looking for a particular pattern in a short string. The solution presented here is flexible enough to query for various IP(v6) and name patterns without having to set up a separate fulltext indexing server for this kind of thing, which I think would be overkill.

I understand your concern about the regexp inefficiency. I'm just saying that matching in Python is not that much less efficient than doing a REGEXP or LIKE '%something%' expression in SQL. The same loop-and-match process occurs either way, only in Python code rather than C code. The problem is that not all DBs support the REGEXP operator...
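
For reference, the loop-and-match process I'm describing looks roughly like the sketch below. This is not the code in the branch: `filter_rows` and the row dictionaries are hypothetical stand-ins for whatever the pre-restricted query returns.

```python
import re


def filter_rows(rows, field, pattern):
    """Return the rows whose `field` value matches `pattern` anywhere."""
    compiled = re.compile(pattern)  # compile once, reuse for every row
    return [
        row for row in rows
        if compiled.search(str(row.get(field) or ""))
    ]


# Hypothetical rows, already restricted by the indexed parts of the query.
rows = [
    {"name": "web01", "ip": "10.0.0.5"},
    {"name": "db01", "ip": "10.0.1.7"},
    {"name": "web02", "ip": "fe80::1"},
]

web_servers = filter_rows(rows, "name", r"^web\d+$")
ipv6_servers = filter_rows(rows, "ip", r":")
```

The pattern is compiled once and applied per row, which is the same work a database does for REGEXP, and it behaves identically regardless of which DB backend is in use.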

Just my two cents,
-jay
