Comment 2 for bug 1020443

Revision history for this message
Abel Deuring (adeuring) wrote :

We can fix this bug in two ways:

(1) insert spaces on the left and right side of the operators:

select _ftq('? ! .');
 _ftq
------
 ?&!.
(1 row)

So, the required '&' is inserted.

launchpad_dev=# select ftq('? ! .');
NOTICE: text-search query contains only stop words or doesn't
contain lexemes, ignored
CONTEXT: SQL statement "SELECT to_tsquery('default', _ftq($1)) AS x"
PL/Python function "ftq"
 ftq
-----

(1 row)

(2) just replace the operators '&', '|', '!' with spaces in search terms.

I favour the second option: Launchpad is about software development,
so we can expect that a text like

   Shell scripts usually start with #!/bin/sh

appears in an indexed column and that it is used as a search term.

This is indexed as:

select to_tsvector('Shell scripts usually start with #!/bin/sh');
                     to_tsvector
------------------------------------------------------
 '/bin/sh':6 'script':2 'shell':1 'start':4 'usual':3

Using the example text directly as a search expression leads currently
to the OOPS described in this bug. If we insert spaces around the '!',
we get this ts_query:

select ftq('Shell scripts usually start with # ! /bin/sh');
                         ftq
-----------------------------------------------------
 'shell' & 'script' & 'usual' & 'start' & !'/bin/sh'
(1 row)

So, the indexed text will _not_ be found because of the !'/bin/sh'.

Similar oddities might occur for other typical source code lines:

select to_tsvector('x = y | !z');
    to_tsvector
-------------------
 'x':1 'y':2 'z':3

select ftq('x = y | !z');
       ftq
------------------
 'x' & 'y' | !'z'

If we first remove '!', '&', '|' from search terms and then replace only
"AND", "OR", "NOT" with the respective operators, we avoid most of
these oddities, because the words a stop words and thus do not appear
in the FTI data. (This still leaves a problem when somebody tries to
search for an SQL expression like "SELECT ... FROM ... WHERE a=5 AND
(b=6 OR c=7)". I think we can avoid suprises here only by mentioning this
problem on a help page and recommend to remove the anyway useless
stop word from the search term or to use lowercase text...

Another note, not related to the OOPS, but related to possibly suprising
search results: We should not replace a leading '-' with a '!':

select to_tsvector('123 -546');
   to_tsvector
------------------
 '-546':2 '123':1

launchpad_dev=# select ftq('123 -546');
      ftq
----------------
 '123' & !'546'

So again a "non-match".