Merge lp:~adeuring/launchpad/bug-1020443-2 into lp:launchpad

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: no longer in the source branch.
Merged at revision: 15696
Proposed branch: lp:~adeuring/launchpad/bug-1020443-2
Merge into: lp:launchpad
Prerequisite: lp:~adeuring/launchpad/bug-1020443
Diff against target: 1030 lines (+446/-81)
19 files modified
database/schema/patch-2209-24-3.sql (+124/-0)
lib/lp/answers/doc/faq-vocabulary.txt (+1/-3)
lib/lp/answers/doc/faq.txt (+1/-1)
lib/lp/answers/model/faq.py (+1/-1)
lib/lp/answers/model/question.py (+18/-6)
lib/lp/answers/stories/this-is-a-faq.txt (+2/-2)
lib/lp/bugs/doc/bugtask-find-similar.txt (+1/-0)
lib/lp/bugs/doc/bugtask-search.txt (+58/-2)
lib/lp/bugs/model/bugtasksearch.py (+5/-2)
lib/lp/bugs/model/tests/test_bugtasksearch.py (+4/-2)
lib/lp/registry/browser/product.py (+1/-1)
lib/lp/registry/doc/vocabularies.txt (+11/-1)
lib/lp/registry/model/person.py (+8/-7)
lib/lp/registry/tests/test_person_vocabularies.py (+25/-1)
lib/lp/registry/tests/test_personset.py (+53/-2)
lib/lp/registry/tests/test_product_vocabularies.py (+10/-0)
lib/lp/registry/tests/test_projectgroup_vocabulary.py (+26/-0)
lib/lp/registry/vocabularies.py (+24/-20)
lib/lp/services/database/doc/textsearching.txt (+73/-30)
To merge this branch: bzr merge lp:~adeuring/launchpad/bug-1020443-2
Reviewer                  Review Type  Date Requested  Status
j.c.sackett (community)                                Approve
Stuart Bishop                                          Pending
Review via email: mp+116876@code.launchpad.net

Commit message

Ignore the symbols "&|!" in full text searches; don't treat a leading '-' as the "NOT" operator. Use correct tsquery expressions for search result ranking.

Description of the change

This branch fixes numerous test failures which were triggered by the
changes from lp:~adeuring/launchpad/bug-1020443 (already approved).

In this branch, I dropped the questionable feature that the symbols
"&", "|" and "!" are treated as logical operators in full text search
queries.

This change had more side effects than I expected. During my work on
the fix I noticed two interesting bugs:

  - the ranking of search results was sometimes wrong
  - the words used in full text searches were sometimes stemmed twice
    -- and the stemming algorithm from the Postgres text search
    implementation is not idempotent for certain words. Two examples
    I noticed in the tests:

    stemmed("extension") -> "extens"
    stemmed("extens") -> "exten"
    stemmed("cheese") -> "chees"
    stemmed("chees") -> "chee"

When a text is indexed after a change, the indexer stores the stem of
the words in the FTI; when a full text search expression is processed,
the procedure to_tsquery('searchtext') parses the search text into
tokens, and if a token is treated as a word, it is stemmed. Finally,
the stemmed search words are looked up in an FTI column.

So, if we have the word "cheese" in an indexed text, "chees" is stored
in the FTI. During a regular search for "cheese", "cheese" is passed
(via the procedure ftq()) to to_tsquery(), and to_tsquery returns a
tsquery object containing the word "chees".
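The failure mode can be modelled in a few lines of Python. This is a toy sketch, not the Postgres implementation: `stem()` is a hypothetical stand-in that only knows the four mappings quoted above, and `index_text()` and `query_words()` are simplified models of what the indexer and to_tsquery() do with words.

```python
# Hypothetical stand-in for the Postgres stemmer: it only knows the four
# mappings quoted above and returns every other word unchanged.
OBSERVED_STEMS = {
    "extension": "extens",
    "extens": "exten",
    "cheese": "chees",
    "chees": "chee",
}


def stem(word):
    return OBSERVED_STEMS.get(word, word)


def index_text(text):
    """Toy model of indexing: the FTI stores stems, not the raw words."""
    return set(stem(word) for word in text.lower().split())


def query_words(search_text):
    """Toy model of to_tsquery(): each search word is stemmed once."""
    return [stem(word) for word in search_text.lower().split()]


def matches(fti, words):
    # AND semantics: every stemmed search word must be in the index.
    return all(word in fti for word in words)


fti = index_text("This cheese sandwich should show up")
single = query_words("cheese")          # ftq('cheese') -> 'chees'
double = query_words(" ".join(single))  # ftq('chees')  -> 'chee'
print(matches(fti, single))  # True
print(matches(fti, double))  # False
```

When the search word is stemmed exactly once, index and query agree on "chees"; stemming the already stemmed word again yields "chee", which is not in the index.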

The broken searches call the function nl_phrase_search(), which issues
an SQL query like "SELECT ftq('cheese')"; this returns a string
representation of a tsquery. nl_phrase_search() extracts the distinct
stemmed words from the result, builds all subsets having one element
less than the original set of stemmed words, creates AND expressions
for the words of these subsets and finally returns an OR expression
of the AND expressions. In other words: it builds a query where at most
one word from the original search term may be missing in the searched
texts.
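A minimal sketch of the query construction described above, assuming the stemmed words have already been extracted from the ftq() result. The function name `one_word_may_be_missing()` is hypothetical and this is not the code of nl_phrase_search() itself; in particular, the handling of zero or one word is an assumption of the sketch.

```python
from itertools import combinations


def one_word_may_be_missing(stemmed_words):
    """Return a tsquery string that matches texts containing all but at
    most one of the given stemmed words (hypothetical helper)."""
    words = sorted(set(stemmed_words))
    if len(words) < 2:
        # Assumption of this sketch: with zero or one word there is
        # nothing to leave out, so the words are simply AND-ed.
        return "&".join(words)
    # All subsets with one element less than the full set ...
    subsets = combinations(words, len(words) - 1)
    # ... AND-ed internally and OR-ed together.
    return "|".join("(%s)" % "&".join(subset) for subset in subsets)


print(one_word_may_be_missing(["chees", "sandwich", "show"]))
# (chees&sandwich)|(chees&show)|(sandwich&show)
```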

The string returned by nl_phrase_search() was often passed again to
ftq(). This has two side effects:

  - ftq() calls to_tsquery() again, but now with the stemmed words,
    leading to mappings like "cheese" -> "chees" -> "chee" -- and
    the search will not return data having the original word "cheese".
    And if "chee" instead of "chees" is used in the "sort by rank"
    expression, we get the wrong sort order.
  - ftq() now converts the symbols "&|!" to " ". (See the MP for
    lp:~adeuring/launchpad/bug-1020443 for the reason for this change.)
    This means that the '|' is converted into an implicit AND, so the
    ability to find texts where one word from the search string may be
    missing is lost.
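The second side effect can be illustrated with a simplified model of the new _ftq() from patch-2209-24-3.sql. The function below is hypothetical and only mimics three of its steps (operator stripping, removal of brackets without an operator inside, and the implicit AND between words); it is not the full procedure.

```python
import re


def ftq_operator_handling(query):
    """Hypothetical, simplified model of three steps of the new _ftq()."""
    # 1. Raw tsquery operators are replaced by spaces.
    query = re.sub(r"[|&!]", " ", query)
    # 2. Brackets that contain no '&' or '|' are dropped; after step 1
    #    that is every bracket, so they are simply removed here.
    query = re.sub(r"[()]", " ", query)
    # 3. Whitespace between words becomes an implicit AND.
    return "&".join(query.split())


nl_query = "(chees&sandwich)|(chees&show)|(sandwich&show)"
print(ftq_operator_handling(nl_query))
# chees&sandwich&chees&show&sandwich&show
```

All words are now required, so the "one word may be missing" semantics are gone; in the real ftq(), to_tsquery() would additionally stem "chees" to "chee".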

Notes about the changes in detail:

lib/lp/answers/doc/faq-vocabulary.txt:

Ranking bug: with the fix, the order of the two search results changes.

lib/lp/answers/doc/faq.txt:

'|' is no longer treated as equivalent to "OR".

lib/lp/answers/model/faq.py:

Avoid "double calls" of ftq(). fti_search is the result of an
nl_phrase_search() call; these results can be directly used
as tsquery expressions. If a query string is used like

  'query&string'::tsquery

the words are not stemmed. (See chapter 8.11.2 of the Postgres 9.1
documentation.)

test: ./bin/test -vvt lib/lp/answers/doc/faq.txt
      ./bin/test -vvt lib/lp/answers/stories/this-is-a-faq.txt

lib/lp/answers/model/question.py

class QuestionSearch is the base class of a number of other classes.
Most of them store search phrases provided by a user in
self.search_text, with one exception: class SimilarQuestionsSearch.
The class stores the result of nl_phrase_search() in self.search_text.

I added the flag self.nl_phrase_used to keep track of this difference.
The methods getConstraints() and getOrderByClause() now generate
SQL expressions that work both for "plain" search texts and for
texts processed by nl_phrase_search().

test: ./bin/test -vvt lib/lp/answers/doc/questiontarget.txt

(no explicit additions or changes in this doc test, but the changes
described above fix failures.)
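The branching in getConstraints() and getOrderByClause() can be sketched as a single helper that picks the tsquery expression once. Both `fti_sql()` and `sql_quote()` are hypothetical: `sql_quote()` is a simplified stand-in for the `quote()` function used in question.py and is not suitable for production use. The sketch writes the ::tsquery cast explicitly in both fragments, whereas the diff relies on an implicit cast in the @@ constraint.

```python
def sql_quote(value):
    # Simplified SQL string literal quoting, for illustration only.
    return "'%s'" % value.replace("'", "''")


def fti_sql(column, search_text, nl_phrase_used):
    """Return (constraint, order_by) SQL fragments for a text search.

    Text produced by nl_phrase_search() is already a tsquery string
    with stemmed words, so it is cast directly; plain user input is
    still parsed and stemmed by ftq().
    """
    if nl_phrase_used:
        tsquery = "%s::tsquery" % sql_quote(search_text)
    else:
        tsquery = "ftq(%s)" % sql_quote(search_text)
    constraint = "%s @@ %s" % (column, tsquery)
    order_by = "-rank(%s, %s)" % (column, tsquery)
    return constraint, order_by


print(fti_sql("Question.fti", "chees&sandwich", True))
print(fti_sql("Question.fti", "cheese sandwich", False))
```

Computing the expression once keeps the WHERE and ORDER BY clauses consistent; in the old faq.py code, the constraint used the nl_phrase_search() result directly while the rank expression passed it through ftq() a second time.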

lib/lp/bugs/model/bugtasksearch.py

Correct treatment of BugTaskSearchParams.fast_searchtext, as opposed to
BugTaskSearchParams.searchtext. The latter property stores search text
as provided by a user; the former stores the result of an
nl_phrase_search() call. fast_searchtext is now treated as described
above. I had to change lib/lp/bugs/doc/bugtask-search.txt: the first
change in this file (line 138) showed an improper use of
fast_searchtext. That property is supposed to contain only stemmed
words, which are always lower case. With my changes to
bugtasksearch.py, the example fails. I added a few lines to explain
the difference between the BugTaskSearchParams properties.

The change in lib/lp/bugs/doc/bugtask-find-similar.txt (one more
search result) is caused by the fact that the search word "cheese" is
no longer stemmed twice before being used in an SQL expression like
"table.fti @@ 'chees'::tsquery".

test: ./bin/test -vvt lib/lp/bugs/doc/bugtask-search.txt
      ./bin/test -vvt lib/lp/bugs/doc/bugtask-find-similar.txt
      ./bin/test -vvt lp.bugs.model.tests.test_bugtasksearch.*test_fast_fulltext_search

lib/lp/registry/browser/product.py

Product.search_results() builds a query where a number of words are
OR combined. The method used the operator '|', which no longer works;
it now uses 'OR'.

test: ./bin/test -vvt lib/lp/registry/stories/product/xx-product-add.txt

(Yes, testing this change only in a story is not strictly sufficient,
but the feature "search for any word" should have had its own test
before. I can add a test if it's really necessary.)
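Why ' OR ' works where '|' no longer does follows from the order of two steps in the new _ftq() (patch-2209-24-3.sql): raw operators are stripped before the keyword OR is converted. A minimal sketch that reproduces just these two steps; `ftq_boolean_steps()` is a hypothetical name and the word list is made up for illustration.

```python
import re


def ftq_boolean_steps(query):
    # The two relevant steps of the new _ftq(), in the same order:
    # 1. raw tsquery operators are replaced by spaces,
    query = re.sub(r"[|&!]", " ", query)
    # 2. then the upper case keyword OR is turned into '|'.
    query = re.sub(r"(?u)\bOR\b", "|", query)
    return query


words = ["firefox", "thunderbird"]
print(ftq_boolean_steps("|".join(words)))     # firefox thunderbird
print(ftq_boolean_steps(" OR ".join(words)))  # firefox | thunderbird
```

In the full procedure the space left in the first result later becomes an implicit AND. The spaces around OR in the new constant also matter: the conversion uses word boundaries (\bOR\b), so words joined by a bare 'OR' would not be recognized.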

lib/lp/registry/model/person.py

The methods find(text,...), findPerson(text,...), findTeam(text,...) of
class PersonSet called "text = text.lower()" before doing any further
processing. This breaks ftq() when the query includes the operators
AND, OR, NOT: They are only treated as operators if they are upper case.

Since these methods also use things like

      EmailAddress.email.lower().startswith(search_text)

where search_text should of course be lower case, the code now looks a
bit more complicated. I added a few tests to test_personset.py to
check that the variants "plain search text" and "lower case search text"
are used properly.

test: ./bin/test -vvt test_personset.*test_find
      ./bin/test -vvt lib/lp/registry/doc/vocabularies.txt

(the changed methods are used in "vocabulary filtering".)
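The reason lower-casing breaks OR searches is visible in the AND/OR/NOT conversion of _ftq(), which is case sensitive. The sketch below copies those three substitutions from patch-2209-24-3.sql into a hypothetical helper `convert_booleans()` and applies it to the original and the lower case variant of a search text.

```python
import re


def convert_booleans(query):
    # The AND/OR/NOT conversion step of _ftq(). The patterns are case
    # sensitive, so only upper case keywords become operators.
    query = re.sub(r"(?u)\bAND\b", "&", query)
    query = re.sub(r"(?u)\bOR\b", "|", query)
    query = re.sub(r"(?u)\bNOT\b", " !", query)
    return query


text = u"baz OR blah"
lower_case_text = text.lower()  # only for e-mail prefix matches
print(convert_booleans(text))             # baz | blah
print(convert_booleans(lower_case_text))  # baz or blah
```

Hence the patched methods keep the original text for the full text query and use the lower case copy only for case insensitive e-mail prefix matches such as EmailAddress.email.lower().startswith(...).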

lib/lp/registry/vocabularies.py

Vocabularies have a method search(text, ...), where text is used in
SQL expressions like "Product.name LIKE 'text'" as well as
"Product.fti @@ ftq('text')".

Like the PersonSet.find*() methods, these methods used the lower case
variant for both expressions. I changed this for the FTI query.

test: ./bin/test registry -vvt test_person_vocabularies.*test_search_accepts_or_expressions
      ./bin/test registry -vvt test_product_vocabularies.*test_search_with_or_expression
      ./bin/test registry -vvt test_projectgroup_vocabulary

Lint: Lots of messages about moin headers in doc tests and related
doc test complaints, but none of them are related to my changes. (OK,
except that I added a moin header for consistency's sake to
lib/lp/bugs/doc/bugtask-search.txt.)

Revision history for this message
j.c.sackett (jcsackett) wrote :

Abel--

Looks alright. Two possible issues I'll leave to you to resolve before landing.

#710 you define like_query as query.lower(), then redefine it as another derivative of query; was the second line meant to be replaced, or further modify like_query instead of query?

#720 Same thing

review: Approve
Revision history for this message
Abel Deuring (adeuring) wrote :

Jon, thanks for the review!

On 26.07.2012 17:19, j.c.sackett wrote:
> Review: Approve
>
> Abel--
>
> Looks alright. Two possible issues I'll leave to you to resolve before landing.
>
> #710 you define like_query as query.lower(), then redefine it as another derivative of query; was the second line meant to be replaced, or further modify like_query instead of query?
>
> #720 Same thing
>

ouch... that should have been

            like_query = query.lower()
            like_query = "'%%' || %s || '%%'" % quote_like(like_query)

in both cases. Fixed.
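For illustration, the corrected pattern construction as a self-contained sketch. `quote_like()` here is a hypothetical stand-in for the Launchpad helper of the same name (assumed to escape backslash and the LIKE wildcards % and _, then quote the value as an SQL string literal); the real helper's escaping rules may differ. The point is that both lines operate on like_query, so the LIKE pattern is built from the lower case text.

```python
def quote_like(value):
    # Hypothetical stand-in: escape backslash and the LIKE wildcards,
    # then quote the result as an SQL string literal.
    escaped = (value.replace("\\", "\\\\")
                    .replace("%", "\\%")
                    .replace("_", "\\_"))
    return "'%s'" % escaped.replace("'", "''")


def like_substring_expression(query):
    like_query = query.lower()
    # '%%' becomes a single '%' wildcard after this % formatting.
    like_query = "'%%' || %s || '%%'" % quote_like(like_query)
    return like_query


print(like_substring_expression("Baz OR Blah"))
# '%' || 'baz or blah' || '%'
```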

Preview Diff

1=== added file 'database/schema/patch-2209-24-3.sql'
2--- database/schema/patch-2209-24-3.sql 1970-01-01 00:00:00 +0000
3+++ database/schema/patch-2209-24-3.sql 2012-07-26 15:40:44 +0000
4@@ -0,0 +1,124 @@
5+-- Copyright 2012 Canonical Ltd. This software is licensed under the
6+-- GNU Affero General Public License version 3 (see the file LICENSE).
7+
8+SET client_min_messages=ERROR;
9+
10+CREATE OR REPLACE FUNCTION _ftq(text) RETURNS text
11+ LANGUAGE plpythonu IMMUTABLE STRICT
12+ AS $_$
13+ import re
14+
15+ # I think this method would be more robust if we used a real
16+ # tokenizer and parser to generate the query string, but we need
17+ # something suitable for use as a stored procedure which currently
18+ # means no external dependancies.
19+
20+ # Convert to Unicode
21+ query = args[0].decode('utf8')
22+ ## plpy.debug('1 query is %s' % repr(query))
23+
24+ # Replace tsquery operators with ' '.
25+ query = re.sub('[|&!]', ' ', query)
26+
27+ # Normalize whitespace
28+ query = re.sub("(?u)\s+"," ", query)
29+
30+ # Convert AND, OR, NOT to tsearch2 punctuation
31+ query = re.sub(r"(?u)\bAND\b", "&", query)
32+ query = re.sub(r"(?u)\bOR\b", "|", query)
33+ query = re.sub(r"(?u)\bNOT\b", " !", query)
34+ ## plpy.debug('2 query is %s' % repr(query))
35+
36+ # Deal with unwanted punctuation.
37+ # ':' is used in queries to specify a weight of a word.
38+ # '\' is treated differently in to_tsvector() and to_tsquery().
39+ punctuation = r'[:\\]'
40+ query = re.sub(r"(?u)%s+" % (punctuation,), " ", query)
41+ ## plpy.debug('3 query is %s' % repr(query))
42+
43+ # Now that we have handle case sensitive booleans, convert to lowercase
44+ query = query.lower()
45+
46+ # Remove unpartnered bracket on the left and right
47+ query = re.sub(r"(?ux) ^ ( [^(]* ) \)", r"(\1)", query)
48+ query = re.sub(r"(?ux) \( ( [^)]* ) $", r"(\1)", query)
49+
50+ # Remove spurious brackets
51+ query = re.sub(r"(?u)\(([^\&\|]*?)\)", r" \1 ", query)
52+ ## plpy.debug('5 query is %s' % repr(query))
53+
54+ # Insert & between tokens without an existing boolean operator
55+ # ( not proceeded by (|&!
56+ query = re.sub(r"(?u)(?<![\(\|\&\!])\s*\(", "&(", query)
57+ ## plpy.debug('6 query is %s' % repr(query))
58+ # ) not followed by )|&
59+ query = re.sub(r"(?u)\)(?!\s*(\)|\||\&|\s*$))", ")&", query)
60+ ## plpy.debug('6.1 query is %s' % repr(query))
61+ # Whitespace not proceded by (|&! not followed by &|
62+ query = re.sub(r"(?u)(?<![\(\|\&\!\s])\s+(?![\&\|\s])", "&", query)
63+ ## plpy.debug('7 query is %s' % repr(query))
64+
65+ # Detect and repair syntax errors - we are lenient because
66+ # this input is generally from users.
67+
68+ # Fix unbalanced brackets
69+ openings = query.count("(")
70+ closings = query.count(")")
71+ if openings > closings:
72+ query = query + " ) "*(openings-closings)
73+ elif closings > openings:
74+ query = " ( "*(closings-openings) + query
75+ ## plpy.debug('8 query is %s' % repr(query))
76+
77+ # Strip ' character that do not have letters on both sides
78+ query = re.sub(r"(?u)((?<!\w)'|'(?!\w))", "", query)
79+
80+ # Brackets containing nothing but whitespace and booleans, recursive
81+ last = ""
82+ while last != query:
83+ last = query
84+ query = re.sub(r"(?u)\([\s\&\|\!]*\)", "", query)
85+ ## plpy.debug('9 query is %s' % repr(query))
86+
87+ # An & or | following a (
88+ query = re.sub(r"(?u)(?<=\()[\&\|\s]+", "", query)
89+ ## plpy.debug('10 query is %s' % repr(query))
90+
91+ # An &, | or ! immediatly before a )
92+ query = re.sub(r"(?u)[\&\|\!\s]*[\&\|\!]+\s*(?=\))", "", query)
93+ ## plpy.debug('11 query is %s' % repr(query))
94+
95+ # An &,| or ! followed by another boolean.
96+ query = re.sub(r"(?ux) \s* ( [\&\|\!] ) [\s\&\|]+", r"\1", query)
97+ ## plpy.debug('12 query is %s' % repr(query))
98+
99+ # Leading & or |
100+ query = re.sub(r"(?u)^[\s\&\|]+", "", query)
101+ ## plpy.debug('13 query is %s' % repr(query))
102+
103+ # Trailing &, | or !
104+ query = re.sub(r"(?u)[\&\|\!\s]+$", "", query)
105+ ## plpy.debug('14 query is %s' % repr(query))
106+
107+ # If we have nothing but whitespace and tsearch2 operators,
108+ # return NULL.
109+ if re.search(r"(?u)^[\&\|\!\s\(\)]*$", query) is not None:
110+ return None
111+
112+ # Convert back to UTF-8
113+ query = query.encode('utf8')
114+ ## plpy.debug('15 query is %s' % repr(query))
115+
116+ return query or None
117+ $_$;
118+
119+CREATE OR REPLACE FUNCTION ftq(text) RETURNS pg_catalog.tsquery
120+ LANGUAGE plpythonu IMMUTABLE STRICT
121+ AS $_$
122+ p = plpy.prepare(
123+ "SELECT to_tsquery('default', _ftq($1)) AS x", ["text"])
124+ query = plpy.execute(p, args, 1)[0]["x"]
125+ return query or None
126+ $_$;
127+
128+INSERT INTO LaunchpadDatabaseRevision VALUES (2209, 24, 3);
129
130=== modified file 'lib/lp/answers/doc/faq-vocabulary.txt'
131--- lib/lp/answers/doc/faq-vocabulary.txt 2011-12-24 17:49:30 +0000
132+++ lib/lp/answers/doc/faq-vocabulary.txt 2012-07-26 15:40:44 +0000
133@@ -82,7 +82,5 @@
134 2
135 >>> for term in terms:
136 ... print term.title
137+ How do I install Extensions?
138 How do I troubleshoot problems with extensions/themes?
139- How do I install Extensions?
140-
141-
142
143=== modified file 'lib/lp/answers/doc/faq.txt'
144--- lib/lp/answers/doc/faq.txt 2011-12-24 17:49:30 +0000
145+++ lib/lp/answers/doc/faq.txt 2012-07-26 15:40:44 +0000
146@@ -232,7 +232,7 @@
147 >>> from lp.registry.interfaces.person import IPersonSet
148 >>> foo_bar = getUtility(IPersonSet).getByEmail('foo.bar@canonical.com')
149 >>> for faq in faqset.searchFAQs(
150- ... search_text='java | flash', owner=foo_bar):
151+ ... search_text='java OR flash', owner=foo_bar):
152 ... print '%s (%s)' % (faq.title, faq.target.displayname)
153 How do I install plugins (Shockwave, QuickTime, etc.)? (Mozilla Firefox)
154 How can I play MP3/Divx/DVDs/Quicktime/Realmedia files
155
156=== modified file 'lib/lp/answers/model/faq.py'
157--- lib/lp/answers/model/faq.py 2011-12-30 06:14:56 +0000
158+++ lib/lp/answers/model/faq.py 2012-07-26 15:40:44 +0000
159@@ -138,7 +138,7 @@
160 return FAQ.select(
161 '%s AND FAQ.fti @@ %s' % (target_constraint, quote(fti_search)),
162 orderBy=[
163- SQLConstant("-rank(FAQ.fti, ftq(%s))" % quote(fti_search)),
164+ SQLConstant("-rank(FAQ.fti, %s::tsquery)" % quote(fti_search)),
165 "-FAQ.date_created"])
166
167 @staticmethod
168
169=== modified file 'lib/lp/answers/model/question.py'
170--- lib/lp/answers/model/question.py 2011-12-30 06:14:56 +0000
171+++ lib/lp/answers/model/question.py 2012-07-26 15:40:44 +0000
172@@ -864,6 +864,7 @@
173 product=None, distribution=None, sourcepackagename=None,
174 project=None):
175 self.search_text = search_text
176+ self.nl_phrase_used = False
177
178 if zope_isinstance(status, DBItem):
179 self.status = [status]
180@@ -944,8 +945,12 @@
181 constraints = self.getTargetConstraints()
182
183 if self.search_text is not None:
184- constraints.append(
185- 'Question.fti @@ ftq(%s)' % quote(self.search_text))
186+ if self.nl_phrase_used:
187+ constraints.append(
188+ 'Question.fti @@ %s' % quote(self.search_text))
189+ else:
190+ constraints.append(
191+ 'Question.fti @@ ftq(%s)' % quote(self.search_text))
192
193 if self.status:
194 constraints.append('Question.status IN %s' % sqlvalues(
195@@ -1009,10 +1014,16 @@
196 elif sort is QuestionSort.RELEVANCY:
197 if self.search_text:
198 # SQLConstant is a workaround for bug 53455
199- return [SQLConstant(
200- "-rank(Question.fti, ftq(%s))" % quote(
201- self.search_text)),
202- "-Question.datecreated"]
203+ if self.nl_phrase_used:
204+ return [SQLConstant(
205+ "-rank(Question.fti, %s::tsquery)" % quote(
206+ self.search_text)),
207+ "-Question.datecreated"]
208+ else:
209+ return [SQLConstant(
210+ "-rank(Question.fti, ftq(%s))" % quote(
211+ self.search_text)),
212+ "-Question.datecreated"]
213 else:
214 return "-Question.datecreated"
215 elif sort is QuestionSort.RECENT_OWNER_ACTIVITY:
216@@ -1113,6 +1124,7 @@
217 # similarity search algorithm.
218 self.search_text = nl_phrase_search(
219 title, Question, " AND ".join(self.getTargetConstraints()))
220+ self.nl_phrase_used = True
221
222
223 class QuestionPersonSearch(QuestionSearch):
224
225=== modified file 'lib/lp/answers/stories/this-is-a-faq.txt'
226--- lib/lp/answers/stories/this-is-a-faq.txt 2011-12-23 23:44:59 +0000
227+++ lib/lp/answers/stories/this-is-a-faq.txt 2012-07-26 15:40:44 +0000
228@@ -66,8 +66,8 @@
229 ... print radio, label, link
230 >>> printFAQOptions(user_browser.contents)
231 (*) No existing FAQs are relevant
232+ ( ) 8: How do I install Extensions?
233 ( ) 9: How do I troubleshoot problems with extensions/themes?
234- ( ) 8: How do I install Extensions?
235
236 >>> print user_browser.getLink('How do I troubleshoot problems').url
237 http://answers.launchpad.dev/firefox/+faq/9
238@@ -157,8 +157,8 @@
239 >>> printFAQOptions(user_browser.contents)
240 ( ) No existing FAQs are relevant
241 (*) 10: How do I install plugins (Shockwave, QuickTime, etc.)?
242+ ( ) 8: How do I install Extensions?
243 ( ) 9: How do I troubleshoot problems with extensions/themes?
244- ( ) 8: How do I install Extensions?
245
246 He changes the message and click 'Link to FAQ'.
247
248
249=== modified file 'lib/lp/bugs/doc/bugtask-find-similar.txt'
250--- lib/lp/bugs/doc/bugtask-find-similar.txt 2012-04-13 21:04:08 +0000
251+++ lib/lp/bugs/doc/bugtask-find-similar.txt 2012-07-26 15:40:44 +0000
252@@ -142,6 +142,7 @@
253 ... distribution=test_distro)
254 >>> for bugtask in similar_bugs:
255 ... print bugtask.bug.title
256+ Nothing to do with cheese or sandwiches
257 This cheese sandwich should show up
258
259 >>> similar_bugs = getUtility(IBugTaskSet).findSimilar(
260
261=== modified file 'lib/lp/bugs/doc/bugtask-search.txt'
262--- lib/lp/bugs/doc/bugtask-search.txt 2012-05-14 07:30:13 +0000
263+++ lib/lp/bugs/doc/bugtask-search.txt 2012-07-26 15:40:44 +0000
264@@ -138,7 +138,7 @@
265
266 For example, there are no bugs with the word 'Fnord' in Firefox.
267
268- >>> text_search = BugTaskSearchParams(user=None, fast_searchtext=u'Fnord')
269+ >>> text_search = BugTaskSearchParams(user=None, searchtext=u'Fnord')
270 >>> found_bugtasks = firefox.searchTasks(text_search)
271 >>> found_bugtasks.count()
272 0
273@@ -156,6 +156,63 @@
274 ... print "#%s" % bugtask.bug.id
275 #4
276
277+=== BugTaskSearchParams' parameters searchtext and fast_searchtext ===
278+
279+Normally, the parameter searchtext should be used. The alternative
280+parameter fast_searchtext requires a syntactically correct tsquery
281+expression containing stemmed words.
282+
283+A simple phrase can be passed as searchtext, but not as fast_searchtext,
284+see below.
285+
286+ >>> good_search = BugTaskSearchParams(
287+ ... user=None, searchtext=u'happens pretty often')
288+ >>> found_bugtasks = firefox.searchTasks(good_search)
289+ >>> for bugtask in found_bugtasks:
290+ ... print "#%s" % bugtask.bug.id
291+ #4
292+
293+The unstemmed word "happens" does not yield any results when used
294+as fast_textsearch.
295+
296+ >>> bad_search = BugTaskSearchParams(
297+ ... user=None, fast_searchtext=u'happens')
298+ >>> found_bugtasks = firefox.searchTasks(bad_search)
299+ >>> print found_bugtasks.count()
300+ 0
301+
302+If the stem of "happens" is used, we get results.
303+
304+ >>> good_search = BugTaskSearchParams(
305+ ... user=None, fast_searchtext=u'happen')
306+ >>> found_bugtasks = firefox.searchTasks(good_search)
307+ >>> for bugtask in found_bugtasks:
308+ ... print "#%s" % bugtask.bug.id
309+ #4
310+ #6
311+
312+Stemmed words may be combined into a valid tsquery expression.
313+
314+ >>> good_search = BugTaskSearchParams(
315+ ... user=None, fast_searchtext=u'happen&pretti&often')
316+ >>> found_bugtasks = firefox.searchTasks(good_search)
317+ >>> for bugtask in found_bugtasks:
318+ ... print "#%s" % bugtask.bug.id
319+ #4
320+
321+Passing invalid tsquery expressions as fast_searchtext raises an exception.
322+
323+ >>> bad_search = BugTaskSearchParams(
324+ ... user=None, fast_searchtext=u'happens pretty often')
325+ >>> list(firefox.searchTasks(bad_search))
326+ Traceback (most recent call last):
327+ ...
328+ ProgrammingError: syntax error in tsquery: "happens pretty often"
329+ ...
330+
331+ >>> import transaction
332+ >>> transaction.abort()
333+
334
335 == Searching by bug reporter ==
336
337@@ -340,7 +397,6 @@
338 bugs are found for him:
339
340 >>> from lp.registry.interfaces.person import IPersonSet
341- >>> import transaction
342 >>> transaction.abort()
343
344 >>> no_priv = getUtility(IPersonSet).getByName('no-priv')
345
346=== modified file 'lib/lp/bugs/model/bugtasksearch.py'
347--- lib/lp/bugs/model/bugtasksearch.py 2012-07-16 01:24:26 +0000
348+++ lib/lp/bugs/model/bugtasksearch.py 2012-07-26 15:40:44 +0000
349@@ -862,18 +862,21 @@
350 assert params.searchtext is None, (
351 'Cannot use searchtext at the same time as fast_searchtext.')
352 searchtext = params.fast_searchtext
353+ fti_expression = "?::tsquery"
354 else:
355 assert params.fast_searchtext is None, (
356 'Cannot use fast_searchtext at the same time as searchtext.')
357 searchtext = params.searchtext
358+ fti_expression = "ftq(?)"
359
360 if params.orderby is None:
361 # Unordered search results aren't useful, so sort by relevance
362 # instead.
363 params.orderby = [
364- SQL("-rank(BugTaskFlat.fti, ftq(?))", params=(searchtext,))]
365+ SQL("-rank(BugTaskFlat.fti, %s)" % fti_expression,
366+ params=(searchtext,))]
367
368- return SQL("BugTaskFlat.fti @@ ftq(?)", params=(searchtext,))
369+ return SQL("BugTaskFlat.fti @@ %s" % fti_expression, params=(searchtext,))
370
371
372 def _build_status_clause(col, status):
373
374=== modified file 'lib/lp/bugs/model/tests/test_bugtasksearch.py'
375--- lib/lp/bugs/model/tests/test_bugtasksearch.py 2012-07-24 04:05:21 +0000
376+++ lib/lp/bugs/model/tests/test_bugtasksearch.py 2012-07-26 15:40:44 +0000
377@@ -285,9 +285,11 @@
378
379 def test_fast_fulltext_search(self):
380 # Fast full text searches find text indexed by Bug.fti...
381+ # Note that a valid tsquery expression with stemmed words must
382+ # be specified.
383 self.setUpFullTextSearchTests()
384 params = self.getBugTaskSearchParams(
385- user=None, fast_searchtext=u'one title')
386+ user=None, fast_searchtext=u'one&titl')
387 self.assertSearchFinds(params, self.bugtasks[:1])
388
389 def test_tags(self):
390@@ -750,7 +752,7 @@
391 # Someone without permission to see deactiveated projects does
392 # not see bugtasks for deactivated projects.
393 bugtask_set = getUtility(IBugTaskSet)
394- param = BugTaskSearchParams(user=None, fast_searchtext=u'Monkeys')
395+ param = BugTaskSearchParams(user=None, searchtext=u'Monkeys')
396 results = bugtask_set.search(param, _noprejoins=True)
397 self.assertEqual([self.active_bugtask], list(results))
398
399
400=== modified file 'lib/lp/registry/browser/product.py'
401--- lib/lp/registry/browser/product.py 2012-07-13 08:29:56 +0000
402+++ lib/lp/registry/browser/product.py 2012-07-26 15:40:44 +0000
403@@ -205,7 +205,7 @@
404 )
405
406
407-OR = '|'
408+OR = ' OR '
409 SPACE = ' '
410
411
412
413=== modified file 'lib/lp/registry/doc/vocabularies.txt'
414--- lib/lp/registry/doc/vocabularies.txt 2012-07-06 19:36:03 +0000
415+++ lib/lp/registry/doc/vocabularies.txt 2012-07-26 15:40:44 +0000
416@@ -684,6 +684,11 @@
417 >>> checked_count == len(INACTIVE_ACCOUNT_STATUSES)
418 True
419
420+It is possible to search for alternative names.
421+
422+ >>> [person.name for person in vocab.search('matsubara OR salgado')]
423+ [u'matsubara', u'salgado']
424+
425
426 AdminMergeablePerson
427 --------------------
428@@ -738,6 +743,11 @@
429 >>> fooperson in vocab
430 False
431
432+It is possible to search for alternative names.
433+
434+ >>> [(p.name) for p in vocab.search('matsubara OR salgado')]
435+ [u'matsubara', u'salgado']
436+
437
438 ValidPersonOrTeam
439 .................
440@@ -1050,7 +1060,7 @@
441 [(u'testing Spanish team', u'Carlos Perell\xf3 Mar\xedn')]
442
443 >>> sorted((team.displayname, team.teamowner.displayname)
444- ... for team in vocab.search('spanish | ubuntu'))
445+ ... for team in vocab.search('spanish OR ubuntu'))
446 [(u'Mirror Administrators', u'Mark Shuttleworth'),
447 (u'Ubuntu Gnome Team', u'Mark Shuttleworth'),
448 (u'Ubuntu Security Team', u'Colin Watson'),
449
450=== modified file 'lib/lp/registry/model/person.py'
451--- lib/lp/registry/model/person.py 2012-07-16 20:42:12 +0000
452+++ lib/lp/registry/model/person.py 2012-07-26 15:40:44 +0000
453@@ -3483,7 +3483,8 @@
454 return EmptyResultSet()
455
456 orderBy = Person._sortingColumnsForSetOperations
457- text = ensure_unicode(text).lower()
458+ text = ensure_unicode(text)
459+ lower_case_text = text.lower()
460 # Teams may not have email addresses, so we need to either use a LEFT
461 # OUTER JOIN or do a UNION between four queries. Using a UNION makes
462 # it a lot faster than with a LEFT OUTER JOIN.
463@@ -3493,7 +3494,7 @@
464 EmailAddress.person == Person.id,
465 Person.account == Account.id,
466 Not(Account.status.is_in(INACTIVE_ACCOUNT_STATUSES)),
467- EmailAddress.email.lower().startswith(text))
468+ EmailAddress.email.lower().startswith(lower_case_text))
469
470 store = IStore(Person)
471
472@@ -3516,7 +3517,7 @@
473
474 results = results.union(store.find(
475 Person, person_name_query)).order_by()
476- team_email_query = self._teamEmailQuery(text)
477+ team_email_query = self._teamEmailQuery(lower_case_text)
478 results = results.union(
479 store.find(Person, team_email_query)).order_by()
480 team_name_query = self._teamNameQuery(text)
481@@ -3530,7 +3531,7 @@
482 must_have_email=False, created_after=None, created_before=None):
483 """See `IPersonSet`."""
484 orderBy = Person._sortingColumnsForSetOperations
485- text = ensure_unicode(text).lower()
486+ text = ensure_unicode(text)
487 store = IStore(Person)
488 base_query = And(
489 Person.teamowner == None,
490@@ -3570,7 +3571,7 @@
491 email_query = And(
492 base_query,
493 EmailAddress.person == Person.id,
494- EmailAddress.email.lower().startswith(ensure_unicode(text)))
495+ EmailAddress.email.lower().startswith(text.lower()))
496
497 name_query = And(
498 base_query,
499@@ -3583,11 +3584,11 @@
500 def findTeam(self, text=""):
501 """See `IPersonSet`."""
502 orderBy = Person._sortingColumnsForSetOperations
503- text = ensure_unicode(text).lower()
504+ text = ensure_unicode(text)
505 # Teams may not have email addresses, so we need to either use a LEFT
506 # OUTER JOIN or do a UNION between two queries. Using a UNION makes
507 # it a lot faster than with a LEFT OUTER JOIN.
508- email_query = self._teamEmailQuery(text)
509+ email_query = self._teamEmailQuery(text.lower())
510 store = IStore(Person)
511 email_results = store.find(Person, email_query).order_by()
512 name_query = self._teamNameQuery(text)
513
514=== modified file 'lib/lp/registry/tests/test_person_vocabularies.py'
515--- lib/lp/registry/tests/test_person_vocabularies.py 2012-07-20 03:15:04 +0000
516+++ lib/lp/registry/tests/test_person_vocabularies.py 2012-07-26 15:40:44 +0000
517@@ -25,6 +25,7 @@
518 from lp.services.webapp.vocabulary import FilteredVocabularyBase
519 from lp.testing import (
520 login_person,
521+ person_logged_in,
522 StormStatementRecorder,
523 TestCaseWithFactory,
524 )
525@@ -188,6 +189,22 @@
526 name="fredteam", email="fredteam@foo.com")
527 self._team_filter_tests([team])
528
529+ def test_search_accepts_or_expressions(self):
530+ person = self.factory.makePerson(name='baz')
531+ team = self.factory.makeTeam(name='blah')
532+ result = list(self.searchVocabulary(None, 'baz OR blah'))
533+ self.assertEqual([person, team], result)
534+ private_team_one = self.factory.makeTeam(
535+ name='private-eye', visibility=PersonVisibility.PRIVATE,
536+ owner=person)
537+ private_team_two = self.factory.makeTeam(
538+ name='paranoid', visibility=PersonVisibility.PRIVATE,
539+ owner=person)
540+ with person_logged_in(person):
541+ result = list(
542+ self.searchVocabulary(None, 'paranoid OR private-eye'))
543+ self.assertEqual([private_team_one, private_team_two], result)
544+
545
546 class TestValidPersonOrTeamPreloading(VocabularyTestBase,
547 TestCaseWithFactory):
548@@ -373,13 +390,20 @@
549
550 def test_unvalidated_emails_ignored(self):
551 person = self.factory.makePerson()
552- unvalidated_email = self.factory.makeEmail(
553+ self.factory.makeEmail(
554 'fnord@example.com',
555 person,
556 email_status=EmailAddressStatus.NEW)
557 search = self.searchVocabulary(None, 'fnord@example.com')
558 self.assertEqual([], [s for s in search])
559
560+ def test_search_accepts_or_expressions(self):
561+ team_one = self.factory.makeTeam(name='baz')
562+ team_two = self.factory.makeTeam(name='blah')
563+ result = list(self.searchVocabulary(None, 'baz OR blah'))
564+ self.assertEqual([team_one, team_two], result)
565+
566+
567 class TestNewPillarGranteeVocabulary(VocabularyTestBase,
568 TestCaseWithFactory):
569 """Test that the NewPillarGranteeVocabulary behaves as expected."""
570
571=== modified file 'lib/lp/registry/tests/test_personset.py'
572--- lib/lp/registry/tests/test_personset.py 2012-06-07 21:38:31 +0000
573+++ lib/lp/registry/tests/test_personset.py 2012-07-26 15:40:44 +0000
574@@ -117,13 +117,13 @@
575
576 def test_getByEmail_ignores_unvalidated_emails(self):
577 person = self.factory.makePerson()
578- unvalidated_email = self.factory.makeEmail(
579+ self.factory.makeEmail(
580 'fnord@example.com',
581 person,
582 email_status=EmailAddressStatus.NEW)
583 found = self.person_set.getByEmail('fnord@example.com')
584 self.assertTrue(found is None)
585-
586+
587 def test_getPrecachedPersonsFromIDs(self):
588 # The getPrecachedPersonsFromIDs() method should only make one
589 # query to load all the extraneous data. Accessing the
590@@ -183,6 +183,57 @@
591 self.person_set.getByOpenIDIdentifier(
592 u'http://not.launchpad.dev/+id/%s' % identifier))
593
594+ def test_find__accepts_queries_with_or_operator(self):
595+ # PersonSet.find() allows to search for OR combined terms.
596+ person_one = self.factory.makePerson(name='baz')
597+ person_two = self.factory.makeTeam(name='blah')
598+ result = list(self.person_set.find('baz OR blah'))
599+ self.assertEqual([person_one, person_two], result)
600+
601+ def test_findPerson__accepts_queries_with_or_operator(self):
602+ # PersonSet.findPerson() allows searching for OR-combined terms.
603+ person_one = self.factory.makePerson(
604+ name='baz', email='one@example.org')
605+ person_two = self.factory.makePerson(
606+ name='blah', email='two@example.com')
607+ result = list(self.person_set.findPerson('baz OR blah'))
608+ self.assertEqual([person_one, person_two], result)
609+ # Note that these OR searches do not work for email addresses.
610+ result = list(self.person_set.findPerson(
611+ 'one@example.org OR two@example.org'))
612+ self.assertEqual([], result)
613+
614+ def test_findPerson__case_insensitive_email_address_search(self):
615+ # A search for email addresses is case insensitive.
616+ person_one = self.factory.makePerson(
617+ name='baz', email='ONE@example.org')
618+ person_two = self.factory.makePerson(
619+ name='blah', email='two@example.com')
620+ result = list(self.person_set.findPerson('one@example.org'))
621+ self.assertEqual([person_one], result)
622+ result = list(self.person_set.findPerson('TWO@example.com'))
623+ self.assertEqual([person_two], result)
624+
625+ def test_findTeam__accepts_queries_with_or_operator(self):
626+ # PersonSet.findTeam() allows searching for OR-combined terms.
627+ team_one = self.factory.makeTeam(name='baz', email='ONE@example.org')
628+ team_two = self.factory.makeTeam(name='blah', email='TWO@example.com')
629+ result = list(self.person_set.findTeam('baz OR blah'))
630+ self.assertEqual([team_one, team_two], result)
631+ # Note that these OR searches do not work for email addresses.
632+ result = list(self.person_set.findTeam(
633+ 'one@example.org OR two@example.org'))
634+ self.assertEqual([], result)
635+
636+ def test_findTeam__case_insensitive_email_address_search(self):
637+ # A search for email addresses is case insensitive.
638+ team_one = self.factory.makeTeam(name='baz', email='ONE@example.org')
639+ team_two = self.factory.makeTeam(name='blah', email='TWO@example.com')
640+ result = list(self.person_set.findTeam('one@example.org'))
641+ self.assertEqual([team_one], result)
642+ result = list(self.person_set.findTeam('TWO@example.com'))
643+ self.assertEqual([team_two], result)
644+
645
646 class TestPersonSetMergeMailingListSubscriptions(TestCaseWithFactory):
647
648
649=== modified file 'lib/lp/registry/tests/test_product_vocabularies.py'
650--- lib/lp/registry/tests/test_product_vocabularies.py 2012-01-01 02:58:52 +0000
651+++ lib/lp/registry/tests/test_product_vocabularies.py 2012-07-26 15:40:44 +0000
652@@ -66,6 +66,16 @@
653 self.assertEqual(
654 [quux_product, bar_product], list(result))
655
656+ def test_search_with_or_expression(self):
657+ # Searches for either of two or more names are possible.
658+ blah_product = self.factory.makeProduct(
659+ name='blah', displayname='Blah', summary='Blah blather')
660+ baz_product = self.factory.makeProduct(
661+ name='baz', displayname='Baz')
662+ result = self.vocabulary.search('blah OR baz')
663+ self.assertEqual(
664+ [blah_product, baz_product], list(result))
665+
666 def test_exact_match_is_first(self):
667 # When the flag is enabled, an exact name match always wins.
668 the_quux_product = self.factory.makeProduct(
669
670=== added file 'lib/lp/registry/tests/test_projectgroup_vocabulary.py'
671--- lib/lp/registry/tests/test_projectgroup_vocabulary.py 1970-01-01 00:00:00 +0000
672+++ lib/lp/registry/tests/test_projectgroup_vocabulary.py 2012-07-26 15:40:44 +0000
673@@ -0,0 +1,26 @@
674+# Copyright 2012 Canonical Ltd. This software is licensed under the
675+# GNU Affero General Public License version 3 (see the file LICENSE).
676+
677+"""Test the ProjectGroup vocabulary."""
678+
679+__metaclass__ = type
680+
681+from lp.registry.vocabularies import ProjectGroupVocabulary
682+from lp.testing import TestCaseWithFactory
683+from lp.testing.layers import DatabaseFunctionalLayer
684+
685+
686+class TestProjectGroupVocabulary(TestCaseWithFactory):
687+ """Test that the ProjectGroupVocabulary behaves as expected."""
688+ layer = DatabaseFunctionalLayer
689+
690+ def test_search_with_or_expression(self):
691+ # Searches for either of two or more names are possible.
692+ blah_group = self.factory.makeProject(
693+ name='blah', displayname='Blah', summary='Blah blather')
694+ baz_group = self.factory.makeProject(
695+ name='baz', displayname='Baz')
696+ vocabulary = ProjectGroupVocabulary()
697+ result = vocabulary.search('blah OR baz')
698+ self.assertEqual(
699+ [blah_group, baz_group], list(result))
700
701=== modified file 'lib/lp/registry/vocabularies.py'
702--- lib/lp/registry/vocabularies.py 2012-07-20 03:15:04 +0000
703+++ lib/lp/registry/vocabularies.py 2012-07-26 15:40:44 +0000
704@@ -304,8 +304,9 @@
705 if query is None or an empty string.
706 """
707 if query:
708- query = ensure_unicode(query).lower()
709- like_query = "'%%' || %s || '%%'" % quote_like(query)
710+ query = ensure_unicode(query)
711+ like_query = query.lower()
712+ like_query = "'%%' || %s || '%%'" % quote_like(like_query)
713 fti_query = quote(query)
714 sql = "active = 't' AND (name LIKE %s OR fti @@ ftq(%s))" % (
715 like_query, fti_query)
716@@ -354,8 +355,9 @@
717 if query is None or an empty string.
718 """
719 if query:
720- query = ensure_unicode(query).lower()
721- like_query = "'%%' || %s || '%%'" % quote_like(query)
722+ query = ensure_unicode(query)
723+ like_query = query.lower()
724+ like_query = "'%%' || %s || '%%'" % quote_like(like_query)
725 fti_query = quote(query)
726 sql = "active = 't' AND (name LIKE %s OR fti @@ ftq(%s))" % (
727 like_query, fti_query)
728@@ -435,7 +437,7 @@
729 if not text:
730 return self.emptySelectResults()
731
732- return self._select(ensure_unicode(text).lower())
733+ return self._select(ensure_unicode(text))
734
735
736 class PersonAccountToMergeVocabulary(
737@@ -469,7 +471,7 @@
738 if not text:
739 return self.emptySelectResults()
740
741- text = ensure_unicode(text).lower()
742+ text = ensure_unicode(text)
743 return self._select(text)
744
745
746@@ -649,14 +651,15 @@
747 FROM (
748 SELECT Person.id,
749 (case
750- when person.name=? then 100
751- when person.name like ? || '%%' then 0.6
752- when lower(person.displayname) like ? || '%%' then 0.5
753+ when person.name=lower(?) then 100
754+ when person.name like lower(?) || '%%' then 0.6
755+ when lower(person.displayname) like lower(?)
756+ || '%%' then 0.5
757 else rank(fti, ftq(?))
758 end) as rank
759 FROM Person
760- WHERE Person.name LIKE ? || '%%'
761- or lower(Person.displayname) LIKE ? || '%%'
762+ WHERE Person.name LIKE lower(?) || '%%'
763+ or lower(Person.displayname) LIKE lower(?) || '%%'
764 or Person.fti @@ ftq(?)
765 UNION ALL
766 SELECT Person.id, 0.8 AS rank
767@@ -667,7 +670,7 @@
768 SELECT Person.id, 0.4 AS rank
769 FROM Person, EmailAddress
770 WHERE Person.id = EmailAddress.person
771- AND LOWER(EmailAddress.email) LIKE ? || '%%'
772+ AND LOWER(EmailAddress.email) LIKE lower(?) || '%%'
773 AND status IN (?, ?)
774 ) AS person_match
775 GROUP BY id, is_private_team
776@@ -680,9 +683,10 @@
777 private_tables = [Person] + private_tables
778 private_ranking_sql = SQL("""
779 (case
780- when person.name=? then 100
781- when person.name like ? || '%%' then 0.6
782- when lower(person.displayname) like ? || '%%' then 0.5
783+ when person.name=lower(?) then 100
784+ when person.name like lower(?) || '%%' then 0.6
785+ when lower(person.displayname) like lower(?)
786+ || '%%' then 0.5
787 else rank(fti, ftq(?))
788 end) as rank
789 """, (text, text, text, text))
790@@ -696,8 +700,8 @@
791 SQL("true as is_private_team")),
792 where=And(
793 SQL("""
794- Person.name LIKE ? || '%%'
795- OR lower(Person.displayname) LIKE ? || '%%'
796+ Person.name LIKE lower(?) || '%%'
797+ OR lower(Person.displayname) LIKE lower(?) || '%%'
798 OR Person.fti @@ ftq(?)
799 """, [text, text, text]),
800 private_query))
801@@ -792,7 +796,7 @@
802 else:
803 return self.emptySelectResults()
804
805- text = ensure_unicode(text).lower()
806+ text = ensure_unicode(text)
807 return self._doSearch(text=text, vocab_filter=vocab_filter)
808
809 def searchForTerms(self, query=None, vocab_filter=None):
810@@ -838,8 +842,8 @@
811 result = self.store.using(*tables).find(Person, query)
812 else:
813 name_match_query = SQL("""
814- Person.name LIKE ? || '%%'
815- OR lower(Person.displayname) LIKE ? || '%%'
816+ Person.name LIKE lower(?) || '%%'
817+ OR lower(Person.displayname) LIKE lower(?) || '%%'
818 OR Person.fti @@ ftq(?)
819 """, [text, text, text]),
820
821
822=== modified file 'lib/lp/services/database/doc/textsearching.txt'
823--- lib/lp/services/database/doc/textsearching.txt 2012-06-26 09:40:38 +0000
824+++ lib/lp/services/database/doc/textsearching.txt 2012-07-26 15:40:44 +0000
825@@ -172,23 +172,16 @@
826 >>> ftq('hi AND mom')
827 hi&mom <=> 'hi' & 'mom'
828
829- >>> ftq('hi & mom')
830- hi&mom <=> 'hi' & 'mom'
831-
832 >>> ftq('hi OR mom')
833 hi|mom <=> 'hi' | 'mom'
834
835- >>> ftq('hi | mom')
836- hi|mom <=> 'hi' | 'mom'
837-
838- >>> ftq('hi & -dad')
839+ >>> ftq('hi AND NOT dad')
840 hi&!dad <=> 'hi' & !'dad'
841
842
843-
844 Brackets are allowed to specify precidence
845
846- >>> ftq('(HI OR HELLO) & mom')
847+ >>> ftq('(HI OR HELLO) AND mom')
848 (hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
849
850 >>> ftq('Hi(Mom)')
851@@ -203,19 +196,16 @@
852 >>> ftq('foo(bar OR baz)') # Bug #32071
853 foo&(bar|baz) <=> 'foo' & ( 'bar' | 'baz' )
854
855- >>> ftq('foo (bar OR baz)')
856- foo&(bar|baz) <=> 'foo' & ( 'bar' | 'baz' )
857-
858
859 We also support negation
860
861- >>> ftq('!Hi')
862+ >>> ftq('NOT Hi')
863 !hi <=> !'hi'
864
865- >>> ftq('-(Hi & Mom)')
866+ >>> ftq('NOT(Hi AND Mom)')
867 !(hi&mom) <=> !( 'hi' & 'mom' )
868
869- >>> ftq('Foo & ! Bar')
870+ >>> ftq('Foo AND NOT Bar')
871 foo&!bar <=> 'foo' & !'bar'
872
873
874@@ -224,7 +214,7 @@
875 >>> ftq('Hi Mom')
876 hi&mom <=> 'hi' & 'mom'
877
878- >>> ftq('Hi -mom')
879+ >>> ftq('Hi NOT mom')
880 hi&!mom <=> 'hi' & !'mom'
881
882 >>> ftq('hi (mom OR mum)')
883@@ -233,18 +223,34 @@
884 >>> ftq('(hi OR hello) mom')
885 (hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
886
887- >>> ftq('(hi OR hello) -mom')
888+ >>> ftq('(hi OR hello) NOT mom')
889 (hi|hello)&!mom <=> ( 'hi' | 'hello' ) & !'mom'
890
891 >>> ftq('(hi ho OR hoe) work go')
892 (hi&ho|hoe)&work&go <=> ( 'hi' & 'ho' | 'hoe' ) & 'work' & 'go'
893
894
895-If a single '-' precedes a word, it is converted into the '!' operator.
896-Note also that a trailing '-' is dropped by to_tsquery().
897-
898- >>> ftq('-foo bar-')
899- !foo&bar- <=> !'foo' & 'bar'
900+The Postgres FTI parser treats '-' symbols context-sensitively.
901+If they precede a word, they are removed.
902+
903+ >>> print search_same('foo -bar')
904+ FTI data: 'bar':2 'foo':1
905+ query: 'foo' & 'bar'
906+ match: True
907+
908+If a '-' precedes a number, it is retained.
909+
910+ >>> print search_same('123 -456')
911+ FTI data: '-456':2 '123':1
912+ query: '123' & '-456'
913+ match: True
914+
915+Trailing '-' are always ignored.
916+
917+ >>> print search_same('bar- 123-')
918+ FTI data: '123':2 'bar':1
919+ query: 'bar' & '123'
920+ match: True
921
922 Repeated '-' are simply ignored by to_tsquery().
923
924@@ -259,6 +265,12 @@
925 query: 'foo-bar' & 'foo' & 'bar'
926 match: True
927
928+A '-' surrounded by numbers is treated as the sign of the right-hand number.
929+
930+ >>> print search_same('123-456')
931+ FTI data: '-456':2 '123':1
932+ query: '123' & '-456'
933+ match: True
934
935 Punctuation is handled consistently. If a string containing punctuation
936 appears in an FTI, it can also be passed to ftq(), and a search for this
937@@ -342,11 +354,36 @@
938 >>> print search('some text <div>whatever</div>', 'div')
939 FTI data: 'text':2 'whatev':3 query: 'div' match: False
940
941-Treatment of characters that are used as operators in to_tsquery():
942+The symbols '&', '|' and '!' are treated as operators by to_tsquery();
943+to_tsvector() treats them as whitespace. ftq() converts the words 'AND',
944+'OR' and 'NOT' into the operators expected by to_tsquery(), and it
945+replaces the symbols '&', '|' and '!' with spaces. This avoids
946+surprising search results when the operator symbols appear accidentally
947+in search terms, e.g., by using a plain copy of a source code line as
948+the search term.
949
950 >>> ftq('cool!')
951 cool <=> 'cool'
952
953+ >>> print search_same('Shell scripts usually start with #!/bin/sh.')
954+ FTI data: '/bin/sh':6 'script':2 'shell':1 'start':4 'usual':3
955+ query: 'shell' & 'script' & 'usual' & 'start' & '/bin/sh'
956+ match: True
957+
958+ >>> print search_same('int foo = (bar & ! baz) | bla;')
959+ FTI data: 'bar':3 'baz':4 'bla':5 'foo':2 'int':1
960+ query: 'int' & 'foo' & 'bar' & 'baz' & 'bla'
961+ match: True
962+
963+Queries containing only punctuation symbols yield an empty tsquery
964+object. Note that _ftq() first replaces the '!' with a ' '; later on,
965+_ftq() joins the two remaining terms '?' and '.' with the "AND"
966+operator '&'. Finally, to_tsquery() detects the AND combination of
967+two symbols that are not tokenized and returns null.
968+
969+ >>> ftq('?!.') # Bug 1020443
970+ ?&. <=> None
971+
972 Email addresses are retained as a whole, both by to_tsvector() and by
973 ftq().
974
975@@ -430,11 +467,17 @@
976 >>> ftq("administrate")
977 administrate <=> 'administr'
978
979+Note that stemming is not always idempotent:
980+
981+ >>> ftq('extension')
982+ extension <=> 'extens'
983+ >>> ftq('extens')
984+ extens <=> 'exten'
985
986 Dud queries are 'repaired', such as doubled operators, trailing operators
987 or invalid leading operators
988
989- >>> ftq('hi & OR mom')
990+ >>> ftq('hi AND OR mom')
991 hi&mom <=> 'hi' & 'mom'
992
993 >>> ftq('(hi OR OR hello) AND mom')
994@@ -443,7 +486,7 @@
995 >>> ftq('(hi OR AND hello) AND mom')
996 (hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
997
998- >>> ftq('(hi OR -AND hello) AND mom')
999+ >>> ftq('(hi OR NOT AND hello) AND mom')
1000 (hi|!hello)&mom <=> ( 'hi' | !'hello' ) & 'mom'
1001
1002 >>> ftq('(hi OR - AND hello) AND mom')
1003@@ -452,13 +495,13 @@
1004 >>> ftq('hi AND mom AND')
1005 hi&mom <=> 'hi' & 'mom'
1006
1007- >>> ftq('& hi & mom')
1008+ >>> ftq('AND hi AND mom')
1009 hi&mom <=> 'hi' & 'mom'
1010
1011- >>> ftq('(& hi | hello) AND mom')
1012+ >>> ftq('(AND hi OR hello) AND mom')
1013 (hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
1014
1015- >>> ftq('() hi mom ( ) ((! |((&)))) :-)')
1016+ >>> ftq('() hi mom ( ) ((NOT OR((AND)))) :-)')
1017 (hi&mom&-) <=> 'hi' & 'mom'
1018
1019 >>> ftq("(hi mom")
1020@@ -502,10 +545,10 @@
1021
1022 Bug #160236
1023
1024- >>> ftq("foo&&bar-baz")
1025+ >>> ftq("foo AND AND bar-baz")
1026 foo&bar-baz <=> 'foo' & 'bar-baz' & 'bar' & 'baz'
1027
1028- >>> ftq("foo||bar.baz")
1029+ >>> ftq("foo OR OR bar.baz")
1030 foo|bar.baz <=> 'foo' | 'bar.baz'
1031
1032
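The operator handling described in textsearching.txt can be illustrated with a small, self-contained Python sketch. translate_operators() below is a hypothetical helper, not Launchpad's ftq()/_ftq(): it ignores brackets, stemming and stop words (which to_tsquery() takes care of), and it assumes, as the vocabulary changes in this branch suggest, that only the uppercase words AND, OR and NOT are recognised as operators. It shows why '&', '|' and '!' typed by a user no longer act as operators, why '?!.' becomes '?&.', and why a leading '-' is no longer turned into '!'.

```python
import re

# Hypothetical sketch of the keyword/symbol handling described in
# textsearching.txt; it is NOT Launchpad's actual ftq()/_ftq() code.
# Assumption: only the uppercase words AND, OR and NOT are operators.
_KEYWORD_OPERATORS = [
    (re.compile(r"\bAND\b"), "&"),
    (re.compile(r"\bOR\b"), "|"),
    (re.compile(r"\bNOT\b"), "!"),
]


def translate_operators(query):
    """Turn a user query into a to_tsquery()-style string (no brackets)."""
    # Literal '&', '|' and '!' are replaced by spaces, so text pasted from
    # source code cannot act as an operator.
    query = re.sub(r"[&|!]", " ", query)
    # Case-sensitive mapping of the keyword operators to symbols.
    for pattern, symbol in _KEYWORD_OPERATORS:
        query = pattern.sub(" %s " % symbol, query)

    parts = []        # output terms and binary operators, in order
    pending = None    # binary operator waiting for its right-hand term
    negate = False    # set by a preceding '!'
    for token in query.split():
        if token in ("&", "|"):
            # Leading operators, and all but the first of a run of
            # doubled operators, are dropped.
            if parts and pending is None:
                pending = token
        elif token == "!":
            negate = True
        else:
            term = token.lower()
            if negate:
                term = "!" + term
                negate = False
            if parts:
                # Adjacent terms are implicitly AND-combined.
                parts.append(pending or "&")
            parts.append(term)
            pending = None
    # A trailing operator or '!' has no operand and is discarded.
    return "".join(parts)
```

Because the keyword matching in this sketch is case-sensitive, lowercasing a query first turns translate_operators('baz OR blah'.lower()) into 'baz&or&blah', an AND of three terms instead of 'baz|blah'. This appears to be why the vocabularies in this branch now pass the unmodified query to ftq() and apply lower() only to the LIKE patterns.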