HTML/XML tags are dropped from the Postgres full text index and from search queries.

Bug #1015519 reported by Abel Deuring
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Triaged  Low         Unassigned

Bug Description

Example:

    launchpad_dev=# select to_tsvector('aaa <div>bbb</div> ccc');
           to_tsvector
    -------------------------
     'aaa':1 'bbb':2 'ccc':3
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div>bbb</div> & ccc');
          to_tsquery
    -----------------------
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div> & bbb & </div> & ccc');
          to_tsquery
    -----------------------
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select ts_debug('aaa <div>bbb</div> ccc');
                                  ts_debug
    ---------------------------------------------------------------------
     (asciiword,"Word, all ASCII",aaa,{english_stem},english_stem,{aaa})
     (blank,"Space symbols"," ",{},,)
     (tag,"XML tag",<div>,{},,)
     (asciiword,"Word, all ASCII",bbb,{english_stem},english_stem,{bbb})
     (tag,"XML tag",</div>,{},,)
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",ccc,{english_stem},english_stem,{ccc})
    (7 rows)

So, strings like '<div>' are treated as tokens of type "tag" -- but these
tokens do not appear in the FTI data, and they do not appear in the
result of to_tsquery().
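
The empty dictionary list ({}) on the "tag" rows of the ts_debug() output is
what causes this: a token type with no dictionary mapped to it is discarded.
A minimal sketch of how this could be checked, assuming the database's default
configuration is the built-in "english" one (as the english_stem entries above
suggest). The configuration name english_with_tags is hypothetical, and this
is an illustration of the mechanism, not a proposed fix for Launchpad:

    -- Hypothetical configuration; a copy of the built-in english one.
    CREATE TEXT SEARCH CONFIGURATION english_with_tags (COPY = english);

    -- Map the parser's "tag" token type to the built-in "simple"
    -- dictionary, so that tag tokens are no longer discarded.
    ALTER TEXT SEARCH CONFIGURATION english_with_tags
        ADD MAPPING FOR tag WITH simple;

    -- Compare the two configurations on the string from this report.
    SELECT to_tsvector('english', 'aaa <div>bbb</div> ccc');
    SELECT to_tsvector('english_with_tags', 'aaa <div>bbb</div> ccc');

    -- Show which dictionaries now handle the tag tokens.
    SELECT alias, token, dictionaries
      FROM ts_debug('english_with_tags', 'aaa <div>bbb</div> ccc')
     WHERE alias = 'tag';

Search queries would have to be built with the same configuration, otherwise
the tags would still be dropped on the query side.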

Tags: search
Abel Deuring (adeuring) wrote :

Marked as "critical" since this bug describes one detail of the quite generic bug 29713, which is itself critical.

Changed in launchpad:
importance: Undecided → Critical
status: New → Triaged
William Grant (wgrant)
Changed in launchpad:
importance: Critical → Low
tags: added: search