Sentence is incorrectly incremented when token characters end without sentence terminator, take 2

Bug #924063 reported by Paul J. Lucas
This bug affects 1 person
Affects: Zorba
Status: Fix Released
Importance: Medium
Assigned to: Paul J. Lucas

Bug Description

The original bug (bug #863320) was fixed, but the fix caused other tests to fail (bug #897800), so it was reverted to allow the release to go ahead. This new bug tracks fixing the original bug without causing any other tests to fail.

The original bug was:

The following query:

let $x := <msg>hello world</msg>
return $x contains text "hello" ftand "world" same sentence

incorrectly returns "false" because the tokenizer increments the sentence number when it runs out of characters, even though it has not encountered a sentence-terminating character.
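
For illustration only, here is a minimal Python sketch of the counting rule described above. It is not Zorba's tokenizer: the names `tokenize`, `tokenize_chunks` and `same_sentence` are hypothetical, and the report does not say how the stray increment ended up separating the two tokens. The sketch simply assumes the text reaches the tokenizer in more than one chunk, so that an increment at the end of one chunk carries over to the next.

```python
import re

SENTENCE_TERMINATORS = ".!?"


def tokenize(text, sentence=0, bump_at_end=False):
    """Return ([(word, sentence_no), ...], next_sentence_no).

    bump_at_end=True mimics the reported bug: the sentence counter is
    incremented when the input runs out, even though no terminator was seen.
    """
    tokens = []
    pending_words = False  # True if words were seen since the last terminator
    for match in re.finditer(r"\w+|[.!?]", text):
        piece = match.group()
        if piece in SENTENCE_TERMINATORS:
            if pending_words:
                sentence += 1  # legitimate increment: a terminator was seen
                pending_words = False
        else:
            tokens.append((piece, sentence))
            pending_words = True
    if bump_at_end and pending_words:
        sentence += 1  # erroneous increment: end of input, no terminator
    return tokens, sentence


def tokenize_chunks(chunks, bump_at_end):
    """Tokenize several chunks while sharing one sentence counter (assumption)."""
    tokens, sentence = [], 0
    for chunk in chunks:
        new_tokens, sentence = tokenize(chunk, sentence, bump_at_end)
        tokens.extend(new_tokens)
    return tokens


def same_sentence(tokens, a, b):
    """True if words a and b were assigned the same sentence number."""
    where = dict(tokens)
    return where[a] == where[b]
```

With `bump_at_end=False`, "hello" and "world" stay in the same sentence; with `bump_at_end=True`, the chunked input puts them in different sentences, which is the kind of result that would make "same sentence" evaluate to false.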

Tags: full-text

Changed in zorba:
status: New → In Progress
Paul J. Lucas (paul-lucas) wrote :

It turns out that the original bug fixes were correct. ICU uses more than just sentence-terminating characters (like '.') to decide where a sentence ends: the first letter of the first word after the '.' also has to be capitalized. Hence the tests were wrong, e.g., one used "hello. world". Once that test was changed to "Hello. World", it passed.
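
As a rough sketch of that behavior, the Python below models only the one rule mentioned here: a '.' followed by a lowercase letter does not end the sentence. It is a deliberate simplification and not ICU's implementation, which follows the much more detailed Unicode sentence-segmentation rules. `split_sentences` is a hypothetical helper written for this illustration.

```python
import re


def split_sentences(text):
    """Split text after '.', unless the next character is a lowercase letter."""
    sentences = []
    start = 0
    # Candidate boundaries: a '.' plus any whitespace that follows it.
    for match in re.finditer(r"\.\s*", text):
        end = match.end()
        if end < len(text) and text[end].islower():
            continue  # '.' followed by a lowercase word: not a boundary
        sentences.append(text[start:end].strip())
        start = end
    if start < len(text):
        sentences.append(text[start:].strip())  # trailing text, no terminator
    return sentences
```

Under this model, "hello. world" remains a single sentence while "Hello. World" splits into two, which matches why the original test expectation was wrong and the capitalized version passed.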

Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released