Zorba

Sentence number is incorrectly incremented when token characters end without a sentence terminator, take 2

Bug #924063 reported by Paul J. Lucas on 2012-01-31

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Zorba	Fix Released	Medium	Paul J. Lucas

Bug Description

The original bug (bug #863320) was fixed, but the fix caused other tests to fail (bug #897800), so it was reverted to allow the release to go ahead. This new bug is to fix the original bug without causing any other tests to fail.

The original bug was:

The following query:

let $x := <msg>hello world</msg>
return $x contains text "hello" ftand "world" same sentence

incorrectly returns "false" because the tokenizer increments the sentence number when it runs out of characters without having encountered a sentence-terminating character.
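To make the failure mode concrete, here is a toy Python model of word tokenization with sentence numbering. It is not Zorba's tokenizer (which is C++ and ICU-based); the names `tokenize` and `same_sentence`, the `buggy` flag, and the rule that any '.', '!' or '?' ends a sentence are all assumptions made for illustration. With `buggy=True` it reproduces the reported behavior: running out of input bumps the sentence number before the last word is emitted, so "world" lands in a different sentence from "hello".

```python
# Toy model only: NOT Zorba's actual tokenizer. The names and the
# "any . ! ? ends a sentence" rule are illustrative assumptions.
TERMINATORS = ".!?"


def tokenize(text, buggy=False):
    """Return a list of (word, sentence_number) pairs."""
    tokens = []
    sent = 0   # current sentence number
    word = ""  # characters of the word being built
    for ch in text:
        if ch.isalnum():
            word += ch
            continue
        if word:                  # a non-word character ends the word
            tokens.append((word, sent))
            word = ""
        if ch in TERMINATORS:     # a terminator starts a new sentence
            sent += 1
    if word:                      # input ended in the middle of a word
        if buggy:
            # The reported bug: running out of characters is treated
            # as if a sentence terminator had been seen.
            sent += 1
        tokens.append((word, sent))
    return tokens


def same_sentence(tokens, a, b):
    """True if some occurrence of a and some occurrence of b share a sentence."""
    sents_a = {s for w, s in tokens if w == a}
    sents_b = {s for w, s in tokens if w == b}
    return bool(sents_a & sents_b)


if __name__ == "__main__":
    print(same_sentence(tokenize("hello world"), "hello", "world"))              # True
    print(same_sentence(tokenize("hello world", buggy=True), "hello", "world"))  # False
```

In this model the fix is simply not to advance the sentence counter at end of input: only an actual terminator starts a new sentence.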


Related branches

lp:~paul-lucas/zorba/bug-924063

Merged into lp:zorba at revision 10637

Matthias Brantner: Approve on 2012-01-31

Paul J. Lucas: Approve on 2012-01-31

Paul J. Lucas (paul-lucas) on 2012-01-31

Changed in zorba:
status:	New → In Progress


Paul J. Lucas (paul-lucas) wrote on 2012-01-31:

It turns out that the original bug fixes were correct. ICU uses more than just sentence-terminating characters (like '.') to decide where a sentence ends: the first letter of the first word after the '.' also has to be capitalized. Hence the tests were wrong: ICU treats, e.g., "hello. world" as a single sentence. Once that test was changed to "Hello. World", it passed.
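This behavior is consistent with the Unicode sentence-boundary rules (UAX #29) that ICU implements, where rule SB8 suppresses a break after a '.' when the next letter is lowercase. The Python sketch below models only that one rule, in simplified form: it skips spaces only (the real rule skips more character classes) and treats '!' and '?' as unconditional terminators. The `split_sentences` name is hypothetical; this is not ICU's API or a full implementation.

```python
# Simplified model of ONE UAX #29 sentence-break rule (SB8):
# after '.', if the next letter is lowercase, the sentence does not end.
# Not ICU's API; all other UAX #29 rules are ignored.

def split_sentences(text):
    sentences = []
    start = 0  # index where the current sentence begins
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in ".!?":
            j = i + 1
            while j < n and text[j] == " ":  # skip spaces after the terminator
                j += 1
            if ch == "." and j < n and text[j].islower():
                # SB8: '.' followed by a lowercase letter is not a boundary.
                i += 1
                continue
            sentences.append(text[start:i + 1])
            start = i = j
            continue
        i += 1
    if start < n:
        sentences.append(text[start:])
    return sentences


print(split_sentences("hello. world"))  # ['hello. world']
print(split_sentences("Hello. World"))  # ['Hello.', 'World']
```

Under this rule, a test that expects "hello. world" to contain two sentences fails for the reason described above, while "Hello. World" behaves as intended.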

Paul J. Lucas (paul-lucas) on 2012-01-31

Changed in zorba:
status:	In Progress → Fix Committed

Paul J. Lucas (paul-lucas) on 2012-03-27

Changed in zorba:
status:	Fix Committed → Fix Released
