Merge lp:~zorba-coders/zorba/bug-897800 into lp:zorba

Proposed by Paul J. Lucas
Status: Superseded
Proposed branch: lp:~zorba-coders/zorba/bug-897800
Merge into: lp:zorba
Diff against target: 164 lines (+36/-19)
5 files modified
src/runtime/full_text/icu_tokenizer.cpp (+30/-17)
src/runtime/full_text/icu_tokenizer.h (+2/-2)
test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res (+1/-0)
test/rbkt/Queries/CMakeLists.txt (+1/-0)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq (+2/-0)
To merge this branch: bzr merge lp:~zorba-coders/zorba/bug-897800
Reviewer Review Type Date Requested Status
Paul J. Lucas Approve
Matthias Brantner Pending
Review via email: mp+84530@code.launchpad.net

This proposal supersedes a proposal from 2011-11-30.

Description of the change

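Fixed: the ICU tokenizer now increments the sentence number after sending the final token (https://bugs.launchpad.net/bugs/897800).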

Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal
review: Approve
Matthias Brantner (matthias-brantner) : Posted in a previous version of this proposal
review: Approve
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-01T03-37-10.592Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-05T18-17-03.87Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-05T18-58-04.34Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Paul J. Lucas (paul-lucas) :
review: Approve
lp:~zorba-coders/zorba/bug-897800 updated
10560. By Matthias Brantner

marked test as expected failure

10561. By Paul J. Lucas

Merge.

10562. By Paul J. Lucas

Merge.

Unmerged revisions

10562. By Paul J. Lucas

Merge.

10561. By Paul J. Lucas

Merge.

Preview Diff

=== modified file 'src/runtime/full_text/icu_tokenizer.cpp'
--- src/runtime/full_text/icu_tokenizer.cpp 2011-12-01 11:02:25 +0000
+++ src/runtime/full_text/icu_tokenizer.cpp 2011-12-06 00:11:25 +0000
@@ -69,7 +69,7 @@
   void send( void *payload, Tokenizer::Callback &callback ) {
     if ( !empty() ) {
 # if DEBUG_TOKENIZER
-      cout << "TOKEN: \"" << value_ << "\"\n";
+      cout << "TOKEN: \"" << value_ << "\" (" << pos_ << ',' << sent_ << ',' << para_ << ")\n";
 # endif
       callback( value_.data(), value_.size(), pos_, sent_, para_, payload );
       clear();
@@ -131,7 +131,7 @@
   Locale const &icu_locale = get_icu_locale_for( lang );
   UErrorCode status = U_ZERO_ERROR;
 
-  word_.reset(
+  word_it_.reset(
     dynamic_cast<RuleBasedBreakIterator*>(
       BreakIterator::createWordInstance( icu_locale, status )
     )
@@ -139,7 +139,7 @@
   if ( U_FAILURE( status ) )
     throw ZORBA_EXCEPTION( zerr::ZXQP0036_BREAKITERATOR_CREATION_FAILED );
 
-  sent_.reset(
+  sent_it_.reset(
     dynamic_cast<RuleBasedBreakIterator*>(
       BreakIterator::createSentenceInstance( Locale::getUS(), status )
     )
@@ -199,11 +199,12 @@
   // This unicode::string wraps the existing buffer: no copy is made.
   unicode::string const utf16_s( false, utf16_buf, utf16_len );
 
-  word_->setText( utf16_s );
-  unicode::size_type word_start = word_->first(), word_end = word_->next();
+  word_it_->setText( utf16_s );
+  unicode::size_type word_start = word_it_->first();
+  unicode::size_type word_end = word_it_->next();
 
-  sent_->setText( utf16_s );
-  unicode::size_type sent_end = sent_->first(); sent_end = sent_->next();
+  sent_it_->setText( utf16_s );
+  unicode::size_type sent_end = sent_it_->first(); sent_end = sent_it_->next();
 
   temp_token t;
 
@@ -227,10 +228,11 @@
     }
     unique_ptr<utf8::storage_type[]> const auto_utf8_buf( utf8_buf );
 
-    zstring_b utf8_word;
+    zstring_b utf8_word; // used only for debugging & error reporting
     utf8_word.wrap_memory( utf8_buf, utf8_len );
-
-    unicode::size_type const rule_status = word_->getRuleStatus();
+# if DEBUG_TOKENIZER
+    cout << "GOT: \"" << utf8_word << "\" ";
+# endif
 
     //
     // "Junk" tokens are whitespace and punctuation -- except some punctuation
@@ -238,10 +240,7 @@
     //
     bool is_junk = false;
 
-# if DEBUG_TOKENIZER
-    cout << "GOT: \"" << utf8_word << "\" ";
-# endif
-
+    int32_t const rule_status = word_it_->getRuleStatus();
     if ( IS_WORD_BREAK( NONE, rule_status ) ) {
       //
       // "NONE" tokens are what ICU calls whitespace and punctuation.
@@ -289,7 +288,7 @@
           default:
             in_wild = false;
         }
-      }
+      } // if ( wildcards )
       is_junk = true;
     }
 
@@ -350,10 +349,16 @@
     t.send( payload, callback );
 
 set_token:
+# if DEBUG_TOKENIZER
+    cout << "at set_token" << endl;
+# endif
     if ( !is_junk ) {
       if ( in_wild || got_backslash )
         t.append( utf8_buf, utf8_len );
       else {
+# if DEBUG_TOKENIZER
+        cout << "setting token" << endl;
+# endif
         t.set(
           utf8_buf, utf8_len, numbers().token, numbers().sent, numbers().para
         );
@@ -362,9 +367,14 @@
     }
 
 next:
-    word_start = word_end, word_end = word_->next();
+# if DEBUG_TOKENIZER
+    cout << "at next" << endl;
+# endif
+    word_start = word_end, word_end = word_it_->next();
     if ( word_end >= sent_end && sent_end != BreakIterator::DONE ) {
-      sent_end = sent_->next();
+      sent_end = sent_it_->next();
+      // The addition of the "if" fixes:
+      // https://bugs.launchpad.net/bugs/863320
       if ( sent_end != BreakIterator::DONE )
         ++numbers().sent;
     }
@@ -375,6 +385,9 @@
       err::FTDY0020, ERROR_PARAMS( "", ZED( UnbalancedChar_3 ), '}' )
     );
   t.send( payload, callback );
+  // Incrementing "sent" here fixes:
+  // https://bugs.launchpad.net/bugs/897800
+  ++numbers().sent;
 }
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/icu_tokenizer.h'
--- src/runtime/full_text/icu_tokenizer.h 2011-09-05 02:06:22 +0000
+++ src/runtime/full_text/icu_tokenizer.h 2011-12-06 00:11:25 +0000
@@ -55,8 +55,8 @@
   typedef std::unique_ptr<RuleBasedBreakIterator> rbbi_ptr;
 
   locale::iso639_1::type const lang_;
-  rbbi_ptr word_;
-  rbbi_ptr sent_;
+  rbbi_ptr word_it_;
+  rbbi_ptr sent_it_;
 };
 
 ///////////////////////////////////////////////////////////////////////////////

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res 2011-12-06 00:11:25 +0000
@@ -0,0 +1,1 @@
+false

=== modified file 'test/rbkt/Queries/CMakeLists.txt'
--- test/rbkt/Queries/CMakeLists.txt 2011-10-26 13:43:15 +0000
+++ test/rbkt/Queries/CMakeLists.txt 2011-12-06 00:11:25 +0000
@@ -294,3 +294,4 @@
 
 EXPECTED_FAILURE(test/rbkt/zorba/reference/reference_5 868640)
 
+EXPECTED_FAILURE(test/rbkt/zorba/fulltext/ft-same-sentence-false-2 897800)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq 2011-12-06 00:11:25 +0000
@@ -0,0 +1,2 @@
+let $x := <msg>hello. world</msg>
+return $x contains text "hello" ftand "world" same sentence
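
Reviewer note: the new test expects "false" because "hello" and "world" should receive different sentence numbers. The sketch below is only an illustration of that idea, not Zorba's implementation. The names Token, tokenize, same_sentence and check_examples are hypothetical, and the naive rule "a new sentence starts after '.', '!' or '?'" is an assumption standing in for the ICU sentence BreakIterator, whose real rules are more subtle.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token record. It loosely mirrors the (pos, sent) numbers the
// diff passes to the tokenizer callback, but it is not Zorba's actual type.
struct Token {
  std::string value;
  int pos;   // 1-based token position
  int sent;  // 1-based sentence number
};

// Naive stand-in for the ICU break iterators (an assumption, not ICU's rules):
// words are runs of alphanumerics; '.', '!' and '?' end the current sentence.
std::vector<Token> tokenize( std::string const &text ) {
  std::vector<Token> tokens;
  std::string word;
  int pos = 0, sent = 1;

  auto flush = [&]() {
    if ( !word.empty() ) {
      tokens.push_back( Token{ word, ++pos, sent } );
      word.clear();
    }
  };

  for ( char c : text ) {
    if ( std::isalnum( static_cast<unsigned char>( c ) ) ) {
      word += c;
    } else {
      flush();
      bool const is_terminator = c == '.' || c == '!' || c == '?';
      // Advance the sentence number once per sentence, even for "...".
      if ( is_terminator && !tokens.empty() && tokens.back().sent == sent )
        ++sent;
    }
  }
  flush();
  return tokens;
}

// True if some occurrence of a and some occurrence of b share a sentence.
bool same_sentence( std::vector<Token> const &tokens,
                    std::string const &a, std::string const &b ) {
  for ( Token const &ta : tokens )
    for ( Token const &tb : tokens )
      if ( ta.value == a && tb.value == b && ta.sent == tb.sent )
        return true;
  return false;
}

// Usage: the first call mirrors the text in ft-same-sentence-false-2.xq.
void check_examples() {
  assert( !same_sentence( tokenize( "hello. world" ), "hello", "world" ) );
  assert( same_sentence( tokenize( "hello world." ), "hello", "world" ) );
}
```

With this naive splitter the test text yields two sentences. Whether the ICU sentence iterator reports a boundary at the same place is a separate question that this sketch does not answer.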
