Merge lp:~zorba-coders/zorba/bug-897800 into lp:zorba

Proposed by Paul J. Lucas
Status: Superseded
Proposed branch: lp:~zorba-coders/zorba/bug-897800
Merge into: lp:zorba
Diff against target: 164 lines (+36/-19)
5 files modified
src/runtime/full_text/icu_tokenizer.cpp (+30/-17)
src/runtime/full_text/icu_tokenizer.h (+2/-2)
test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res (+1/-0)
test/rbkt/Queries/CMakeLists.txt (+1/-0)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq (+2/-0)
To merge this branch: bzr merge lp:~zorba-coders/zorba/bug-897800
Reviewer            Review Type   Date Requested   Status
Paul J. Lucas                                      Approve
Matthias Brantner                                  Pending
Review via email: mp+84530@code.launchpad.net

This proposal supersedes a proposal from 2011-11-30.

Description of the change

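Fixes bug 897800: the ICU tokenizer now increments the sentence number after the final token has been sent. Also renames the break-iterator members word_ and sent_ to word_it_ and sent_it_, adds more DEBUG_TOKENIZER output, and adds the test ft-same-sentence-false-2.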

Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal
review: Approve
Matthias Brantner (matthias-brantner) : Posted in a previous version of this proposal
review: Approve
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-01T03-37-10.592Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-05T18-17-03.87Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~paul-lucas/zorba/bug-897800 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-897800-2011-12-05T18-58-04.34Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Paul J. Lucas (paul-lucas) :
review: Approve
lp:~zorba-coders/zorba/bug-897800 updated
10560. By Matthias Brantner

marked test as expected failure

10561. By Paul J. Lucas

Merge.

10562. By Paul J. Lucas

Merge.

Unmerged revisions

10562. By Paul J. Lucas

Merge.

10561. By Paul J. Lucas

Merge.

Preview Diff

=== modified file 'src/runtime/full_text/icu_tokenizer.cpp'
--- src/runtime/full_text/icu_tokenizer.cpp 2011-12-01 11:02:25 +0000
+++ src/runtime/full_text/icu_tokenizer.cpp 2011-12-06 00:11:25 +0000
@@ -69,7 +69,7 @@
 void send( void *payload, Tokenizer::Callback &callback ) {
 if ( !empty() ) {
 # if DEBUG_TOKENIZER
- cout << "TOKEN: \"" << value_ << "\"\n";
+ cout << "TOKEN: \"" << value_ << "\" (" << pos_ << ',' << sent_ << ',' << para_ << ")\n";
 # endif
 callback( value_.data(), value_.size(), pos_, sent_, para_, payload );
 clear();
@@ -131,7 +131,7 @@
 Locale const &icu_locale = get_icu_locale_for( lang );
 UErrorCode status = U_ZERO_ERROR;

- word_.reset(
+ word_it_.reset(
 dynamic_cast<RuleBasedBreakIterator*>(
 BreakIterator::createWordInstance( icu_locale, status )
 )
@@ -139,7 +139,7 @@
 if ( U_FAILURE( status ) )
 throw ZORBA_EXCEPTION( zerr::ZXQP0036_BREAKITERATOR_CREATION_FAILED );

- sent_.reset(
+ sent_it_.reset(
 dynamic_cast<RuleBasedBreakIterator*>(
 BreakIterator::createSentenceInstance( Locale::getUS(), status )
 )
@@ -199,11 +199,12 @@
 // This unicode::string wraps the existing buffer: no copy is made.
 unicode::string const utf16_s( false, utf16_buf, utf16_len );

- word_->setText( utf16_s );
- unicode::size_type word_start = word_->first(), word_end = word_->next();
+ word_it_->setText( utf16_s );
+ unicode::size_type word_start = word_it_->first();
+ unicode::size_type word_end = word_it_->next();

- sent_->setText( utf16_s );
- unicode::size_type sent_end = sent_->first(); sent_end = sent_->next();
+ sent_it_->setText( utf16_s );
+ unicode::size_type sent_end = sent_it_->first(); sent_end = sent_it_->next();

 temp_token t;

@@ -227,10 +228,11 @@
 }
 unique_ptr<utf8::storage_type[]> const auto_utf8_buf( utf8_buf );

- zstring_b utf8_word;
+ zstring_b utf8_word; // used only for debugging & error reporting
 utf8_word.wrap_memory( utf8_buf, utf8_len );
-
- unicode::size_type const rule_status = word_->getRuleStatus();
+# if DEBUG_TOKENIZER
+ cout << "GOT: \"" << utf8_word << "\" ";
+# endif

 //
 // "Junk" tokens are whitespace and punctuation -- except some punctuation
@@ -238,10 +240,7 @@
 //
 bool is_junk = false;

-# if DEBUG_TOKENIZER
- cout << "GOT: \"" << utf8_word << "\" ";
-# endif
-
+ int32_t const rule_status = word_it_->getRuleStatus();
 if ( IS_WORD_BREAK( NONE, rule_status ) ) {
 //
 // "NONE" tokens are what ICU calls whitespace and punctuation.
@@ -289,7 +288,7 @@
 default:
 in_wild = false;
 }
- }
+ } // if ( wildcards )
 is_junk = true;
 }

@@ -350,10 +349,16 @@
 t.send( payload, callback );

 set_token:
+# if DEBUG_TOKENIZER
+ cout << "at set_token" << endl;
+# endif
 if ( !is_junk ) {
 if ( in_wild || got_backslash )
 t.append( utf8_buf, utf8_len );
 else {
+# if DEBUG_TOKENIZER
+ cout << "setting token" << endl;
+# endif
 t.set(
 utf8_buf, utf8_len, numbers().token, numbers().sent, numbers().para
 );
@@ -362,9 +367,14 @@
 }

 next:
- word_start = word_end, word_end = word_->next();
+# if DEBUG_TOKENIZER
+ cout << "at next" << endl;
+# endif
+ word_start = word_end, word_end = word_it_->next();
 if ( word_end >= sent_end && sent_end != BreakIterator::DONE ) {
- sent_end = sent_->next();
+ sent_end = sent_it_->next();
+ // The addition of the "if" fixes:
+ // https://bugs.launchpad.net/bugs/863320
 if ( sent_end != BreakIterator::DONE )
 ++numbers().sent;
 }
@@ -375,6 +385,9 @@
 err::FTDY0020, ERROR_PARAMS( "", ZED( UnbalancedChar_3 ), '}' )
 );
 t.send( payload, callback );
+ // Incrementing "sent" here fixes:
+ // https://bugs.launchpad.net/bugs/897800
+ ++numbers().sent;
 }

 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/icu_tokenizer.h'
--- src/runtime/full_text/icu_tokenizer.h 2011-09-05 02:06:22 +0000
+++ src/runtime/full_text/icu_tokenizer.h 2011-12-06 00:11:25 +0000
@@ -55,8 +55,8 @@
 typedef std::unique_ptr<RuleBasedBreakIterator> rbbi_ptr;

 locale::iso639_1::type const lang_;
- rbbi_ptr word_;
- rbbi_ptr sent_;
+ rbbi_ptr word_it_;
+ rbbi_ptr sent_it_;
 };

 ///////////////////////////////////////////////////////////////////////////////

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-same-sentence-false-2.xml.res 2011-12-06 00:11:25 +0000
@@ -0,0 +1,1 @@
+false

=== modified file 'test/rbkt/Queries/CMakeLists.txt'
--- test/rbkt/Queries/CMakeLists.txt 2011-10-26 13:43:15 +0000
+++ test/rbkt/Queries/CMakeLists.txt 2011-12-06 00:11:25 +0000
@@ -294,3 +294,4 @@

 EXPECTED_FAILURE(test/rbkt/zorba/reference/reference_5 868640)

+EXPECTED_FAILURE(test/rbkt/zorba/fulltext/ft-same-sentence-false-2 897800)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq 2011-12-06 00:11:25 +0000
@@ -0,0 +1,2 @@
+let $x := <msg>hello. world</msg>
+return $x contains text "hello" ftand "world" same sentence
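
For readers who want to see the sentence-numbering idea without the ICU machinery, here is a minimal sketch of what the new test expects. It is not the Zorba tokenizer. Token, Numbers, tokenize and same_sentence are hypothetical names invented for this illustration, and the rule that '.', '!' and '?' always end a sentence is a simplifying assumption (ICU's real sentence-break rules are more involved and may disagree on some inputs). The sketch mirrors two points from the diff: the sentence number advances only when another token actually follows a boundary, and it is incremented once more after the last token has been emitted.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token record: value plus the position data a callback would get.
struct Token {
  std::string value;
  int pos;   // 1-based token number
  int sent;  // 1-based sentence number
};

// Hypothetical running counters, loosely analogous to numbers() in the diff.
struct Numbers {
  int token = 0;
  int sent = 1;
};

// Simplified stand-in for the tokenizer: words are alphanumeric runs.
// ASSUMPTION: '.', '!' and '?' always end a sentence (not ICU's actual rules).
std::vector<Token> tokenize(const std::string& text, Numbers& n) {
  assert(n.sent >= 1 && "sentence numbers are 1-based");
  std::vector<Token> out;
  std::string word;
  bool pending_break = false;  // a terminator was seen, no word has followed yet
  for (char c : text) {
    if (std::isalnum(static_cast<unsigned char>(c))) {
      if (word.empty() && pending_break) {
        ++n.sent;  // advance only when a token really follows the boundary
        pending_break = false;
      }
      word += c;
    } else {
      if (!word.empty()) {
        out.push_back({word, ++n.token, n.sent});
        word.clear();
      }
      if (c == '.' || c == '!' || c == '?')
        pending_break = true;
    }
  }
  if (!word.empty())
    out.push_back({word, ++n.token, n.sent});
  ++n.sent;  // end of input closes the current sentence, as in the diff
  return out;
}

// True if some occurrence of a and some occurrence of b share a sentence number.
bool same_sentence(const std::vector<Token>& toks,
                   const std::string& a, const std::string& b) {
  for (const Token& x : toks)
    if (x.value == a)
      for (const Token& y : toks)
        if (y.value == b && x.sent == y.sent)
          return true;
  return false;
}
```

Under these assumptions, tokenizing "hello. world" with a fresh Numbers puts "hello" in sentence 1 and "world" in sentence 2, so same_sentence(toks, "hello", "world") is false, which matches the expected result file. A later call with the same Numbers starts in a new sentence because of the final increment.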
