Merge lp:~paul-lucas/zorba/feature-ft_bw into lp:zorba

Proposed by Paul J. Lucas
Status: Merged
Approved by: Matthias Brantner
Approved revision: 10856
Merged at revision: 10908
Proposed branch: lp:~paul-lucas/zorba/feature-ft_bw
Merge into: lp:zorba
Diff against target: 2563 lines (+758/-670)
25 files modified
ChangeLog (+2/-0)
include/zorba/tokenizer.h (+1/-1)
modules/com/zorba-xquery/www/modules/full-text.xq (+43/-7)
src/functions/func_ft_module_impl.cpp (+32/-3)
src/functions/func_ft_module_impl.h (+20/-0)
src/functions/function_consts.h (+3/-1)
src/runtime/full_text/CMakeLists.txt (+1/-0)
src/runtime/full_text/apply.h (+4/-0)
src/runtime/full_text/ft_module_impl.cpp (+226/-125)
src/runtime/full_text/ft_module_util.cpp (+57/-0)
src/runtime/full_text/ft_module_util.h (+80/-0)
src/runtime/full_text/ft_util.cpp (+24/-0)
src/runtime/full_text/ft_util.h (+17/-1)
src/runtime/full_text/pregenerated/ft_module.cpp (+0/-463)
src/runtime/full_text/pregenerated/ft_module.h (+64/-10)
src/runtime/full_text/tokenizer.cpp (+6/-16)
src/runtime/json/jsonml_array.cpp (+7/-14)
src/runtime/pregenerated/iterator_enum.h (+1/-0)
src/runtime/spec/full_text/ft_module.xml (+63/-24)
src/runtime/visitors/pregenerated/planiter_visitor.h (+7/-0)
src/runtime/visitors/pregenerated/printer_visitor.cpp (+15/-0)
src/runtime/visitors/pregenerated/printer_visitor.h (+5/-0)
src/util/xml_util.h (+37/-5)
test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res (+1/-0)
test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq (+42/-0)
To merge this branch: bzr merge lp:~paul-lucas/zorba/feature-ft_bw
Reviewer            Review Type   Date Requested  Status
Matthias Brantner                                 Approve
Paul J. Lucas                                     Approve
Review via email: mp+112811@code.launchpad.net

Commit message

Added tokenize-nodes() function.

Description of the change

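Added the ft:tokenize-nodes() function (arities 2 and 3) to the full-text module. It tokenizes a set of nodes ($includes) and their descendants while skipping a second set of nodes ($excludes) and their descendants. The change also factors tokenizer lookup and token-element construction, previously inlined in the existing iterators, into shared helpers (get_tokenizer(), make_token_element(), TokenQNames).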
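The traversal in TokenizeNodesIterator::nextImpl can be sketched outside Zorba roughly as follows. This is a simplified, hypothetical model: Node, Kind, tokenize() and tokenize_nodes() are illustrative stand-ins, not Zorba's store::Item or Tokenizer APIs. Pointer equality stands in for the store's structural-information comparison (only the equality test is modelled), whitespace splitting stands in for a real tokenizer, and per-element language handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified node model (not Zorba's store::Item API).
enum class Kind { Document, Element, Attribute, Text, Comment, PI };

struct Node {
  Kind kind;
  std::string text;                    // string value of Text/Attribute/Comment/PI
  std::vector<Node const*> children;
};

// Whitespace splitting stands in for a real tokenizer.
static void tokenize(std::string const& s, std::vector<std::string>& out) {
  std::istringstream in(s);
  for (std::string w; in >> w;)
    out.push_back(w);
}

// Tokenizes every node in 'includes' (and its descendants), skipping any node
// listed in 'excludes' together with its whole subtree.
std::vector<std::string> tokenize_nodes(std::vector<Node const*> const& includes,
                                        std::vector<Node const*> const& excludes) {
  std::vector<std::string> tokens;
  std::deque<Node const*> work(includes.begin(), includes.end());
  while (!work.empty()) {
    Node const* n = work.front();
    work.pop_front();
    if (std::find(excludes.begin(), excludes.end(), n) != excludes.end())
      continue;                        // children never get queued: subtree skipped
    switch (n->kind) {
      case Kind::Document:
      case Kind::Element: {
        // Queue children at the front, in order, to keep document order.
        auto pos = work.begin();
        for (Node const* c : n->children) {
          if (c->kind == Kind::Attribute || c->kind == Kind::Comment ||
              c->kind == Kind::PI)
            continue;                  // never included implicitly
          pos = work.insert(pos, c);
          ++pos;
        }
        break;
      }
      default:                         // Text, or an explicitly included Attribute/Comment/PI
        tokenize(n->text, tokens);
        break;
    }
  }
  return tokens;
}
```

Because a node's children are only queued when the node itself survives the exclusion check, excluding an element is enough to drop its entire subtree, while attributes, comments and processing instructions are tokenized only when they are listed in $includes explicitly.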

Paul J. Lucas (paul-lucas) :
review: Approve
Matthias Brantner (matthias-brantner) wrote :

- the changelog says that it's a new function but it has been there before
- ft:tokenize-nodes#2 comment is confusing. Why does it say
The default
74 + : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
75 + : is assumed to be the one returned by <code>ft:current-lang()</code>
in between the two pragmas.

review: Needs Fixing
Zorba Build Bot (zorba-buildbot) wrote :
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job feature-ft_bw-2012-06-29T17-53-01.219Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1, Needs Fixing < 1, Pending < 1. Got: 1 Approve, 1 Needs Fixing.
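The bot's rule is a conjunction of four thresholds, and the reported tally fails two of them: there is only one Approve, and one Needs Fixing vote is outstanding. A minimal sketch of the check, using a hypothetical Votes struct rather than the bot's actual implementation:

```cpp
#include <cassert>

// Hypothetical vote tally; only the thresholds come from the bot's message.
struct Votes {
  int approve = 0;
  int disapprove = 0;
  int needs_fixing = 0;
  int pending = 0;
};

// Required: Approve > 1, Disapprove < 1, Needs Fixing < 1, Pending < 1.
bool meets_criteria(Votes const& v) {
  return v.approve > 1 && v.disapprove < 1 && v.needs_fixing < 1 &&
         v.pending < 1;
}
```

Assuming Matthias Brantner's later Approve replaced his Needs Fixing vote, the tally becomes 2 Approve and 0 Needs Fixing, which satisfies all four conditions.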

Paul J. Lucas (paul-lucas) wrote :

> - the changelog says that it's a new function but it has been there before

No, it hasn't.

> - ft:tokenize-nodes#2 comment is confusing. Why does it say
> The default
> 74 + : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
> 75 + : is assumed to be the one returned by <code>ft:current-lang()</code>
> in between the two pragmas.

Because it specifies the language for the $includes. It's part of the $includes documentation.
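How the default language interacts with per-element languages can be sketched as follows. In the diff, the starting language is $lang for the 3-argument form or the static context's language (what ft:current-lang() returns) for the 2-argument form. An element for which find_lang_attribute() finds a language (presumably its xml:lang attribute) pushes a new language and tokenizer, and a null sentinel queued after its children pops them again. The Node type and resolve_langs() below are hypothetical stand-ins, not Zorba APIs.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node model: a text node has non-empty 'text'; an element may
// carry an xml:lang value. Not Zorba's store::Item API.
struct Node {
  std::optional<std::string> xml_lang;
  std::string text;
  std::vector<Node const*> children;
};

using TextLang = std::pair<std::string, std::string>;

// Pairs each text node with the language in effect for it. 'default_lang'
// stands in for $lang (3-argument form) or ft:current-lang() (2-argument form).
std::vector<TextLang> resolve_langs(std::vector<Node const*> const& includes,
                                    std::string const& default_lang) {
  std::vector<TextLang> out;
  std::stack<std::string> langs;
  langs.push(default_lang);
  std::deque<Node const*> work(includes.begin(), includes.end());
  while (!work.empty()) {
    Node const* n = work.front();
    work.pop_front();
    if (n == nullptr) {             // sentinel: leaving a subtree that set xml:lang
      langs.pop();
      continue;
    }
    bool scoped = false;
    if (n->xml_lang) {              // element overrides the inherited language
      langs.push(*n->xml_lang);
      scoped = true;
    }
    if (!n->text.empty())
      out.emplace_back(n->text, langs.top());
    auto pos = work.begin();        // queue children in document order
    for (Node const* c : n->children) {
      pos = work.insert(pos, c);
      ++pos;
    }
    if (scoped)                     // restore the outer language after the children
      work.insert(pos, static_cast<Node const*>(nullptr));
  }
  return out;
}
```

The sentinel approach avoids recursion: the worklist stays flat, and the language stack unwinds exactly when the traversal moves past the last descendant of the element that set the language.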

Matthias Brantner (matthias-brantner) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job feature-ft_bw-2012-06-29T22-57-52.686Z is finished. The final status was:

All tests succeeded!

Preview Diff

1=== modified file 'ChangeLog'
2--- ChangeLog 2012-06-29 13:25:20 +0000
3+++ ChangeLog 2012-06-29 16:57:20 +0000
4@@ -4,8 +4,10 @@
5 version 2.x
6
7 New Features:
8+
9 * Item::isSeekable API extension for streamable content (xs:string and xs:base64Binary).
10 * Implemented the latest W3C specification for the group by clause
11+ * Added ft:tokenize-nodes() function to full-text module
12 * New XQuery 3.0 functions
13 - fn:parse-xml-fragment#1
14 * Added support for transient maps to the http://www.zorba-xquery.com/modules/store/data-structures/unordered-map module.
15
16=== modified file 'include/zorba/tokenizer.h'
17--- include/zorba/tokenizer.h 2012-06-28 04:14:03 +0000
18+++ include/zorba/tokenizer.h 2012-06-29 16:57:20 +0000
19@@ -79,7 +79,7 @@
20
21 /**
22 * This member-function is called whenever an item that is being tokenized
23- * is entered or exited.
24+ * is entered or exited. The default implementation does nothing.
25 *
26 * @param item The item being entered or exited.
27 * @param entering If \c true, the item is being entered; if \c false, the
28
29=== modified file 'modules/com/zorba-xquery/www/modules/full-text.xq'
30--- modules/com/zorba-xquery/www/modules/full-text.xq 2012-06-28 04:14:03 +0000
31+++ modules/com/zorba-xquery/www/modules/full-text.xq 2012-06-29 16:57:20 +0000
32@@ -767,14 +767,14 @@
33 as xs:string* external;
34
35 (:~
36- : Tokenizes the given node and all of its descendants.
37+ : Tokenizes the given node and all of its decendants.
38 :
39 : @param $node The node to tokenize.
40 : @param $lang The default
41 : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
42 : of <code>$node</code>.
43 : @return a (possibly empty) sequence of tokens.
44- : @error err:FTST0009 if <code>$lang</code> is not supported in general.
45+ : @error err:FTST0009 if <code>$lang</code> is not supported.
46 : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-1.xq
47 :)
48 declare function ft:tokenize-node( $node as node(), $lang as xs:language )
49@@ -784,12 +784,11 @@
50 : Tokenizes the given node and all of its descendants.
51 :
52 : @param $node The node to tokenize.
53- : The document's default
54+ : The node's default
55 : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
56 : is assumed to be the one returned by <code>ft:current-lang()</code>.
57 : @return a (possibly empty) sequence of tokens.
58- : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
59- : general.
60+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
61 : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-2.xq
62 : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-3.xq
63 : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-4.xq
64@@ -798,10 +797,47 @@
65 as element(ft-schema:token)* external;
66
67 (:~
68+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of its
69+ : descendants) but excluding <code>$excludes</code> (and all of its
70+ : descendants), if any.
71+ :
72+ : @param $includes The set of nodes (and its descendants) to include.
73+ : The default
74+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
75+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
76+ : @param $excludes The set of nodes (and its descendants) to exclude.
77+ : @return a (possibly empty) sequence of tokens.
78+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
79+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
80+ :)
81+declare function ft:tokenize-nodes( $includes as node()+,
82+ $excludes as node()* )
83+ as element(ft-schema:token)* external;
84+
85+(:~
86+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of its
87+ : descendants) but excluding <code>$excludes</code> (and all of its
88+ : descendants), if any.
89+ :
90+ : @param $includes The set of nodes (and its descendants) to include.
91+ : @param $excludes The set of nodes (and its descendants) to exclude.
92+ : @param $lang The default
93+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
94+ : for nodes.
95+ : @return a (possibly empty) sequence of tokens.
96+ : @error err:FTST0009 if <code>$lang</code> is not supported.
97+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
98+ :)
99+declare function ft:tokenize-nodes( $includes as node()+,
100+ $excludes as node()*,
101+ $lang as xs:language )
102+ as element(ft-schema:token)* external;
103+
104+(:~
105 : Tokenizes the given string.
106 :
107 : @param $string The string to tokenize.
108- : @param $lang The default
109+ : @param $lang The
110 : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
111 : of <code>$string</code>.
112 : @return a (possibly empty) sequence of tokens.
113@@ -816,7 +852,7 @@
114 : Tokenizes the given string.
115 :
116 : @param $string The string to tokenize.
117- : The string's default
118+ : The string's
119 : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
120 : is assumed to be the one returned by <code>ft:current-lang()</code>.
121 : @return a (possibly empty) sequence of tokens.
122
123=== modified file 'src/functions/func_ft_module_impl.cpp'
124--- src/functions/func_ft_module_impl.cpp 2012-06-28 04:14:03 +0000
125+++ src/functions/func_ft_module_impl.cpp 2012-06-29 16:57:20 +0000
126@@ -36,6 +36,17 @@
127 }
128
129
130+PlanIter_t full_text_tokenize_nodes::codegen(
131+ CompilerCB*,
132+ static_context* sctx,
133+ const QueryLoc& loc,
134+ std::vector<PlanIter_t>& argv,
135+ expr& ann) const
136+{
137+ return new TokenizeNodesIterator(sctx, loc, argv);
138+}
139+
140+
141 PlanIter_t full_text_tokenizer_properties::codegen(
142 CompilerCB*,
143 static_context* sctx,
144@@ -59,7 +70,6 @@
145
146 #endif // ZORBA_NO_FULL_TEXT
147
148-
149 ///////////////////////////////////////////////////////////////////////////////
150
151 void populate_context_ft_module_impl(static_context* sctx)
152@@ -105,6 +115,25 @@
153 tokenize_return_type),
154 FunctionConsts::FULL_TEXT_TOKENIZE_NODE_2);
155 }
156+ {
157+ DECL_WITH_KIND(sctx,
158+ full_text_tokenize_nodes,
159+ (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
160+ GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
161+ GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
162+ tokenize_return_type),
163+ FunctionConsts::FULL_TEXT_TOKENIZE_NODES_2);
164+ }
165+ {
166+ DECL_WITH_KIND(sctx,
167+ full_text_tokenize_nodes,
168+ (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
169+ GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
170+ GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
171+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
172+ tokenize_return_type),
173+ FunctionConsts::FULL_TEXT_TOKENIZE_NODES_3);
174+ }
175
176 xqtref_t tokenizer_properties_return_type =
177 GENV_TYPESYSTEM.create_node_type(store::StoreConsts::elementNode,
178@@ -128,10 +157,10 @@
179 tokenizer_properties_return_type),
180 FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1);
181 }
182-#endif // ZORBA_NO_FULL_TEXT
183+#endif /* ZORBA_NO_FULL_TEXT */
184 }
185
186-
187+///////////////////////////////////////////////////////////////////////////////
188
189 } // namespace zorba
190 /* vim:set et sw=2 ts=2: */
191
192=== modified file 'src/functions/func_ft_module_impl.h'
193--- src/functions/func_ft_module_impl.h 2012-06-28 04:14:03 +0000
194+++ src/functions/func_ft_module_impl.h 2012-06-29 16:57:20 +0000
195@@ -49,6 +49,26 @@
196 };
197
198
199+//full-text:tokenize_nodes
200+class full_text_tokenize_nodes : public function
201+{
202+public:
203+ full_text_tokenize_nodes(const signature& sig,
204+ FunctionConsts::FunctionKind kind) :
205+ function(sig, kind)
206+ {
207+
208+ }
209+
210+ // Mark the function as accessing the dyn ctx so that it won't be
211+ // const-folded. We must prevent const-folding because the function
212+ // uses the store to get access to the tokenizer provider.
213+ bool accessesDynCtx() const { return true; }
214+
215+ CODEGEN_DECL();
216+};
217+
218+
219 //full-text:tokenizer-properties
220 class full_text_tokenizer_properties : public function
221 {
222
223=== modified file 'src/functions/function_consts.h'
224--- src/functions/function_consts.h 2012-06-28 04:14:03 +0000
225+++ src/functions/function_consts.h 2012-06-29 16:57:20 +0000
226@@ -238,7 +238,9 @@
227 FULL_TEXT_TOKENIZER_PROPERTIES_0,
228 FULL_TEXT_TOKENIZE_NODE_2,
229 FULL_TEXT_TOKENIZE_NODE_1,
230-#endif
231+ FULL_TEXT_TOKENIZE_NODES_3,
232+ FULL_TEXT_TOKENIZE_NODES_2,
233+#endif /* ZORBA_NO_FULL_TEXT */
234
235 #include "functions/function_enum.h"
236
237
238=== modified file 'src/runtime/full_text/CMakeLists.txt'
239--- src/runtime/full_text/CMakeLists.txt 2012-06-28 04:14:03 +0000
240+++ src/runtime/full_text/CMakeLists.txt 2012-06-29 16:57:20 +0000
241@@ -41,6 +41,7 @@
242 thesaurus.cpp
243 tokenizer.cpp
244 default_tokenizer.cpp
245+ ft_module_util.cpp
246 ft_module.cpp
247 )
248
249
250=== modified file 'src/runtime/full_text/apply.h'
251--- src/runtime/full_text/apply.h 2012-06-28 04:14:03 +0000
252+++ src/runtime/full_text/apply.h 2012-06-29 16:57:20 +0000
253@@ -24,6 +24,8 @@
254
255 namespace zorba {
256
257+///////////////////////////////////////////////////////////////////////////////
258+
259 void apply_ftand( ft_all_matches const&, ft_all_matches const&,
260 ft_all_matches &result );
261
262@@ -52,6 +54,8 @@
263 void apply_ftwindow( ft_all_matches const&, ft_int window_size, ft_unit::type,
264 ft_all_matches &result );
265
266+///////////////////////////////////////////////////////////////////////////////
267+
268 } // namespace zorba
269 #endif /* ZORBA_FULL_TEXT_APPLY_H */
270 /* vim:set et sw=2 ts=2: */
271
272=== modified file 'src/runtime/full_text/ft_module_impl.cpp'
273--- src/runtime/full_text/ft_module_impl.cpp 2012-06-28 04:14:03 +0000
274+++ src/runtime/full_text/ft_module_impl.cpp 2012-06-29 16:57:20 +0000
275@@ -13,7 +13,7 @@
276 * See the License for the specific language governing permissions and
277 * limitations under the License.
278 */
279-#include "stdafx.h"
280+
281 #include <zorba/config.h>
282
283 //
284@@ -23,6 +23,8 @@
285 //
286 #ifndef ZORBA_NO_FULL_TEXT
287
288+#include "stdafx.h"
289+
290 #include <limits>
291 #include <typeinfo>
292
293@@ -42,10 +44,12 @@
294 #include "types/casting.h"
295 #include "types/typeimpl.h"
296 #include "types/typeops.h"
297+#include "util/stl_util.h"
298 #include "util/utf8_util.h"
299 #include "zorbatypes/URI.h"
300 #include "zorbautils/locale.h"
301
302+#include "ft_module_util.h"
303 #include "ft_stop_words_set.h"
304 #include "ft_token_seq_iterator.h"
305 #include "ft_util.h"
306@@ -87,6 +91,85 @@
307 );
308 }
309
310+static Tokenizer::ptr get_tokenizer( iso639_1::type lang,
311+ Tokenizer::State *t_state,
312+ QueryLoc const &loc ) {
313+ TokenizerProvider const *const provider = GENV_STORE.getTokenizerProvider();
314+ ZORBA_ASSERT( provider );
315+ Tokenizer::ptr tokenizer;
316+ if ( !provider->getTokenizer( lang, t_state, &tokenizer ) )
317+ throw XQUERY_EXCEPTION(
318+ err::FTST0009 /* lang not supported */,
319+ ERROR_PARAMS(
320+ iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
321+ ),
322+ ERROR_LOC( loc )
323+ );
324+ return std::move( tokenizer );
325+}
326+
327+static void make_token_element( FTToken const &token,
328+ TokenQNames const &qnames,
329+ store::Item_t &result ) {
330+ zstring base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
331+ store::Item_t item, attr_node, node_name, type_name;
332+ store::NsBindings const ns_bindings;
333+ zstring value_string;
334+
335+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
336+ node_name = qnames.token;
337+ GENV_ITEMFACTORY->createElementNode(
338+ result, nullptr, node_name, type_name, false, false,
339+ ns_bindings, base_uri
340+ );
341+
342+ if ( token.lang() ) {
343+ value_string = iso639_1::string_of[ token.lang() ];
344+ GENV_ITEMFACTORY->createString( item, value_string );
345+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
346+ node_name = qnames.lang;
347+ GENV_ITEMFACTORY->createAttributeNode(
348+ attr_node, result, node_name, type_name, item
349+ );
350+ }
351+
352+ ztd::to_string( token.para(), &value_string );
353+ GENV_ITEMFACTORY->createString( item, value_string );
354+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
355+ node_name = qnames.paragraph;
356+ GENV_ITEMFACTORY->createAttributeNode(
357+ attr_node, result, node_name, type_name, item
358+ );
359+
360+ ztd::to_string( token.sent(), &value_string );
361+ GENV_ITEMFACTORY->createString( item, value_string );
362+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
363+ node_name = qnames.sentence;
364+ GENV_ITEMFACTORY->createAttributeNode(
365+ attr_node, result, node_name, type_name, item
366+ );
367+
368+ value_string = token.value();
369+ GENV_ITEMFACTORY->createString( item, value_string );
370+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
371+ node_name = qnames.value;
372+ GENV_ITEMFACTORY->createAttributeNode(
373+ attr_node, result, node_name, type_name, item
374+ );
375+
376+ if ( store::Item const *const token_item = token.item() ) {
377+ if ( GENV_STORE.getNodeReference( item, token_item ) ) {
378+ item->getStringValue2( value_string );
379+ GENV_ITEMFACTORY->createString( item, value_string );
380+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
381+ node_name = qnames.node_ref;
382+ GENV_ITEMFACTORY->createAttributeNode(
383+ attr_node, result, node_name, type_name, item
384+ );
385+ }
386+ }
387+}
388+
389 ///////////////////////////////////////////////////////////////////////////////
390
391 bool CurrentCompareOptionsIterator::nextImpl( store::Item_t &result,
392@@ -296,10 +379,9 @@
393 }
394
395 try {
396- static_context const *const sctx = getStaticContext();
397- ZORBA_ASSERT( sctx );
398 iso639_1::type const lang = get_lang_from( item, loc );
399-
400+ static_context const *const sctx = getStaticContext();
401+ ZORBA_ASSERT( sctx );
402 zstring error_msg;
403 auto_ptr<internal::Resource> rsrc = sctx->resolve_uri(
404 uri, internal::EntityData::THESAURUS, error_msg
405@@ -369,7 +451,6 @@
406 PlanIteratorState *state;
407 DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
408
409-
410 consumeNext( item, theChildren[0], plan_state );
411 item->getStringValue2( word );
412 utf8::to_lower( word );
413@@ -535,45 +616,12 @@
414
415 ///////////////////////////////////////////////////////////////////////////////
416
417-TokenizeNodeIterator::TokenizeNodeIterator( static_context *sctx,
418- QueryLoc const &loc,
419- std::vector<PlanIter_t>& children ):
420- NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>(sctx, loc, children)
421-{
422- initMembers();
423-}
424-
425-void TokenizeNodeIterator::initMembers() {
426- GENV_ITEMFACTORY->createQName(
427- token_qname_, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token" );
428-
429- GENV_ITEMFACTORY->createQName(
430- lang_qname_, "", "", "lang" );
431-
432- GENV_ITEMFACTORY->createQName(
433- para_qname_, "", "", "paragraph" );
434-
435- GENV_ITEMFACTORY->createQName(
436- sent_qname_, "", "", "sentence" );
437-
438- GENV_ITEMFACTORY->createQName(
439- value_qname_, "", "", "value" );
440-
441- GENV_ITEMFACTORY->createQName(
442- ref_qname_, "", "", "node-ref" );
443-}
444-
445 bool TokenizeNodeIterator::nextImpl( store::Item_t &result,
446 PlanState &plan_state ) const {
447- store::Item_t node_name, attr_node;
448- zstring base_uri;
449 store::Item_t item;
450 iso639_1::type lang;
451 Tokenizer::State t_state;
452- store::NsBindings const ns_bindings;
453 TokenizerProvider const *tokenizer_provider;
454- store::Item_t type_name;
455- zstring value_string;
456
457 TokenizeNodeIteratorState *state;
458 DEFAULT_STACK_INIT( TokenizeNodeIteratorState, state, plan_state );
459@@ -594,66 +642,11 @@
460 state->doc_item_->getTokens( *tokenizer_provider, t_state, lang );
461
462 while ( state->doc_tokens_->hasNext() ) {
463- FTToken const *token;
464- token = state->doc_tokens_->next();
465- ZORBA_ASSERT( token );
466-
467- base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
468- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
469- node_name = token_qname_;
470- GENV_ITEMFACTORY->createElementNode(
471- result, nullptr, node_name, type_name, false, false,
472- ns_bindings, base_uri
473- );
474-
475- if ( token->lang() ) {
476- value_string = iso639_1::string_of[ token->lang() ];
477- GENV_ITEMFACTORY->createString( item, value_string );
478- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
479- node_name = lang_qname_;
480- GENV_ITEMFACTORY->createAttributeNode(
481- attr_node, result, node_name, type_name, item
482- );
483- }
484-
485- ztd::to_string( token->para(), &value_string );
486- GENV_ITEMFACTORY->createString( item, value_string );
487- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
488- node_name = para_qname_;
489- GENV_ITEMFACTORY->createAttributeNode(
490- attr_node, result, node_name, type_name, item
491- );
492-
493- ztd::to_string( token->sent(), &value_string );
494- GENV_ITEMFACTORY->createString( item, value_string );
495- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
496- node_name = sent_qname_;
497- GENV_ITEMFACTORY->createAttributeNode(
498- attr_node, result, node_name, type_name, item
499- );
500-
501- value_string = token->value();
502- GENV_ITEMFACTORY->createString( item, value_string );
503- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
504- node_name = value_qname_;
505- GENV_ITEMFACTORY->createAttributeNode(
506- attr_node, result, node_name, type_name, item
507- );
508-
509- if ( store::Item const *const token_item = token->item() ) {
510- if ( GENV_STORE.getNodeReference( item, token_item ) ) {
511- item->getStringValue2( value_string );
512- GENV_ITEMFACTORY->createString( item, value_string );
513- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
514- node_name = ref_qname_;
515- GENV_ITEMFACTORY->createAttributeNode(
516- attr_node, result, node_name, type_name, item
517- );
518- }
519- }
520-
521+ make_token_element(
522+ *state->doc_tokens_->next(), state->token_qnames_, result
523+ );
524 STACK_PUSH( true, state );
525- } // while
526+ }
527 }
528
529 STACK_END( state );
530@@ -669,12 +662,140 @@
531 state->doc_tokens_->reset();
532 }
533
534-void TokenizeNodeIterator::serialize( serialization::Archiver &ar ) {
535- serialize_baseclass(
536- ar, (NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>*)this
537- );
538- if ( !ar.is_serializing_out() )
539- initMembers();
540+///////////////////////////////////////////////////////////////////////////////
541+
542+bool TokenizeNodesIterator::nextImpl( store::Item_t &result,
543+ PlanState &plan_state ) const {
544+ store::Item_t item;
545+ iso639_1::type lang;
546+ Tokenizer::State t_state;
547+ Tokenizer::ptr tokenizer;
548+
549+ TokenizeNodesIteratorState *state;
550+ DEFAULT_STACK_INIT( TokenizeNodesIteratorState, state, plan_state );
551+
552+ if ( theChildren.size() > 2 ) {
553+ consumeNext( item, theChildren[2], plan_state );
554+ lang = get_lang_from( item, loc );
555+ } else {
556+ static_context const *const sctx = getStaticContext();
557+ ZORBA_ASSERT( sctx );
558+ lang = get_lang_from( sctx );
559+ }
560+
561+ tokenizer = get_tokenizer( lang, &state->t_state_, loc );
562+
563+ // $includes
564+ while ( consumeNext( item, theChildren[0], plan_state ) )
565+ state->includes_.push_back( item );
566+ state->includes_.push_back( store::Item_t() ); // sentinel
567+
568+ // $excludes
569+ while ( consumeNext( item, theChildren[1], plan_state ) ) {
570+ store::Item_t exc_si;
571+ GENV_STORE.getStructuralInformation( exc_si, item.getp() );
572+ state->excludes_.push_back( exc_si );
573+ }
574+
575+ state->callback_.set_tokens( state->tokens_ );
576+ state->langs_.push( lang );
577+ state->tokenizers_.push( tokenizer.release() );
578+
579+ while ( true ) {
580+ if ( state->tokens_.empty() ) {
581+ if ( state->includes_.empty() )
582+ break;
583+
584+ store::Item_t inc( state->includes_.front() );
585+ state->includes_.pop_front();
586+ if ( inc.isNull() ) { // sentinel
587+ state->langs_.pop();
588+ Tokenizer::ptr deleter( ztd::pop_stack( state->tokenizers_ ) );
589+ continue;
590+ }
591+
592+ store::Item_t inc_si;
593+ GENV_STORE.getStructuralInformation( inc_si, inc.getp() );
594+ bool excluded = false;
595+ FOR_EACH( vector<store::Item_t>, exc, state->excludes_ ) {
596+ if ( inc_si->equals( *exc ) || (*exc)->isInSubtreeOf( inc_si ) ) {
597+ excluded = true;
598+ break;
599+ }
600+ }
601+ if ( excluded )
602+ continue;
603+
604+ bool add_sentinel = false;
605+ switch ( inc->getNodeKind() ) {
606+ case store::StoreConsts::elementNode:
607+ ++state->t_state_.para;
608+ if ( find_lang_attribute( *inc, &lang ) ) {
609+ state->langs_.push( lang );
610+ tokenizer = get_tokenizer( lang, &state->t_state_, loc );
611+ state->tokenizers_.push( tokenizer.release() );
612+ add_sentinel = true;
613+ }
614+ // no break;
615+ case store::StoreConsts::documentNode: {
616+ list<store::Item_t>::iterator pos = state->includes_.begin();
617+ store::Iterator_t i = inc->getChildren();
618+ i->open();
619+ for ( store::Item_t child; i->next( child ); ) {
620+ switch ( child->getNodeKind() ) {
621+ case store::StoreConsts::attributeNode:
622+ case store::StoreConsts::commentNode:
623+ case store::StoreConsts::piNode:
624+ continue; // never include these implicitly
625+ default:
626+ pos = state->includes_.insert( pos, child );
627+ ++pos;
628+ }
629+ }
630+ i->close();
631+ if ( add_sentinel ) // sentinel
632+ state->includes_.insert( pos, store::Item_t() );
633+ continue;
634+ }
635+
636+ case store::StoreConsts::attributeNode:
637+ case store::StoreConsts::commentNode:
638+ case store::StoreConsts::piNode:
639+ // tokenize these because they were included explicitly
640+ case store::StoreConsts::textNode: {
641+ zstring const s( inc->getStringValue() );
642+ Item const temp( inc.getp() );
643+ state->tokenizers_.top()->tokenize_string(
644+ s.data(), s.size(), state->langs_.top(), false, state->callback_,
645+ &temp
646+ );
647+ break;
648+ }
649+
650+ default:
651+ break;
652+ } // switch
653+ continue;
654+ } // if ( state->tokens_.empty() )
655+
656+ make_token_element(
657+ state->tokens_.front(), state->token_qnames_, result
658+ );
659+ state->tokens_.pop_front();
660+ STACK_PUSH( true, state );
661+ } // while
662+
663+ STACK_END( state );
664+}
665+
666+void TokenizeNodesIterator::resetImpl( PlanState &plan_state ) const {
667+ NaryBaseIterator<TokenizeNodesIterator,TokenizeNodesIteratorState>::
668+ resetImpl( plan_state );
669+ TokenizeNodesIteratorState *const state =
670+ StateTraitsImpl<TokenizeNodesIteratorState>::getState(
671+ plan_state, this->theStateOffset
672+ );
673+ state->doc_tokens_->reset();
674 }
675
676 ///////////////////////////////////////////////////////////////////////////////
677@@ -689,7 +810,6 @@
678 Tokenizer::ptr tokenizer;
679 store::Item_t type_name;
680 Tokenizer::Properties props;
681- TokenizerProvider const *tokenizer_provider;
682 zstring value_string;
683
684 PlanIteratorState *state;
685@@ -704,15 +824,7 @@
686 lang = get_lang_from( sctx );
687 }
688
689- tokenizer_provider = GENV_STORE.getTokenizerProvider();
690- ZORBA_ASSERT( tokenizer_provider );
691- if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
692- throw XQUERY_EXCEPTION(
693- err::FTST0009 /* lang not supported */,
694- ERROR_PARAMS(
695- iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
696- )
697- );
698+ tokenizer = get_tokenizer( lang, &t_state, loc );
699 tokenizer->properties( &props );
700
701 GENV_ITEMFACTORY->createQName(
702@@ -840,19 +952,8 @@
703 }
704
705 { // local scope
706- TokenizerProvider const *const tokenizer_provider =
707- GENV_STORE.getTokenizerProvider();
708- ZORBA_ASSERT( tokenizer_provider );
709 Tokenizer::State t_state;
710- Tokenizer::ptr tokenizer;
711- if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
712- throw XQUERY_EXCEPTION(
713- err::FTST0009 /* lang not supported */,
714- ERROR_PARAMS(
715- iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
716- )
717- );
718-
719+ Tokenizer::ptr const tokenizer( get_tokenizer( lang, &t_state, loc ) );
720 TokenizeStringIteratorCallback callback;
721 tokenizer->tokenize_string(
722 value_string.data(), value_string.size(), lang, false, callback
723
724=== added file 'src/runtime/full_text/ft_module_util.cpp'
725--- src/runtime/full_text/ft_module_util.cpp 1970-01-01 00:00:00 +0000
726+++ src/runtime/full_text/ft_module_util.cpp 2012-06-29 16:57:20 +0000
727@@ -0,0 +1,57 @@
728+/*
729+ * Copyright 2006-2008 The FLWOR Foundation.
730+ *
731+ * Licensed under the Apache License, Version 2.0 (the "License");
732+ * you may not use this file except in compliance with the License.
733+ * You may obtain a copy of the License at
734+ *
735+ * http://www.apache.org/licenses/LICENSE-2.0
736+ *
737+ * Unless required by applicable law or agreed to in writing, software
738+ * distributed under the License is distributed on an "AS IS" BASIS,
739+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
740+ * See the License for the specific language governing permissions and
741+ * limitations under the License.
742+ */
743+
744+#include "api/unmarshaller.h"
745+#include "context/static_context.h"
746+#include "store/api/item_factory.h"
747+#include "system/globalenv.h"
748+
749+#include "ft_module_util.h"
750+
751+using namespace std;
752+using namespace zorba::locale;
753+
754+namespace zorba {
755+
756+///////////////////////////////////////////////////////////////////////////////
757+
758+void TokenizeNodesCallback::token( char const *utf8_s, size_type utf8_len,
759+ iso639_1::type lang, size_type token_no,
760+ size_type sent_no, size_type para_no,
761+ Item const *api_item ) {
762+ store::Item const *const item = Unmarshaller::getInternalItem( *api_item );
763+ tokens_->push_back(
764+ FTToken( utf8_s, utf8_len, token_no, sent_no, para_no, item )
765+ );
766+}
767+
768+///////////////////////////////////////////////////////////////////////////////
769+
770+TokenQNames::TokenQNames() {
771+ GENV_ITEMFACTORY->createQName(
772+ token, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token"
773+ );
774+ GENV_ITEMFACTORY->createQName( lang, "", "", "lang" );
775+ GENV_ITEMFACTORY->createQName( paragraph, "", "", "paragraph" );
776+ GENV_ITEMFACTORY->createQName( sentence, "", "", "sentence" );
777+ GENV_ITEMFACTORY->createQName( value, "", "", "value" );
778+ GENV_ITEMFACTORY->createQName( node_ref, "", "", "node-ref" );
779+}
780+
781+///////////////////////////////////////////////////////////////////////////////
782+
783+} // namespace zorba
784+/* vim:set et sw=2 ts=2: */
785
786=== added file 'src/runtime/full_text/ft_module_util.h'
787--- src/runtime/full_text/ft_module_util.h 1970-01-01 00:00:00 +0000
788+++ src/runtime/full_text/ft_module_util.h 2012-06-29 16:57:20 +0000
789@@ -0,0 +1,80 @@
790+/*
791+ * Copyright 2006-2008 The FLWOR Foundation.
792+ *
793+ * Licensed under the Apache License, Version 2.0 (the "License");
794+ * you may not use this file except in compliance with the License.
795+ * You may obtain a copy of the License at
796+ *
797+ * http://www.apache.org/licenses/LICENSE-2.0
798+ *
799+ * Unless required by applicable law or agreed to in writing, software
800+ * distributed under the License is distributed on an "AS IS" BASIS,
801+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
802+ * See the License for the specific language governing permissions and
803+ * limitations under the License.
804+ */
805+
806+#ifndef ZORBA_FT_MODULE_UTIL_H
807+#define ZORBA_FT_MODULE_UTIL_H
808+
809+//
810+// The reason this header (and related .cpp) are necessary (instead of just
811+// puting this code into ft_module.h/.cpp directly) is because this header
812+// needs to be #include'd into the .cpp generated from the ft_module.xml file.
813+//
814+
815+#include <zorba/tokenizer.h>
816+
817+#include <deque>
818+
819+#include "store/api/item.h"
820+#include "util/cxx_util.h"
821+#include "zorbatypes/ft_token.h"
822+
823+#include "ft_module_util.h"
824+
825+namespace zorba {
826+
827+///////////////////////////////////////////////////////////////////////////////
828+
829+/**
830+ * A %TokenizeNodesCallback is-a Tokenizer::Callback that's used exclusively by
831+ * the TokenizeNodesIterator that implements the ft:tokenize-nodes() full-text
832+ * module function.
833+ */
834+class TokenizeNodesCallback : public Tokenizer::Callback {
835+public:
836+ TokenizeNodesCallback() : tokens_( nullptr ) { }
837+ TokenizeNodesCallback( std::deque<FTToken> &tokens ) : tokens_( &tokens ) { }
838+
839+ void set_tokens( std::deque<FTToken> &tokens ) {
840+ tokens_ = &tokens;
841+ }
842+
843+ // inherited
844+ void token( char const *utf8_s, size_type utf8_len,
845+ locale::iso639_1::type lang, size_type token_no,
846+ size_type sent_no, size_type para_no, Item const *item = 0 );
847+
848+private:
849+ std::deque<FTToken> *tokens_;
850+};
851+
852+///////////////////////////////////////////////////////////////////////////////
853+
854+struct TokenQNames {
855+ store::Item_t token;
856+ store::Item_t lang;
857+ store::Item_t paragraph;
858+ store::Item_t sentence;
859+ store::Item_t value;
860+ store::Item_t node_ref;
861+
862+ TokenQNames();
863+};
864+
865+///////////////////////////////////////////////////////////////////////////////
866+
867+} // namespace zorba
868+#endif /* ZORBA_FT_MODULE_UTIL_H */
869+/* vim:set et sw=2 ts=2: */
870
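The new `TokenizeNodesCallback` stores a non-owning pointer to a `std::deque<FTToken>`. Its `token()` override presumably appends each token it receives to that deque, and `set_tokens()` lets the iterator redirect the callback to a different deque. The sketch below shows that collect-into-deque pattern in isolation. It is an illustration only, not Zorba code. `Token`, `Callback`, `CollectingCallback` and `tokenize_words` are hypothetical simplified stand-ins for `FTToken`, `Tokenizer::Callback`, the callback class and the real tokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for zorba::FTToken: only the fields this sketch needs.
struct Token {
  std::string value;
  std::size_t token_no, sent_no, para_no;
};

// Hypothetical stand-in for zorba::Tokenizer::Callback.
class Callback {
public:
  virtual ~Callback() { }
  virtual void token( char const *utf8_s, std::size_t utf8_len,
                      std::size_t token_no, std::size_t sent_no,
                      std::size_t para_no ) = 0;
};

// Collects every token it is given into a caller-owned deque.
class CollectingCallback : public Callback {
public:
  CollectingCallback() : tokens_( nullptr ) { }
  explicit CollectingCallback( std::deque<Token> &tokens ) : tokens_( &tokens ) { }

  // Re-targets the callback, e.g., before tokenizing the next node.
  void set_tokens( std::deque<Token> &tokens ) { tokens_ = &tokens; }

  void token( char const *utf8_s, std::size_t utf8_len,
              std::size_t token_no, std::size_t sent_no,
              std::size_t para_no ) override {
    if ( !tokens_ )
      return;                           // no destination set: drop the token
    Token t{ std::string( utf8_s, utf8_len ), token_no, sent_no, para_no };
    tokens_->push_back( t );
  }

private:
  std::deque<Token> *tokens_;           // not owned
};

// Toy whitespace tokenizer (hypothetical; a real tokenizer also tracks
// sentence and paragraph numbers and the source language).
void tokenize_words( std::string const &s, Callback &cb ) {
  std::size_t token_no = 0, i = 0;
  while ( i < s.size() ) {
    while ( i < s.size() && s[i] == ' ' ) ++i;   // skip separators
    std::size_t const start = i;
    while ( i < s.size() && s[i] != ' ' ) ++i;   // scan one word
    if ( i > start )
      cb.token( s.data() + start, i - start, token_no++, 0, 0 );
  }
}
```

Because the callback does not own the deque, the caller controls each deque's lifetime and can reuse one callback object across many nodes.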
871=== modified file 'src/runtime/full_text/ft_util.cpp'
872--- src/runtime/full_text/ft_util.cpp 2012-04-27 17:07:47 +0000
873+++ src/runtime/full_text/ft_util.cpp 2012-06-29 16:57:20 +0000
874@@ -19,14 +19,38 @@
875 #include <stdexcept>
876
877 #include "diagnostics/xquery_diagnostics.h"
878+#include "zorbamisc/ns_consts.h"
879 #include "zorbatypes/numconversions.h"
880+#include "zorbautils/locale.h"
881
882 #include "ft_util.h"
883
884+using namespace zorba::locale;
885+
886 namespace zorba {
887
888 ///////////////////////////////////////////////////////////////////////////////
889
890+bool find_lang_attribute( store::Item const &item, iso639_1::type *lang ) {
891+ bool found_lang = false;
892+ if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
893+ store::Iterator_t i( item.getAttributes() );
894+ i->open();
895+ for ( store::Item_t attr; i->next( attr ); ) {
896+ store::Item const *const qname = attr->getNodeName();
897+ if ( qname &&
898+ qname->getLocalName() == "lang" &&
899+ qname->getNamespace() == XML_NS ) {
900+ *lang = locale::find_lang( attr->getStringValue().c_str() );
901+ found_lang = true;
902+ break;
903+ }
904+ }
905+ i->close();
906+ }
907+ return found_lang;
908+}
909+
910 ft_int to_ft_int( xs_integer const &i ) {
911 try {
912 return to_xs_unsignedInt( i );
913
914=== modified file 'src/runtime/full_text/ft_util.h'
915--- src/runtime/full_text/ft_util.h 2012-06-28 04:14:03 +0000
916+++ src/runtime/full_text/ft_util.h 2012-06-29 16:57:20 +0000
917@@ -17,11 +17,13 @@
918 #ifndef ZORBA_FULL_TEXT_UTIL_H
919 #define ZORBA_FULL_TEXT_UTIL_H
920
921+#include <zorba/item.h>
922 #include <zorba/locale.h>
923
924 #include "compiler/expression/ftnode.h"
925+#include "store/api/item.h"
926+#include "util/cxx_util.h"
927 #include "zorbatypes/schema_types.h"
928-#include "util/cxx_util.h"
929
930 #include "ft_match.h"
931
932@@ -44,6 +46,18 @@
933 ////////// Functions //////////////////////////////////////////////////////////
934
935 /**
936+ * Finds the <code>xml:lang</code> attribute, if any, of the XML element
937+ * specified by \a item and obtains its value.
938+ *
939+ * @param item The item for an XML element to check.
940+ * @param lang A pointer to receive the found language.
941+ * @return Returns \c true only if an <code>xml:lang</code> attribute was
942+ * found.
943+ */
944+bool find_lang_attribute( store::Item const &item,
945+ locale::iso639_1::type *lang );
946+
947+/**
948 * Gets the language from the given ftmatch_options, if any.
949 *
950 * @param options The ftmatch_options to get the language from. This may be \c
951@@ -98,6 +112,8 @@
952 */
953 ft_int to_ft_int( xs_integer const &i );
954
955+///////////////////////////////////////////////////////////////////////////////
956+
957 } // namespace zorba
958 #endif /* ZORBA_FULL_TEXT_UTIL_H */
959 /* vim:set et sw=2 ts=2: */
960
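The new `find_lang_attribute()` scans an element's attributes and stops at the first one whose local name is `lang` and whose namespace is `XML_NS`. It matches on the namespace URI rather than the prefix. This works because the `xml` prefix is always bound to `http://www.w3.org/XML/1998/namespace`. The sketch below shows the same lookup against a hypothetical, simplified element model; `Attribute`, `Element` and `XML_NS_URI` are not Zorba's store API. It returns the raw attribute value. Zorba additionally converts that value to an `iso639_1::type` via `locale::find_lang()`, and that step is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The namespace URI bound to the reserved "xml" prefix (Namespaces in XML).
static char const XML_NS_URI[] = "http://www.w3.org/XML/1998/namespace";

// Hypothetical, simplified attribute/element model (not Zorba's store API).
struct Attribute {
  std::string ns, local_name, value;
};
struct Element {
  std::vector<Attribute> attributes;
};

// Returns true only if elem carries an xml:lang attribute; its raw value is
// stored in *lang. *lang is left untouched when no such attribute exists.
bool find_lang_attribute( Element const &elem, std::string *lang ) {
  for ( Attribute const &a : elem.attributes ) {
    if ( a.local_name == "lang" && a.ns == XML_NS_URI ) {
      *lang = a.value;
      return true;                      // first match wins, as in the patch
    }
  }
  return false;
}
```

A `lang` attribute in no namespace, or in any other namespace, is ignored. Only a genuine `xml:lang` is reported.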
961=== added file 'src/runtime/full_text/pregenerated/ft_module.cpp'
962--- src/runtime/full_text/pregenerated/ft_module.cpp 1970-01-01 00:00:00 +0000
963+++ src/runtime/full_text/pregenerated/ft_module.cpp 2012-06-29 16:57:20 +0000
964@@ -0,0 +1,506 @@
965+/*
966+ * Copyright 2006-2008 The FLWOR Foundation.
967+ *
968+ * Licensed under the Apache License, Version 2.0 (the "License");
969+ * you may not use this file except in compliance with the License.
970+ * You may obtain a copy of the License at
971+ *
972+ * http://www.apache.org/licenses/LICENSE-2.0
973+ *
974+ * Unless required by applicable law or agreed to in writing, software
975+ * distributed under the License is distributed on an "AS IS" BASIS,
976+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
977+ * See the License for the specific language governing permissions and
978+ * limitations under the License.
979+ */
980+
981+// ******************************************
982+// * *
983+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
984+// * SEE .xml FILE WITH SAME NAME *
985+// * *
986+// ******************************************
987+
988+#include "stdafx.h"
989+#include "zorbatypes/rchandle.h"
990+#include "zorbatypes/zstring.h"
991+#include "runtime/visitors/planiter_visitor.h"
992+#include "runtime/full_text/ft_module.h"
993+#include "system/globalenv.h"
994+
995+
996+#include "store/api/iterator.h"
997+
998+namespace zorba {
999+
1000+#ifndef ZORBA_NO_FULL_TEXT
1001+// <CurrentCompareOptionsIterator>
1002+SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
1003+
1004+void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
1005+{
1006+ serialize_baseclass(ar,
1007+ (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
1008+}
1009+
1010+
1011+void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
1012+{
1013+ v.beginVisit(*this);
1014+
1015+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1016+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1017+ for ( ; lIter != lEnd; ++lIter ){
1018+ (*lIter)->accept(v);
1019+ }
1020+
1021+ v.endVisit(*this);
1022+}
1023+
1024+CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
1025+
1026+// </CurrentCompareOptionsIterator>
1027+
1028+#endif
1029+#ifndef ZORBA_NO_FULL_TEXT
1030+// <CurrentLangIterator>
1031+SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
1032+
1033+void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
1034+{
1035+ serialize_baseclass(ar,
1036+ (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
1037+}
1038+
1039+
1040+void CurrentLangIterator::accept(PlanIterVisitor& v) const
1041+{
1042+ v.beginVisit(*this);
1043+
1044+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1045+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1046+ for ( ; lIter != lEnd; ++lIter ){
1047+ (*lIter)->accept(v);
1048+ }
1049+
1050+ v.endVisit(*this);
1051+}
1052+
1053+CurrentLangIterator::~CurrentLangIterator() {}
1054+
1055+// </CurrentLangIterator>
1056+
1057+#endif
1058+#ifndef ZORBA_NO_FULL_TEXT
1059+// <HostLangIterator>
1060+SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
1061+
1062+void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
1063+{
1064+ serialize_baseclass(ar,
1065+ (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
1066+}
1067+
1068+
1069+void HostLangIterator::accept(PlanIterVisitor& v) const
1070+{
1071+ v.beginVisit(*this);
1072+
1073+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1074+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1075+ for ( ; lIter != lEnd; ++lIter ){
1076+ (*lIter)->accept(v);
1077+ }
1078+
1079+ v.endVisit(*this);
1080+}
1081+
1082+HostLangIterator::~HostLangIterator() {}
1083+
1084+// </HostLangIterator>
1085+
1086+#endif
1087+#ifndef ZORBA_NO_FULL_TEXT
1088+// <IsStemLangSupportedIterator>
1089+SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
1090+
1091+void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1092+{
1093+ serialize_baseclass(ar,
1094+ (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
1095+}
1096+
1097+
1098+void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
1099+{
1100+ v.beginVisit(*this);
1101+
1102+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1103+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1104+ for ( ; lIter != lEnd; ++lIter ){
1105+ (*lIter)->accept(v);
1106+ }
1107+
1108+ v.endVisit(*this);
1109+}
1110+
1111+IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
1112+
1113+// </IsStemLangSupportedIterator>
1114+
1115+#endif
1116+#ifndef ZORBA_NO_FULL_TEXT
1117+// <IsStopWordIterator>
1118+SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
1119+
1120+void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
1121+{
1122+ serialize_baseclass(ar,
1123+ (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
1124+}
1125+
1126+
1127+void IsStopWordIterator::accept(PlanIterVisitor& v) const
1128+{
1129+ v.beginVisit(*this);
1130+
1131+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1132+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1133+ for ( ; lIter != lEnd; ++lIter ){
1134+ (*lIter)->accept(v);
1135+ }
1136+
1137+ v.endVisit(*this);
1138+}
1139+
1140+IsStopWordIterator::~IsStopWordIterator() {}
1141+
1142+// </IsStopWordIterator>
1143+
1144+#endif
1145+#ifndef ZORBA_NO_FULL_TEXT
1146+// <IsStopWordLangSupportedIterator>
1147+SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
1148+
1149+void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1150+{
1151+ serialize_baseclass(ar,
1152+ (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
1153+}
1154+
1155+
1156+void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
1157+{
1158+ v.beginVisit(*this);
1159+
1160+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1161+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1162+ for ( ; lIter != lEnd; ++lIter ){
1163+ (*lIter)->accept(v);
1164+ }
1165+
1166+ v.endVisit(*this);
1167+}
1168+
1169+IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
1170+
1171+// </IsStopWordLangSupportedIterator>
1172+
1173+#endif
1174+#ifndef ZORBA_NO_FULL_TEXT
1175+// <IsThesaurusLangSupportedIterator>
1176+SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
1177+
1178+void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1179+{
1180+ serialize_baseclass(ar,
1181+ (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
1182+}
1183+
1184+
1185+void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
1186+{
1187+ v.beginVisit(*this);
1188+
1189+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1190+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1191+ for ( ; lIter != lEnd; ++lIter ){
1192+ (*lIter)->accept(v);
1193+ }
1194+
1195+ v.endVisit(*this);
1196+}
1197+
1198+IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
1199+
1200+// </IsThesaurusLangSupportedIterator>
1201+
1202+#endif
1203+#ifndef ZORBA_NO_FULL_TEXT
1204+// <IsTokenizerLangSupportedIterator>
1205+SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
1206+
1207+void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1208+{
1209+ serialize_baseclass(ar,
1210+ (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
1211+}
1212+
1213+
1214+void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
1215+{
1216+ v.beginVisit(*this);
1217+
1218+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1219+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1220+ for ( ; lIter != lEnd; ++lIter ){
1221+ (*lIter)->accept(v);
1222+ }
1223+
1224+ v.endVisit(*this);
1225+}
1226+
1227+IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
1228+
1229+// </IsTokenizerLangSupportedIterator>
1230+
1231+#endif
1232+#ifndef ZORBA_NO_FULL_TEXT
1233+// <StemIterator>
1234+SERIALIZABLE_CLASS_VERSIONS(StemIterator)
1235+
1236+void StemIterator::serialize(::zorba::serialization::Archiver& ar)
1237+{
1238+ serialize_baseclass(ar,
1239+ (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
1240+}
1241+
1242+
1243+void StemIterator::accept(PlanIterVisitor& v) const
1244+{
1245+ v.beginVisit(*this);
1246+
1247+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1248+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1249+ for ( ; lIter != lEnd; ++lIter ){
1250+ (*lIter)->accept(v);
1251+ }
1252+
1253+ v.endVisit(*this);
1254+}
1255+
1256+StemIterator::~StemIterator() {}
1257+
1258+// </StemIterator>
1259+
1260+#endif
1261+#ifndef ZORBA_NO_FULL_TEXT
1262+// <StripDiacriticsIterator>
1263+SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
1264+
1265+void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
1266+{
1267+ serialize_baseclass(ar,
1268+ (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
1269+}
1270+
1271+
1272+void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
1273+{
1274+ v.beginVisit(*this);
1275+
1276+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1277+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1278+ for ( ; lIter != lEnd; ++lIter ){
1279+ (*lIter)->accept(v);
1280+ }
1281+
1282+ v.endVisit(*this);
1283+}
1284+
1285+StripDiacriticsIterator::~StripDiacriticsIterator() {}
1286+
1287+// </StripDiacriticsIterator>
1288+
1289+#endif
1290+#ifndef ZORBA_NO_FULL_TEXT
1291+// <ThesaurusLookupIterator>
1292+SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
1293+
1294+void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
1295+{
1296+ serialize_baseclass(ar,
1297+ (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
1298+}
1299+
1300+
1301+void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
1302+{
1303+ v.beginVisit(*this);
1304+
1305+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1306+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1307+ for ( ; lIter != lEnd; ++lIter ){
1308+ (*lIter)->accept(v);
1309+ }
1310+
1311+ v.endVisit(*this);
1312+}
1313+
1314+ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
1315+
1316+ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
1317+
1318+ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
1319+
1320+
1321+void ThesaurusLookupIteratorState::reset(PlanState& planState) {
1322+ PlanIteratorState::reset(planState);
1323+}
1324+// </ThesaurusLookupIterator>
1325+
1326+#endif
1327+#ifndef ZORBA_NO_FULL_TEXT
1328+// <TokenizeNodeIterator>
1329+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
1330+
1331+void TokenizeNodeIterator::serialize(::zorba::serialization::Archiver& ar)
1332+{
1333+ serialize_baseclass(ar,
1334+ (NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>*)this);
1335+}
1336+
1337+
1338+void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
1339+{
1340+ v.beginVisit(*this);
1341+
1342+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1343+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1344+ for ( ; lIter != lEnd; ++lIter ){
1345+ (*lIter)->accept(v);
1346+ }
1347+
1348+ v.endVisit(*this);
1349+}
1350+
1351+TokenizeNodeIterator::~TokenizeNodeIterator() {}
1352+
1353+TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
1354+
1355+TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
1356+
1357+
1358+void TokenizeNodeIteratorState::reset(PlanState& planState) {
1359+ PlanIteratorState::reset(planState);
1360+}
1361+// </TokenizeNodeIterator>
1362+
1363+#endif
1364+#ifndef ZORBA_NO_FULL_TEXT
1365+// <TokenizeNodesIterator>
1366+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodesIterator)
1367+
1368+void TokenizeNodesIterator::serialize(::zorba::serialization::Archiver& ar)
1369+{
1370+ serialize_baseclass(ar,
1371+ (NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>*)this);
1372+}
1373+
1374+
1375+void TokenizeNodesIterator::accept(PlanIterVisitor& v) const
1376+{
1377+ v.beginVisit(*this);
1378+
1379+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1380+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1381+ for ( ; lIter != lEnd; ++lIter ){
1382+ (*lIter)->accept(v);
1383+ }
1384+
1385+ v.endVisit(*this);
1386+}
1387+
1388+TokenizeNodesIterator::~TokenizeNodesIterator() {}
1389+
1390+TokenizeNodesIteratorState::TokenizeNodesIteratorState() {}
1391+
1392+TokenizeNodesIteratorState::~TokenizeNodesIteratorState() {}
1393+
1394+
1395+void TokenizeNodesIteratorState::reset(PlanState& planState) {
1396+ PlanIteratorState::reset(planState);
1397+}
1398+// </TokenizeNodesIterator>
1399+
1400+#endif
1401+#ifndef ZORBA_NO_FULL_TEXT
1402+// <TokenizerPropertiesIterator>
1403+SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
1404+
1405+void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
1406+{
1407+ serialize_baseclass(ar,
1408+ (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
1409+}
1410+
1411+
1412+void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
1413+{
1414+ v.beginVisit(*this);
1415+
1416+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1417+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1418+ for ( ; lIter != lEnd; ++lIter ){
1419+ (*lIter)->accept(v);
1420+ }
1421+
1422+ v.endVisit(*this);
1423+}
1424+
1425+TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
1426+
1427+// </TokenizerPropertiesIterator>
1428+
1429+#endif
1430+#ifndef ZORBA_NO_FULL_TEXT
1431+// <TokenizeStringIterator>
1432+SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
1433+
1434+void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
1435+{
1436+ serialize_baseclass(ar,
1437+ (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
1438+}
1439+
1440+
1441+void TokenizeStringIterator::accept(PlanIterVisitor& v) const
1442+{
1443+ v.beginVisit(*this);
1444+
1445+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1446+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1447+ for ( ; lIter != lEnd; ++lIter ){
1448+ (*lIter)->accept(v);
1449+ }
1450+
1451+ v.endVisit(*this);
1452+}
1453+
1454+TokenizeStringIterator::~TokenizeStringIterator() {}
1455+
1456+TokenizeStringIteratorState::TokenizeStringIteratorState() {}
1457+
1458+TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
1459+
1460+
1461+void TokenizeStringIteratorState::reset(PlanState& planState) {
1462+ PlanIteratorState::reset(planState);
1463+}
1464+// </TokenizeStringIterator>
1465+
1466+#endif
1467+
1468+}
1469+
1470+
1471
1472=== removed file 'src/runtime/full_text/pregenerated/ft_module.cpp'
1473--- src/runtime/full_text/pregenerated/ft_module.cpp 2012-05-22 19:09:20 +0000
1474+++ src/runtime/full_text/pregenerated/ft_module.cpp 1970-01-01 00:00:00 +0000
1475@@ -1,463 +0,0 @@
1476-/*
1477- * Copyright 2006-2008 The FLWOR Foundation.
1478- *
1479- * Licensed under the Apache License, Version 2.0 (the "License");
1480- * you may not use this file except in compliance with the License.
1481- * You may obtain a copy of the License at
1482- *
1483- * http://www.apache.org/licenses/LICENSE-2.0
1484- *
1485- * Unless required by applicable law or agreed to in writing, software
1486- * distributed under the License is distributed on an "AS IS" BASIS,
1487- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1488- * See the License for the specific language governing permissions and
1489- * limitations under the License.
1490- */
1491-
1492-// ******************************************
1493-// * *
1494-// * THIS IS A GENERATED FILE. DO NOT EDIT! *
1495-// * SEE .xml FILE WITH SAME NAME *
1496-// * *
1497-// ******************************************
1498-
1499-#include "stdafx.h"
1500-#include "zorbatypes/rchandle.h"
1501-#include "zorbatypes/zstring.h"
1502-#include "runtime/visitors/planiter_visitor.h"
1503-#include "runtime/full_text/ft_module.h"
1504-#include "system/globalenv.h"
1505-
1506-
1507-#include "store/api/iterator.h"
1508-
1509-namespace zorba {
1510-
1511-#ifndef ZORBA_NO_FULL_TEXT
1512-// <CurrentCompareOptionsIterator>
1513-SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
1514-
1515-void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
1516-{
1517- serialize_baseclass(ar,
1518- (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
1519-}
1520-
1521-
1522-void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
1523-{
1524- v.beginVisit(*this);
1525-
1526- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1527- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1528- for ( ; lIter != lEnd; ++lIter ){
1529- (*lIter)->accept(v);
1530- }
1531-
1532- v.endVisit(*this);
1533-}
1534-
1535-CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
1536-
1537-// </CurrentCompareOptionsIterator>
1538-
1539-#endif
1540-#ifndef ZORBA_NO_FULL_TEXT
1541-// <CurrentLangIterator>
1542-SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
1543-
1544-void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
1545-{
1546- serialize_baseclass(ar,
1547- (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
1548-}
1549-
1550-
1551-void CurrentLangIterator::accept(PlanIterVisitor& v) const
1552-{
1553- v.beginVisit(*this);
1554-
1555- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1556- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1557- for ( ; lIter != lEnd; ++lIter ){
1558- (*lIter)->accept(v);
1559- }
1560-
1561- v.endVisit(*this);
1562-}
1563-
1564-CurrentLangIterator::~CurrentLangIterator() {}
1565-
1566-// </CurrentLangIterator>
1567-
1568-#endif
1569-#ifndef ZORBA_NO_FULL_TEXT
1570-// <HostLangIterator>
1571-SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
1572-
1573-void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
1574-{
1575- serialize_baseclass(ar,
1576- (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
1577-}
1578-
1579-
1580-void HostLangIterator::accept(PlanIterVisitor& v) const
1581-{
1582- v.beginVisit(*this);
1583-
1584- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1585- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1586- for ( ; lIter != lEnd; ++lIter ){
1587- (*lIter)->accept(v);
1588- }
1589-
1590- v.endVisit(*this);
1591-}
1592-
1593-HostLangIterator::~HostLangIterator() {}
1594-
1595-// </HostLangIterator>
1596-
1597-#endif
1598-#ifndef ZORBA_NO_FULL_TEXT
1599-// <IsStemLangSupportedIterator>
1600-SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
1601-
1602-void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1603-{
1604- serialize_baseclass(ar,
1605- (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
1606-}
1607-
1608-
1609-void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
1610-{
1611- v.beginVisit(*this);
1612-
1613- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1614- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1615- for ( ; lIter != lEnd; ++lIter ){
1616- (*lIter)->accept(v);
1617- }
1618-
1619- v.endVisit(*this);
1620-}
1621-
1622-IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
1623-
1624-// </IsStemLangSupportedIterator>
1625-
1626-#endif
1627-#ifndef ZORBA_NO_FULL_TEXT
1628-// <IsStopWordIterator>
1629-SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
1630-
1631-void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
1632-{
1633- serialize_baseclass(ar,
1634- (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
1635-}
1636-
1637-
1638-void IsStopWordIterator::accept(PlanIterVisitor& v) const
1639-{
1640- v.beginVisit(*this);
1641-
1642- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1643- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1644- for ( ; lIter != lEnd; ++lIter ){
1645- (*lIter)->accept(v);
1646- }
1647-
1648- v.endVisit(*this);
1649-}
1650-
1651-IsStopWordIterator::~IsStopWordIterator() {}
1652-
1653-// </IsStopWordIterator>
1654-
1655-#endif
1656-#ifndef ZORBA_NO_FULL_TEXT
1657-// <IsStopWordLangSupportedIterator>
1658-SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
1659-
1660-void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1661-{
1662- serialize_baseclass(ar,
1663- (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
1664-}
1665-
1666-
1667-void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
1668-{
1669- v.beginVisit(*this);
1670-
1671- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1672- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1673- for ( ; lIter != lEnd; ++lIter ){
1674- (*lIter)->accept(v);
1675- }
1676-
1677- v.endVisit(*this);
1678-}
1679-
1680-IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
1681-
1682-// </IsStopWordLangSupportedIterator>
1683-
1684-#endif
1685-#ifndef ZORBA_NO_FULL_TEXT
1686-// <IsThesaurusLangSupportedIterator>
1687-SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
1688-
1689-void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1690-{
1691- serialize_baseclass(ar,
1692- (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
1693-}
1694-
1695-
1696-void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
1697-{
1698- v.beginVisit(*this);
1699-
1700- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1701- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1702- for ( ; lIter != lEnd; ++lIter ){
1703- (*lIter)->accept(v);
1704- }
1705-
1706- v.endVisit(*this);
1707-}
1708-
1709-IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
1710-
1711-// </IsThesaurusLangSupportedIterator>
1712-
1713-#endif
1714-#ifndef ZORBA_NO_FULL_TEXT
1715-// <IsTokenizerLangSupportedIterator>
1716-SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
1717-
1718-void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
1719-{
1720- serialize_baseclass(ar,
1721- (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
1722-}
1723-
1724-
1725-void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
1726-{
1727- v.beginVisit(*this);
1728-
1729- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1730- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1731- for ( ; lIter != lEnd; ++lIter ){
1732- (*lIter)->accept(v);
1733- }
1734-
1735- v.endVisit(*this);
1736-}
1737-
1738-IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
1739-
1740-// </IsTokenizerLangSupportedIterator>
1741-
1742-#endif
1743-#ifndef ZORBA_NO_FULL_TEXT
1744-// <StemIterator>
1745-SERIALIZABLE_CLASS_VERSIONS(StemIterator)
1746-
1747-void StemIterator::serialize(::zorba::serialization::Archiver& ar)
1748-{
1749- serialize_baseclass(ar,
1750- (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
1751-}
1752-
1753-
1754-void StemIterator::accept(PlanIterVisitor& v) const
1755-{
1756- v.beginVisit(*this);
1757-
1758- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1759- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1760- for ( ; lIter != lEnd; ++lIter ){
1761- (*lIter)->accept(v);
1762- }
1763-
1764- v.endVisit(*this);
1765-}
1766-
1767-StemIterator::~StemIterator() {}
1768-
1769-// </StemIterator>
1770-
1771-#endif
1772-#ifndef ZORBA_NO_FULL_TEXT
1773-// <StripDiacriticsIterator>
1774-SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
1775-
1776-void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
1777-{
1778- serialize_baseclass(ar,
1779- (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
1780-}
1781-
1782-
1783-void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
1784-{
1785- v.beginVisit(*this);
1786-
1787- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1788- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1789- for ( ; lIter != lEnd; ++lIter ){
1790- (*lIter)->accept(v);
1791- }
1792-
1793- v.endVisit(*this);
1794-}
1795-
1796-StripDiacriticsIterator::~StripDiacriticsIterator() {}
1797-
1798-// </StripDiacriticsIterator>
1799-
1800-#endif
1801-#ifndef ZORBA_NO_FULL_TEXT
1802-// <ThesaurusLookupIterator>
1803-SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
1804-
1805-void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
1806-{
1807- serialize_baseclass(ar,
1808- (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
1809-}
1810-
1811-
1812-void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
1813-{
1814- v.beginVisit(*this);
1815-
1816- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1817- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1818- for ( ; lIter != lEnd; ++lIter ){
1819- (*lIter)->accept(v);
1820- }
1821-
1822- v.endVisit(*this);
1823-}
1824-
1825-ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
1826-
1827-ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
1828-
1829-ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
1830-
1831-
1832-void ThesaurusLookupIteratorState::reset(PlanState& planState) {
1833- PlanIteratorState::reset(planState);
1834-}
1835-// </ThesaurusLookupIterator>
1836-
1837-#endif
1838-#ifndef ZORBA_NO_FULL_TEXT
1839-// <TokenizeNodeIterator>
1840-SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
1841-
1842-
1843-void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
1844-{
1845- v.beginVisit(*this);
1846-
1847- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1848- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1849- for ( ; lIter != lEnd; ++lIter ){
1850- (*lIter)->accept(v);
1851- }
1852-
1853- v.endVisit(*this);
1854-}
1855-
1856-TokenizeNodeIterator::~TokenizeNodeIterator() {}
1857-
1858-TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
1859-
1860-TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
1861-
1862-
1863-void TokenizeNodeIteratorState::reset(PlanState& planState) {
1864- PlanIteratorState::reset(planState);
1865-}
1866-// </TokenizeNodeIterator>
1867-
1868-#endif
1869-#ifndef ZORBA_NO_FULL_TEXT
1870-// <TokenizerPropertiesIterator>
1871-SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
1872-
1873-void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
1874-{
1875- serialize_baseclass(ar,
1876- (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
1877-}
1878-
1879-
1880-void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
1881-{
1882- v.beginVisit(*this);
1883-
1884- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1885- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1886- for ( ; lIter != lEnd; ++lIter ){
1887- (*lIter)->accept(v);
1888- }
1889-
1890- v.endVisit(*this);
1891-}
1892-
1893-TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
1894-
1895-// </TokenizerPropertiesIterator>
1896-
1897-#endif
1898-#ifndef ZORBA_NO_FULL_TEXT
1899-// <TokenizeStringIterator>
1900-SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
1901-
1902-void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
1903-{
1904- serialize_baseclass(ar,
1905- (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
1906-}
1907-
1908-
1909-void TokenizeStringIterator::accept(PlanIterVisitor& v) const
1910-{
1911- v.beginVisit(*this);
1912-
1913- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
1914- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
1915- for ( ; lIter != lEnd; ++lIter ){
1916- (*lIter)->accept(v);
1917- }
1918-
1919- v.endVisit(*this);
1920-}
1921-
1922-TokenizeStringIterator::~TokenizeStringIterator() {}
1923-
1924-TokenizeStringIteratorState::TokenizeStringIteratorState() {}
1925-
1926-TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
1927-
1928-
1929-void TokenizeStringIteratorState::reset(PlanState& planState) {
1930- PlanIteratorState::reset(planState);
1931-}
1932-// </TokenizeStringIterator>
1933-
1934-#endif
1935-
1936-}
1937-
1938-
1939
1940=== modified file 'src/runtime/full_text/pregenerated/ft_module.h'
1941--- src/runtime/full_text/pregenerated/ft_module.h 2012-06-28 04:14:03 +0000
1942+++ src/runtime/full_text/pregenerated/ft_module.h 2012-06-29 16:57:20 +0000
1943@@ -29,6 +29,11 @@
1944
1945
1946 #include "runtime/base/narybase.h"
1947+#include <deque>
1948+#include <list>
1949+#include <stack>
1950+#include <vector>
1951+#include "runtime/full_text/ft_module_util.h"
1952 #include "runtime/full_text/ft_token_seq_iterator.h"
1953 #include "runtime/full_text/thesaurus.h"
1954
1955@@ -416,6 +421,7 @@
1956 public:
1957 store::Item_t doc_item_; //
1958 FTTokenIterator_t doc_tokens_; //
1959+ TokenQNames token_qnames_; //
1960
1961 TokenizeNodeIteratorState();
1962
1963@@ -426,13 +432,6 @@
1964
1965 class TokenizeNodeIterator : public NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>
1966 {
1967-protected:
1968- store::Item_t token_qname_; //
1969- store::Item_t lang_qname_; //
1970- store::Item_t para_qname_; //
1971- store::Item_t sent_qname_; //
1972- store::Item_t value_qname_; //
1973- store::Item_t ref_qname_; //
1974 public:
1975 SERIALIZABLE_CLASS(TokenizeNodeIterator);
1976
1977@@ -445,12 +444,67 @@
1978 static_context* sctx,
1979 const QueryLoc& loc,
1980 std::vector<PlanIter_t>& children)
1981- ;
1982+ :
1983+ NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>(sctx, loc, children)
1984+ {}
1985
1986 virtual ~TokenizeNodeIterator();
1987
1988-public:
1989- void initMembers();
1990+ void accept(PlanIterVisitor& v) const;
1991+
1992+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
1993+
1994+ void resetImpl(PlanState&) const;
1995+};
1996+
1997+#endif
1998+
1999+#ifndef ZORBA_NO_FULL_TEXT
2000+/**
2001+ *
2002+ * Author:
2003+ */
2004+class TokenizeNodesIteratorState : public PlanIteratorState
2005+{
2006+public:
2007+ store::Item_t doc_item_; //
2008+ FTTokenIterator_t doc_tokens_; //
2009+ TokenQNames token_qnames_; //
2010+ std::list<store::Item_t> includes_; //
2011+ std::vector<store::Item_t> excludes_; //
2012+ std::stack<Tokenizer*> tokenizers_; //
2013+ std::stack<locale::iso639_1::type> langs_; //
2014+ TokenizeNodesCallback callback_; //
2015+ Tokenizer::State t_state_; //
2016+ std::deque<FTToken> tokens_; //
2017+
2018+ TokenizeNodesIteratorState();
2019+
2020+ ~TokenizeNodesIteratorState();
2021+
2022+ void reset(PlanState&);
2023+};
2024+
2025+class TokenizeNodesIterator : public NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>
2026+{
2027+public:
2028+ SERIALIZABLE_CLASS(TokenizeNodesIterator);
2029+
2030+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeNodesIterator,
2031+ NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>);
2032+
2033+ void serialize( ::zorba::serialization::Archiver& ar);
2034+
2035+ TokenizeNodesIterator(
2036+ static_context* sctx,
2037+ const QueryLoc& loc,
2038+ std::vector<PlanIter_t>& children)
2039+ :
2040+ NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>(sctx, loc, children)
2041+ {}
2042+
2043+ virtual ~TokenizeNodesIterator();
2044+
2045 void accept(PlanIterVisitor& v) const;
2046
2047 bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
2048
2049=== modified file 'src/runtime/full_text/tokenizer.cpp'
2050--- src/runtime/full_text/tokenizer.cpp 2012-06-28 04:14:03 +0000
2051+++ src/runtime/full_text/tokenizer.cpp 2012-06-29 16:57:20 +0000
2052@@ -21,12 +21,15 @@
2053 #include <zorba/tokenizer.h>
2054 #include <zorba/zorba_string.h>
2055
2056+#include "api/unmarshaller.h"
2057 #include "diagnostics/assert.h"
2058 #include "store/api/store.h"
2059 #include "system/globalenv.h"
2060 #include "zorbamisc/ns_consts.h"
2061 #include "zorbautils/locale.h"
2062
2063+#include "ft_util.h"
2064+
2065 using namespace zorba::locale;
2066
2067 namespace zorba {
2068@@ -38,22 +41,9 @@
2069 }
2070
2071 bool Tokenizer::find_lang_attribute( Item const &item, iso639_1::type *lang ) {
2072- bool found_lang = false;
2073- if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
2074- Iterator_t i( item.getAttributes() );
2075- i->open();
2076- for ( Item attr; i->next( attr ); ) {
2077- Item qname;
2078- if ( attr.getNodeName( qname ) &&
2079- qname.getLocalName() == "lang" && qname.getNamespace() == XML_NS ) {
2080- *lang = locale::find_lang( attr.getStringValue().c_str() );
2081- found_lang = true;
2082- break;
2083- }
2084- }
2085- i->close();
2086- }
2087- return found_lang;
2088+ return zorba::find_lang_attribute(
2089+ *Unmarshaller::getInternalItem( item ), lang
2090+ );
2091 }
2092
2093 void Tokenizer::item( Item const &item, bool entering ) {
2094
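The refactored `Tokenizer::find_lang_attribute` now delegates to the internal `zorba::find_lang_attribute` from `ft_util`. The removed body shows what that lookup does: it scans an element's attributes for `lang` in the XML namespace and stops at the first match. Below is a minimal sketch of that scan. The `Attr` struct and the free function are hypothetical stand-ins, not Zorba's item API. The sketch skips the element-node check and does not map the value to an ISO 639-1 code through `locale::find_lang`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified attribute model (namespace URI, local name, value).
struct Attr {
  std::string ns;
  std::string local;
  std::string value;
};

// The namespace URI bound to the reserved xml: prefix.
const std::string kXmlNs = "http://www.w3.org/XML/1998/namespace";

// Returns the value of the first xml:lang attribute, if any.
// Unlike the removed code, this sketch does not check the node kind and
// does not convert the value to an iso639_1::type.
std::optional<std::string> find_lang_attribute(const std::vector<Attr>& attrs) {
  for (const Attr& a : attrs)
    if (a.local == "lang" && a.ns == kXmlNs)
      return a.value;  // first match wins, as in the original loop
  return std::nullopt;
}
```

A `lang` attribute with no namespace is ignored. Only `xml:lang` controls the tokenizer language.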
2095=== modified file 'src/runtime/json/jsonml_array.cpp'
2096--- src/runtime/json/jsonml_array.cpp 2012-06-28 04:14:03 +0000
2097+++ src/runtime/json/jsonml_array.cpp 2012-06-29 16:57:20 +0000
2098@@ -30,6 +30,7 @@
2099 #include "util/omanip.h"
2100 #include "util/oseparator.h"
2101 #include "util/stl_util.h"
2102+#include "util/xml_util.h"
2103
2104 #include "jsonml_array.h"
2105
2106@@ -39,20 +40,12 @@
2107
2108 ///////////////////////////////////////////////////////////////////////////////
2109
2110-static void split_name( zstring const &name, zstring *prefix, zstring *local ) {
2111- zstring::size_type const colon = name.find( ':' );
2112- if ( colon != zstring::npos ) {
2113- *prefix = name.substr( 0, colon );
2114- *local = name.substr( colon + 1 );
2115- if ( prefix->empty() || local->empty() )
2116- throw XQUERY_EXCEPTION(
2117- zerr::ZJPE0008_ILLEGAL_QNAME,
2118- ERROR_PARAMS( name )
2119- );
2120- } else {
2121- prefix->clear();
2122- *local = name;
2123- }
2124+inline void split_name( zstring const &name, zstring *prefix, zstring *local ) {
2125+ if ( !xml::split_name( name, prefix, local ) )
2126+ throw XQUERY_EXCEPTION(
2127+ zerr::ZJPE0008_ILLEGAL_QNAME,
2128+ ERROR_PARAMS( name )
2129+ );
2130 }
2131
2132 namespace expect {
2133
2134=== modified file 'src/runtime/pregenerated/iterator_enum.h'
2135--- src/runtime/pregenerated/iterator_enum.h 2012-06-28 21:54:08 +0000
2136+++ src/runtime/pregenerated/iterator_enum.h 2012-06-29 16:57:20 +0000
2137@@ -114,6 +114,7 @@
2138 TYPE_StripDiacriticsIterator,
2139 TYPE_ThesaurusLookupIterator,
2140 TYPE_TokenizeNodeIterator,
2141+ TYPE_TokenizeNodesIterator,
2142 TYPE_TokenizerPropertiesIterator,
2143 TYPE_TokenizeStringIterator,
2144 TYPE_FunctionNameIterator,
2145
2146=== modified file 'src/runtime/spec/full_text/ft_module.xml'
2147--- src/runtime/spec/full_text/ft_module.xml 2012-06-28 04:14:03 +0000
2148+++ src/runtime/spec/full_text/ft_module.xml 2012-06-29 16:57:20 +0000
2149@@ -6,6 +6,12 @@
2150 xsi:schemaLocation="http://www.zorba-xquery.com ../runtime.xsd">
2151
2152 <zorba:header>
2153+ <zorba:include form="Angle-bracket">deque</zorba:include>
2154+ <zorba:include form="Angle-bracket">list</zorba:include>
2155+ <zorba:include form="Angle-bracket">stack</zorba:include>
2156+ <zorba:include form="Angle-bracket">vector</zorba:include>
2157+ <zorba:include form="Angle-bracket">zorba/locale.h</zorba:include>
2158+ <zorba:include form="Quoted">runtime/full_text/ft_module_util.h</zorba:include>
2159 <zorba:include form="Quoted">runtime/full_text/ft_token_seq_iterator.h</zorba:include>
2160 <zorba:include form="Quoted">runtime/full_text/thesaurus.h</zorba:include>
2161 </zorba:header>
2162@@ -14,6 +20,8 @@
2163 <zorba:include form="Quoted">store/api/iterator.h</zorba:include>
2164 </zorba:source>
2165
2166+<!--========================================================================-->
2167+
2168 <zorba:iterator name="CurrentCompareOptionsIterator"
2169 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2170 </zorba:iterator>
2171@@ -27,6 +35,8 @@
2172 </zorba:function>
2173 </zorba:iterator>
2174
2175+<!--========================================================================-->
2176+
2177 <zorba:iterator name="HostLangIterator"
2178 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2179 <zorba:function>
2180@@ -36,6 +46,8 @@
2181 </zorba:function>
2182 </zorba:iterator>
2183
2184+<!--========================================================================-->
2185+
2186 <zorba:iterator name="IsStemLangSupportedIterator"
2187 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2188 <zorba:function>
2189@@ -46,6 +58,8 @@
2190 </zorba:function>
2191 </zorba:iterator>
2192
2193+<!--========================================================================-->
2194+
2195 <zorba:iterator name="IsStopWordIterator"
2196 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2197 <zorba:function>
2198@@ -61,6 +75,8 @@
2199 </zorba:function>
2200 </zorba:iterator>
2201
2202+<!--========================================================================-->
2203+
2204 <zorba:iterator name="IsStopWordLangSupportedIterator"
2205 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2206 <zorba:function>
2207@@ -71,6 +87,8 @@
2208 </zorba:function>
2209 </zorba:iterator>
2210
2211+<!--========================================================================-->
2212+
2213 <zorba:iterator name="IsThesaurusLangSupportedIterator"
2214 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2215 <zorba:function>
2216@@ -86,6 +104,8 @@
2217 </zorba:function>
2218 </zorba:iterator>
2219
2220+<!--========================================================================-->
2221+
2222 <zorba:iterator name="IsTokenizerLangSupportedIterator"
2223 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2224 <zorba:function>
2225@@ -96,6 +116,8 @@
2226 </zorba:function>
2227 </zorba:iterator>
2228
2229+<!--========================================================================-->
2230+
2231 <zorba:iterator name="StemIterator"
2232 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2233 <zorba:function>
2234@@ -111,6 +133,8 @@
2235 </zorba:function>
2236 </zorba:iterator>
2237
2238+<!--========================================================================-->
2239+
2240 <zorba:iterator name="StripDiacriticsIterator"
2241 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2242 <zorba:function>
2243@@ -121,6 +145,8 @@
2244 </zorba:function>
2245 </zorba:iterator>
2246
2247+<!--========================================================================-->
2248+
2249 <zorba:iterator name="ThesaurusLookupIterator"
2250 generateResetImpl="true"
2251 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2252@@ -167,56 +193,69 @@
2253 </zorba:state>
2254 </zorba:iterator>
2255
2256+<!--========================================================================-->
2257+
2258 <zorba:iterator name="TokenizeNodeIterator"
2259 generateResetImpl="true"
2260- generateSerialize="false"
2261- generateConstructor="false"
2262- preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2263-
2264- <zorba:state generateInit="use-default">
2265- <zorba:member type="store::Item_t" name="doc_item_"/>
2266- <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
2267- </zorba:state>
2268-
2269- <zorba:member type="store::Item_t" name="token_qname_"/>
2270- <zorba:member type="store::Item_t" name="lang_qname_"/>
2271- <zorba:member type="store::Item_t" name="para_qname_"/>
2272- <zorba:member type="store::Item_t" name="sent_qname_"/>
2273- <zorba:member type="store::Item_t" name="value_qname_"/>
2274- <zorba:member type="store::Item_t" name="ref_qname_"/>
2275-
2276- <zorba:method name="initMembers" return="void"/>
2277-
2278-</zorba:iterator>
2279+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2280+ <zorba:state generateInit="use-default">
2281+ <zorba:member type="store::Item_t" name="doc_item_"/>
2282+ <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
2283+ <zorba:member type="TokenQNames" name="token_qnames_"/>
2284+ </zorba:state>
2285+</zorba:iterator>
2286+
2287+<!--========================================================================-->
2288+
2289+<zorba:iterator name="TokenizeNodesIterator"
2290+ generateResetImpl="true"
2291+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2292+ <zorba:state generateInit="use-default">
2293+ <zorba:member type="store::Item_t" name="doc_item_"/>
2294+ <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
2295+
2296+ <zorba:member type="TokenQNames" name="token_qnames_"/>
2297+
2298+ <zorba:member type="std::list&lt;store::Item_t&gt;" name="includes_"/>
2299+ <zorba:member type="std::vector&lt;store::Item_t&gt;" name="excludes_"/>
2300+
2301+ <zorba:member type="std::stack&lt;Tokenizer*>" name="tokenizers_"/>
2302+ <zorba:member type="std::stack&lt;locale::iso639_1::type&gt;" name="langs_"/>
2303+ <zorba:member type="TokenizeNodesCallback" name="callback_"/>
2304+ <zorba:member type="Tokenizer::State" name="t_state_"/>
2305+ <zorba:member type="std::deque&lt;FTToken&gt;" name="tokens_"/>
2306+ </zorba:state>
2307+</zorba:iterator>
2308+
2309+<!--========================================================================-->
2310
2311 <zorba:iterator name="TokenizerPropertiesIterator"
2312 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2313 </zorba:iterator>
2314
2315+<!--========================================================================-->
2316+
2317 <zorba:iterator name="TokenizeStringIterator"
2318 generateResetImpl="true"
2319 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
2320
2321 <zorba:function>
2322-
2323 <zorba:signature localname="tokenize-string" prefix="full-text">
2324 <zorba:param>xs:string</zorba:param> <!-- string -->
2325 <zorba:output>xs:string*</zorba:output>
2326 </zorba:signature>
2327-
2328 <zorba:signature localname="tokenize-string" prefix="full-text">
2329 <zorba:param>xs:string</zorba:param> <!-- string -->
2330 <zorba:param>xs:language</zorba:param> <!-- lang -->
2331 <zorba:output>xs:string*</zorba:output>
2332 </zorba:signature>
2333-
2334 </zorba:function>
2335-
2336 <zorba:state generateInit="use-default">
2337 <zorba:member type="FTTokenSeqIterator" name="string_tokens_"/>
2338 </zorba:state>
2339-
2340 </zorba:iterator>
2341
2342+<!--========================================================================-->
2343+
2344 </zorba:iterators>
2345 <!-- vim:set et sw=2 ts=2: -->
2346
2347=== modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h'
2348--- src/runtime/visitors/pregenerated/planiter_visitor.h 2012-06-28 21:54:08 +0000
2349+++ src/runtime/visitors/pregenerated/planiter_visitor.h 2012-06-29 16:57:20 +0000
2350@@ -232,6 +232,9 @@
2351 class TokenizeNodeIterator;
2352 #endif
2353 #ifndef ZORBA_NO_FULL_TEXT
2354+ class TokenizeNodesIterator;
2355+#endif
2356+#ifndef ZORBA_NO_FULL_TEXT
2357 class TokenizerPropertiesIterator;
2358 #endif
2359 #ifndef ZORBA_NO_FULL_TEXT
2360@@ -1015,6 +1018,10 @@
2361 virtual void endVisit ( const TokenizeNodeIterator& ) = 0;
2362 #endif
2363 #ifndef ZORBA_NO_FULL_TEXT
2364+ virtual void beginVisit ( const TokenizeNodesIterator& ) = 0;
2365+ virtual void endVisit ( const TokenizeNodesIterator& ) = 0;
2366+#endif
2367+#ifndef ZORBA_NO_FULL_TEXT
2368 virtual void beginVisit ( const TokenizerPropertiesIterator& ) = 0;
2369 virtual void endVisit ( const TokenizerPropertiesIterator& ) = 0;
2370 #endif
2371
2372=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp'
2373--- src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-06-28 21:54:08 +0000
2374+++ src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-06-29 16:57:20 +0000
2375@@ -1442,6 +1442,21 @@
2376
2377 #endif
2378 #ifndef ZORBA_NO_FULL_TEXT
2379+// <TokenizeNodesIterator>
2380+void PrinterVisitor::beginVisit ( const TokenizeNodesIterator& a) {
2381+ thePrinter.startBeginVisit("TokenizeNodesIterator", ++theId);
2382+ printCommons( &a, theId );
2383+ thePrinter.endBeginVisit( theId );
2384+}
2385+
2386+void PrinterVisitor::endVisit ( const TokenizeNodesIterator& ) {
2387+ thePrinter.startEndVisit();
2388+ thePrinter.endEndVisit();
2389+}
2390+// </TokenizeNodesIterator>
2391+
2392+#endif
2393+#ifndef ZORBA_NO_FULL_TEXT
2394 // <TokenizerPropertiesIterator>
2395 void PrinterVisitor::beginVisit ( const TokenizerPropertiesIterator& a) {
2396 thePrinter.startBeginVisit("TokenizerPropertiesIterator", ++theId);
2397
2398=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.h'
2399--- src/runtime/visitors/pregenerated/printer_visitor.h 2012-06-28 21:54:08 +0000
2400+++ src/runtime/visitors/pregenerated/printer_visitor.h 2012-06-29 16:57:20 +0000
2401@@ -356,6 +356,11 @@
2402 #endif
2403
2404 #ifndef ZORBA_NO_FULL_TEXT
2405+ void beginVisit( const TokenizeNodesIterator& );
2406+ void endVisit ( const TokenizeNodesIterator& );
2407+#endif
2408+
2409+#ifndef ZORBA_NO_FULL_TEXT
2410 void beginVisit( const TokenizerPropertiesIterator& );
2411 void endVisit ( const TokenizerPropertiesIterator& );
2412 #endif
2413
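Every generated `accept()` method, for example `TokenizeNodeIterator::accept` earlier in this diff, follows one pattern: it calls `beginVisit`, recurses into `theChildren`, and then calls `endVisit`. That pattern is why a new iterator such as `TokenizeNodesIterator` needs matching `beginVisit`/`endVisit` overloads in both `PlanIterVisitor` and `PrinterVisitor`. Here is a self-contained sketch of the traversal. `Iter`, `Visitor` and `Printer` are hypothetical simplifications, and the output format is illustrative only. Zorba's real printer format differs.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Visitor;

// Hypothetical stand-in for a plan iterator with child iterators.
struct Iter {
  std::string name;
  std::vector<std::shared_ptr<Iter>> children;
  void accept(Visitor& v) const;
};

struct Visitor {
  virtual ~Visitor() = default;
  virtual void beginVisit(const Iter&) = 0;
  virtual void endVisit(const Iter&) = 0;
};

// Same shape as the generated accept(): pre-visit, children, post-visit.
void Iter::accept(Visitor& v) const {
  v.beginVisit(*this);
  for (const auto& child : children)
    child->accept(v);
  v.endVisit(*this);
}

// Prints a nested outline, numbering nodes in visit order.
struct Printer : Visitor {
  std::ostringstream out;
  int id = 0;
  int depth = 0;

  void beginVisit(const Iter& it) override {
    out << std::string(static_cast<std::size_t>(depth * 2), ' ')
        << "<" << it.name << " id=\"" << ++id << "\">\n";
    ++depth;
  }
  void endVisit(const Iter& it) override {
    --depth;
    out << std::string(static_cast<std::size_t>(depth * 2), ' ')
        << "</" << it.name << ">\n";
  }
};
```

If a visitor subclass lacks an overload for a newly generated iterator type, it stays abstract and will not compile. This is the intended safeguard behind the pure virtual declarations.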
2414=== modified file 'src/util/xml_util.h'
2415--- src/util/xml_util.h 2012-06-28 04:14:03 +0000
2416+++ src/util/xml_util.h 2012-06-29 16:57:20 +0000
2417@@ -40,12 +40,14 @@
2418 return o << version_string_of[ v ];
2419 }
2420
2421-////////// "James Clark notation" universal name functions ////////////////////
2422+////////// XML name handling //////////////////////////////////////////////////
2423
2424 /**
2425 * Attempts to extract the local name from a "universal name".
2426 * See: http://www.jclark.com/xml/xmlns.htm
2427 *
2428+ * @tparam InputStringType The input string type.
2429+ * @tparam OutputStringType The output string type.
2430 * @param uname The universal name.
2431 * @param local A pointer to the string to receive the local name.
2432 * @return Returns \c true only if the extraction was successful.
2433@@ -64,6 +66,8 @@
2434 * Attempts to extract the URI from a "universal name".
2435 * See: http://www.jclark.com/xml/xmlns.htm
2436 *
2437+ * @tparam InputStringType The input string type.
2438+ * @tparam OutputStringType The output string type.
2439 * @param uname The universal name.
2440 * @param uri A pointer to the string to receive the URI.
2441 * @return Returns \c true only if the extraction was successful.
2442@@ -80,11 +84,39 @@
2443 return false;
2444 }
2445
2446+/**
2447+ * Splits an XML name at a \c : if present.
2448+ *
2449+ * @tparam InputStringType The input string type.
2450+ * @tparam PrefixStringType The output prefix string type.
2451+ * @tparam LocalStringType The output local string type.
2452+ * @param name The XML name to be split.
2453+ * @param prefix The prefix is put here, if any.
2454+ * @param local The local name is put here.
2455+ * @return If \a name contains a \c : and either \a prefix or \a local strings
2456+ * become empty, returns \c false; otherwise returns \c true.
2457+ */
2458+template<class InputStringType,class PrefixStringType,class LocalStringType>
2459+inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
2460+ LocalStringType *local ) {
2461+ typename InputStringType::size_type const colon = name.find( ':' );
2462+ if ( colon != InputStringType::npos ) {
2463+ prefix->assign( name, 0, colon );
2464+ local->assign( name, colon + 1, LocalStringType::npos );
2465+ return !( prefix->empty() || local->empty() );
2466+ } else {
2467+ prefix->clear();
2468+ *local = name;
2469+ return true;
2470+ }
2471+}
2472+
2473 ////////// Character validity /////////////////////////////////////////////////
2474
2475 /**
2476 * Checks whether the given code-point is valid for the given XML version.
2477 *
2478+ * @tparam CodePointType The integral Unicode code-point type.
2479 * @param v The XML version to use.
2480 * @return Returns \c true only if the code-point is valid.
2481 */
2482@@ -196,7 +228,7 @@
2483 /**
2484 * Parses an XML entity reference.
2485 *
2486- * @tparam StringType The type of the input string.
2487+ * @tparam StringType The input string type.
2488 * @param ref The string pointing to the start of the entity reference.
2489 * @param c A pointer to the code-point result.
2490 * @return If successful, returns the number of characters parsed; otherwise
2491@@ -211,7 +243,7 @@
2492 * Parses an XML entity reference and appends the UTF-8 encoding of the
2493 * resulting code-point to the given string.
2494 *
2495- * @tparam StringType The type of the output string.
2496+ * @tparam StringType The output string type.
2497 * @param ref The C string pointing to the start of the entity reference.
2498 * @param out A string to append to.
2499 * @return If successful, returns the number of characters parsed; otherwise
2500@@ -230,8 +262,8 @@
2501 * Parses an XML entity reference and appends the UTF-8 encoding of the
2502 * resulting code-point to the given string.
2503 *
2504- * @tparam InputStringType The type of the input string.
2505- * @tparam OutputStringType The type of the output string.
2506+ * @tparam InputStringType The input string type.
2507+ * @tparam OutputStringType The output string type.
2508 * @param ref The string pointing to the start of the entity reference.
2509 * @param out A string to append to.
2510 * @return If successful, returns the number of characters parsed; otherwise
2511
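Below is a standalone copy of the new `xml::split_name` template, run with `std::string`, to make its edge cases concrete. Only the first `:` counts, so `a:b:c` splits into `a` and `b:c` and still returns `true`. The helper does not check that either part is a valid NCName. In `jsonml_array.cpp`, a `false` result becomes a `ZJPE0008_ILLEGAL_QNAME` error.

```cpp
#include <string>

namespace xml {

// Copy of the split_name template added to src/util/xml_util.h.
template<class InputStringType, class PrefixStringType, class LocalStringType>
inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
                        LocalStringType *local ) {
  typename InputStringType::size_type const colon = name.find( ':' );
  if ( colon != InputStringType::npos ) {
    prefix->assign( name, 0, colon );                         // text before the first ':'
    local->assign( name, colon + 1, LocalStringType::npos );  // everything after it
    return !( prefix->empty() || local->empty() );
  } else {
    prefix->clear();  // no colon: unprefixed name
    *local = name;
    return true;
  }
}

} // namespace xml
```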
2512=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res'
2513--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res 1970-01-01 00:00:00 +0000
2514+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res 2012-06-29 16:57:20 +0000
2515@@ -0,0 +1,1 @@
2516+true
2517
2518=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq'
2519--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq 1970-01-01 00:00:00 +0000
2520+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq 2012-06-29 16:57:20 +0000
2521@@ -0,0 +1,42 @@
2522+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
2523+import schema namespace fts = "http://www.zorba-xquery.com/modules/full-text";
2524+
2525+let $book :=
2526+ <book>
2527+ <title>The C++ Programming Language</title>
2528+ <authors>
2529+ <author>Bjarne Stroustrup</author>
2530+ </authors>
2531+ <chapters>
2532+ <chapter>
2533+ <title>Notes to the Reader</title>
2534+ <content>
2535+ <quote>
2536+ <content>
2537+ "The time has come," the Walrus said,
2538+ "to talk of many things."
2539+ </content>
2540+ <source>Lewis Carroll</source>
2541+ </quote>
2542+ <!-- more content -->
2543+ </content>
2544+ </chapter>
2545+ </chapters>
2546+ </book>
2547+
2548+let $includes := $book//chapter
2549+let $excludes := $book//quote
2550+
2551+let $tokens := ft:tokenize-nodes( $includes, $excludes, xs:language("en") )
2552+
2553+let $t1 := validate { $tokens[1] }
2554+let $t2 := validate { $tokens[2] }
2555+let $t3 := validate { $tokens[3] }
2556+let $t4 := validate { $tokens[4] }
2557+
2558+return $t1/@value = "Notes"
2559+ and $t2/@value = "to"
2560+ and $t3/@value = "the"
2561+ and $t4/@value = "Reader"
2562+
2563+(: vim:set et sw=2 ts=2: :)
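The test expects `tokenize-nodes()` to return tokens from the included `chapter` while skipping the excluded `quote`. The first four tokens should be `Notes`, `to`, `the` and `Reader`. The sketch below models those semantics. It is loosely guided by the new state members: `includes_`, `excludes_`, and a `langs_` stack for the in-scope `xml:lang`. This is not the actual `nextImpl`, which is not shown here. The `Node` and `Token` types and the `tokenize_nodes` function are hypothetical. The sketch splits on whitespace only, while Zorba's tokenizer also handles punctuation and paragraph/sentence numbering.

```cpp
#include <algorithm>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// Hypothetical node model: elements have a name (and maybe an xml:lang
// value); text nodes have an empty name and carry text.
struct Node {
  std::string name;
  std::string text;
  std::string lang;
  std::vector<Node> kids;
};

struct Token {
  std::string value;
  std::string lang;
};

// Depth-first walk: prune excluded subtrees, push/pop the in-scope
// language, and emit whitespace-separated words from text nodes.
static void walk(const Node& n, const std::vector<const Node*>& excludes,
                 std::stack<std::string>& langs, std::vector<Token>& out) {
  if (std::find(excludes.begin(), excludes.end(), &n) != excludes.end())
    return;
  const bool pushed = !n.lang.empty();
  if (pushed) langs.push(n.lang);
  if (n.name.empty()) {
    std::istringstream in(n.text);
    for (std::string w; in >> w;)
      out.push_back(Token{w, langs.top()});
  }
  for (const Node& k : n.kids) walk(k, excludes, langs, out);
  if (pushed) langs.pop();
}

std::vector<Token> tokenize_nodes(const std::vector<const Node*>& includes,
                                  const std::vector<const Node*>& excludes,
                                  const std::string& default_lang) {
  std::vector<Token> out;
  for (const Node* inc : includes) {
    std::stack<std::string> langs;
    langs.push(default_lang);  // language passed by the caller, e.g. "en"
    walk(*inc, excludes, langs, out);
  }
  return out;
}
```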
