Merge lp:~zorba-coders/zorba/tokenize into lp:zorba
Status: | Superseded | ||||
---|---|---|---|---|---|
Proposed branch: | lp:~zorba-coders/zorba/tokenize | ||||
Merge into: | lp:zorba | ||||
Diff against target: |
603 lines (+366/-2) 25 files modified
ChangeLog (+2/-0)
modules/com/zorba-xquery/www/modules/CMakeLists.txt (+1/-1)
modules/com/zorba-xquery/www/modules/string.xq (+21/-1)
src/functions/pregenerated/func_strings.cpp (+23/-0)
src/functions/pregenerated/func_strings.h (+13/-0)
src/functions/pregenerated/function_enum.h (+1/-0)
src/runtime/spec/strings/strings.xml (+31/-0)
src/runtime/strings/pregenerated/strings.cpp (+42/-0)
src/runtime/strings/pregenerated/strings.h (+52/-0)
src/runtime/strings/strings_impl.cpp (+130/-0)
src/runtime/visitors/pregenerated/planiter_visitor.h (+5/-0)
src/runtime/visitors/pregenerated/printer_visitor.cpp (+14/-0)
src/runtime/visitors/pregenerated/printer_visitor.h (+3/-0)
test/rbkt/ExpQueryResults/zorba/string/tokenize01.xml.res (+1/-0)
test/rbkt/ExpQueryResults/zorba/string/tokenize02.xml.res (+1/-0)
test/rbkt/ExpQueryResults/zorba/string/tokenize03.xml.res (+1/-0)
test/rbkt/ExpQueryResults/zorba/string/tokenize04.xml.res (+1/-0)
test/rbkt/Queries/zorba/string/token01.txt (+1/-0)
test/rbkt/Queries/zorba/string/token02.txt (+1/-0)
test/rbkt/Queries/zorba/string/token03.txt (+1/-0)
test/rbkt/Queries/zorba/string/token04.txt (+1/-0)
test/rbkt/Queries/zorba/string/tokenize01.xq (+5/-0)
test/rbkt/Queries/zorba/string/tokenize02.xq (+5/-0)
test/rbkt/Queries/zorba/string/tokenize03.xq (+5/-0)
test/rbkt/Queries/zorba/string/tokenize04.xq (+5/-0) |
||||
To merge this branch: | bzr merge lp:~zorba-coders/zorba/tokenize | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
William Candillon | Approve | ||
Paul J. Lucas | Approve | ||
Review via email: mp+86647@code.launchpad.net |
This proposal supersedes a proposal from 2011-12-21.
This proposal has been superseded by a proposal from 2011-12-23.
Commit message
Implementation of a string:tokenize function that does not accept regular expressions but allows streamable processing of the input (resolves bug #898074).
Description of the change
Implementation of a string:tokenize function that does not accept regular expressions but allows streamable processing of the input (resolves bug #898074).
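To illustrate why restricting the separator to a literal string makes streaming straightforward, here is a minimal C++ sketch. It is not the code from this branch: `split_stream` and `split_to_vector` are hypothetical names, and dropping a trailing empty token is an assumption of the sketch rather than a statement about Zorba's semantics. The input is read one byte at a time and only the current token is buffered.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Zorba's API): reads `in` one byte at a time and
// calls `emit` for every token delimited by the literal separator `sep`.
// Only the current token is buffered, never the whole input.
void split_stream(std::istream& in, const std::string& sep,
                  const std::function<void(const std::string&)>& emit)
{
  std::string token;
  char c;
  while (in.get(c)) {
    token.push_back(c);
    // A literal separator has just been completed if the buffer ends with it.
    if (!sep.empty() && token.size() >= sep.size() &&
        token.compare(token.size() - sep.size(), sep.size(), sep) == 0) {
      token.erase(token.size() - sep.size());  // strip the separator
      emit(token);
      token.clear();
    }
  }
  // Assumption of this sketch: a trailing empty token is dropped.
  if (!token.empty())
    emit(token);
}

// Convenience wrapper for experiments: collects the tokens into a vector.
std::vector<std::string> split_to_vector(const std::string& input,
                                         const std::string& sep)
{
  std::istringstream in(input);
  std::vector<std::string> out;
  split_stream(in, sep, [&out](const std::string& t) { out.push_back(t); });
  return out;
}
```

Because the suffix check runs after every byte, inputs such as "aab" with separator "ab" are handled without extra backtracking logic. With a regular expression as separator, such a constant-size look-behind would generally not be enough.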
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal | # |
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal | # |
Validation queue job tokenize-
All tests succeeded!
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal | # |
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 2 Pending.
Paul J. Lucas (paul-lucas) wrote : Posted in a previous version of this proposal | # |
On line 357, why do you call assert()? An invalid byte should throw an exception, not assert and dump core.
Paul J. Lucas (paul-lucas) wrote : Posted in a previous version of this proposal | # |
> On line 357, why do you call assert()? An invalid byte should throw an
> exception, not assert and dump core.
I meant line 367.
Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal | # |
Matthias Brantner (matthias-brantner) wrote : Posted in a previous version of this proposal | # |
Once you finished the implementation of the transcoding stream buffer, I don't even want to do this check anymore. This must not happen with the stream buffer.
Paul J. Lucas (paul-lucas) wrote : Posted in a previous version of this proposal | # |
> Once you finished the implementation of the transcoding stream buffer, I don't
> even want to do this check anymore. This must not happen with the stream
> buffer.
I don't understand how it "must not happen." It can always happen. However, I think you're saying that you assume the check will happen in the transcoder. While it will be doing checks, bad input can still happen.
In the meantime, using an assert() is still too draconian.
Matthias Brantner (matthias-brantner) wrote : | # |
I have replaced the assertion with a graceful error.
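The difference between asserting and raising an error can be sketched in a few lines of C++. This is an illustration only, not the code from the branch: `utf8_sequence_length` is a hypothetical helper, `std::invalid_argument` stands in for Zorba's own exception type, and the accepted lead-byte ranges follow RFC 3629.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Zorba): returns how many bytes a UTF-8
// sequence occupies, judging by its first byte (ranges per RFC 3629).
// An invalid lead byte raises an exception that the caller can catch and
// report, instead of aborting the process as assert() would.
std::size_t utf8_sequence_length(unsigned char lead)
{
  if (lead <= 0x7F) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;

  char hex[5];  // "0xNN" plus the terminating NUL
  std::snprintf(hex, sizeof hex, "0x%02X", static_cast<unsigned>(lead));
  throw std::invalid_argument(std::string("invalid UTF-8 lead byte ") + hex);
}
```

A caller can wrap the call in `try`/`catch` and turn the failure into a query error with a location, which is not possible once assert() has terminated the process.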
Paul J. Lucas (paul-lucas) : | # |
William Candillon (wcandillon) wrote : | # |
Is there an example that works with streaming?
I wasn't able to make the following work:
import module namespace http = "http://
declare namespace h = "http://
let $item := http:send-
for $tweet in tokenize($item,"a")
return $tweet
Where:
import module namespace http = "http://
declare namespace h = "http://
let $item := http:send-
return $item
streams fine.
What am I missing?
Matthias Brantner (matthias-brantner) wrote : | # |
As discussed in this thread, only the new tokenize function of the string module streams.
Use the following instead:
import module namespace s = "http://
s:tokenize($item, "a")
William Candillon (wcandillon) wrote : | # |
Works like a charm.
Zorba Build Bot (zorba-buildbot) wrote : | # |
Attempt to merge into lp:zorba failed due to conflicts:
text conflict in ChangeLog
- 10587. By Matthias Brantner
  renamed tokenize to split
- 10588. By Matthias Brantner
  merge with trunk
- 10589. By Matthias Brantner
  forgot to commit pregenerated file
Unmerged revisions
Preview Diff
1 | === modified file 'ChangeLog' | |||
2 | --- ChangeLog 2011-12-23 19:38:53 +0000 | |||
3 | +++ ChangeLog 2011-12-23 20:34:27 +0000 | |||
4 | @@ -12,6 +12,8 @@ | |||
5 | 12 | set multiple times via the c++ api). | 12 | set multiple times via the c++ api). |
6 | 13 | * Fixed bug #905050 (setting and getting the context item type via the c++ api) | 13 | * Fixed bug #905050 (setting and getting the context item type via the c++ api) |
7 | 14 | * Added createDayTimeDuration, createYearMonthDuration, createDocumentNode, createCommentNode, createPiNode to api's ItemFactory. | 14 | * Added createDayTimeDuration, createYearMonthDuration, createDocumentNode, createCommentNode, createPiNode to api's ItemFactory. |
8 | 15 | * Added split function to the string module that allows for streamable tokenization but doesn't have regular expression | ||
9 | 16 | support. | ||
10 | 15 | * zerr is not predeclared anymore to be http://www.zorba-xquery.com/errors | 17 | * zerr is not predeclared anymore to be http://www.zorba-xquery.com/errors |
11 | 16 | 18 | ||
12 | 17 | version 2.1 | 19 | version 2.1 |
13 | 18 | 20 | ||
14 | === modified file 'modules/com/zorba-xquery/www/modules/CMakeLists.txt' | |||
15 | --- modules/com/zorba-xquery/www/modules/CMakeLists.txt 2011-12-21 14:40:33 +0000 | |||
16 | +++ modules/com/zorba-xquery/www/modules/CMakeLists.txt 2011-12-23 20:34:27 +0000 | |||
17 | @@ -58,7 +58,7 @@ | |||
18 | 58 | URI "http://www.zorba-xquery.com/modules/reflection") | 58 | URI "http://www.zorba-xquery.com/modules/reflection") |
19 | 59 | DECLARE_ZORBA_MODULE(FILE schema.xq VERSION 2.0 | 59 | DECLARE_ZORBA_MODULE(FILE schema.xq VERSION 2.0 |
20 | 60 | URI "http://www.zorba-xquery.com/modules/schema") | 60 | URI "http://www.zorba-xquery.com/modules/schema") |
22 | 61 | DECLARE_ZORBA_MODULE(FILE string.xq VERSION 2.0 | 61 | DECLARE_ZORBA_MODULE(FILE string.xq VERSION 2.1 |
23 | 62 | URI "http://www.zorba-xquery.com/modules/string") | 62 | URI "http://www.zorba-xquery.com/modules/string") |
24 | 63 | DECLARE_ZORBA_MODULE(FILE xml.xq VERSION 2.0 | 63 | DECLARE_ZORBA_MODULE(FILE xml.xq VERSION 2.0 |
25 | 64 | URI "http://www.zorba-xquery.com/modules/xml") | 64 | URI "http://www.zorba-xquery.com/modules/xml") |
26 | 65 | 65 | ||
27 | === modified file 'modules/com/zorba-xquery/www/modules/string.xq' | |||
28 | --- modules/com/zorba-xquery/www/modules/string.xq 2011-08-03 15:12:40 +0000 | |||
29 | +++ modules/com/zorba-xquery/www/modules/string.xq 2011-12-23 20:34:27 +0000 | |||
30 | @@ -25,7 +25,7 @@ | |||
31 | 25 | :) | 25 | :) |
32 | 26 | module namespace string = "http://www.zorba-xquery.com/modules/string"; | 26 | module namespace string = "http://www.zorba-xquery.com/modules/string"; |
33 | 27 | declare namespace ver = "http://www.zorba-xquery.com/options/versioning"; | 27 | declare namespace ver = "http://www.zorba-xquery.com/options/versioning"; |
35 | 28 | declare option ver:module-version "2.0"; | 28 | declare option ver:module-version "2.1"; |
36 | 29 | 29 | ||
37 | 30 | (:~ | 30 | (:~ |
38 | 31 | : This function materializes a streamable string. | 31 | : This function materializes a streamable string. |
39 | @@ -63,3 +63,23 @@ | |||
40 | 63 | : | 63 | : |
41 | 64 | :) | 64 | :) |
42 | 65 | declare function string:is-streamable($s as xs:string) as xs:boolean external; | 65 | declare function string:is-streamable($s as xs:string) as xs:boolean external; |
43 | 66 | |||
44 | 67 | (:~ | ||
45 | 68 | : Returns a sequence of strings constructed by splitting the input wherever the given | ||
46 | 69 | : separator is found. | ||
47 | 70 | : | ||
48 | 71 | : The function is different from fn:tokenize. It doesn't allow | ||
49 | 72 | : the separator to be a regular expression. This restriction allows for more | ||
50 | 73 | : performant implementation. Specifically, the function processes | ||
51 | 74 | : streamable strings as input in a streamable way which is particularly useful | ||
52 | 75 | : to tokenize huge strings (e.g. if returned by the file module's read-text | ||
53 | 76 | : function). | ||
54 | 77 | : | ||
55 | 78 | : @param $s the input string to split | ||
56 | 79 | : @param $separator the separator used for splitting the input string $s | ||
57 | 80 | : | ||
58 | 81 | : @return a sequence of strings constructed by splitting the input | ||
59 | 82 | :) | ||
60 | 83 | declare function string:split( | ||
61 | 84 | $s as xs:string, | ||
62 | 85 | $separator as xs:string) as xs:string* external; | ||
63 | 66 | 86 | ||
64 | === modified file 'src/functions/pregenerated/func_strings.cpp' | |||
65 | --- src/functions/pregenerated/func_strings.cpp 2011-12-21 14:40:33 +0000 | |||
66 | +++ src/functions/pregenerated/func_strings.cpp 2011-12-23 20:34:27 +0000 | |||
67 | @@ -320,6 +320,16 @@ | |||
68 | 320 | return new StringIsStreamableIterator(sctx, loc, argv); | 320 | return new StringIsStreamableIterator(sctx, loc, argv); |
69 | 321 | } | 321 | } |
70 | 322 | 322 | ||
71 | 323 | PlanIter_t fn_zorba_string_split::codegen( | ||
72 | 324 | CompilerCB*, | ||
73 | 325 | static_context* sctx, | ||
74 | 326 | const QueryLoc& loc, | ||
75 | 327 | std::vector<PlanIter_t>& argv, | ||
76 | 328 | AnnotationHolder& ann) const | ||
77 | 329 | { | ||
78 | 330 | return new StringSplitIterator(sctx, loc, argv); | ||
79 | 331 | } | ||
80 | 332 | |||
81 | 323 | void populate_context_strings(static_context* sctx) | 333 | void populate_context_strings(static_context* sctx) |
82 | 324 | { | 334 | { |
83 | 325 | { | 335 | { |
84 | @@ -890,6 +900,19 @@ | |||
85 | 890 | 900 | ||
86 | 891 | } | 901 | } |
87 | 892 | 902 | ||
88 | 903 | |||
89 | 904 | { | ||
90 | 905 | |||
91 | 906 | |||
92 | 907 | DECL_WITH_KIND(sctx, fn_zorba_string_split, | ||
93 | 908 | (createQName("http://www.zorba-xquery.com/modules/string","","split"), | ||
94 | 909 | GENV_TYPESYSTEM.STRING_TYPE_ONE, | ||
95 | 910 | GENV_TYPESYSTEM.STRING_TYPE_ONE, | ||
96 | 911 | GENV_TYPESYSTEM.STRING_TYPE_STAR), | ||
97 | 912 | FunctionConsts::FN_ZORBA_STRING_SPLIT_2); | ||
98 | 913 | |||
99 | 914 | } | ||
100 | 915 | |||
101 | 893 | } | 916 | } |
102 | 894 | 917 | ||
103 | 895 | 918 | ||
104 | 896 | 919 | ||
105 | === modified file 'src/functions/pregenerated/func_strings.h' | |||
106 | --- src/functions/pregenerated/func_strings.h 2011-12-22 14:14:53 +0000 | |||
107 | +++ src/functions/pregenerated/func_strings.h 2011-12-23 20:34:27 +0000 | |||
108 | @@ -481,6 +481,19 @@ | |||
109 | 481 | }; | 481 | }; |
110 | 482 | 482 | ||
111 | 483 | 483 | ||
112 | 484 | //fn-zorba-string:split | ||
113 | 485 | class fn_zorba_string_split : public function | ||
114 | 486 | { | ||
115 | 487 | public: | ||
116 | 488 | fn_zorba_string_split(const signature& sig, FunctionConsts::FunctionKind kind) | ||
117 | 489 | : function(sig, kind) { | ||
118 | 490 | |||
119 | 491 | } | ||
120 | 492 | |||
121 | 493 | CODEGEN_DECL(); | ||
122 | 494 | }; | ||
123 | 495 | |||
124 | 496 | |||
125 | 484 | } //namespace zorba | 497 | } //namespace zorba |
126 | 485 | 498 | ||
127 | 486 | 499 | ||
128 | 487 | 500 | ||
129 | === modified file 'src/functions/pregenerated/function_enum.h' | |||
130 | --- src/functions/pregenerated/function_enum.h 2011-12-21 14:40:33 +0000 | |||
131 | +++ src/functions/pregenerated/function_enum.h 2011-12-23 20:34:27 +0000 | |||
132 | @@ -371,6 +371,7 @@ | |||
133 | 371 | FN_ANALYZE_STRING_3, | 371 | FN_ANALYZE_STRING_3, |
134 | 372 | FN_ZORBA_STRING_MATERIALIZE_1, | 372 | FN_ZORBA_STRING_MATERIALIZE_1, |
135 | 373 | FN_ZORBA_STRING_IS_STREAMABLE_1, | 373 | FN_ZORBA_STRING_IS_STREAMABLE_1, |
136 | 374 | FN_ZORBA_STRING_SPLIT_2, | ||
137 | 374 | FN_ZORBA_XQDOC_XQDOC_1, | 375 | FN_ZORBA_XQDOC_XQDOC_1, |
138 | 375 | FN_ZORBA_XQDOC_XQDOC_CONTENT_1, | 376 | FN_ZORBA_XQDOC_XQDOC_CONTENT_1, |
139 | 376 | 377 | ||
140 | 377 | 378 | ||
141 | === modified file 'src/runtime/spec/strings/strings.xml' | |||
142 | --- src/runtime/spec/strings/strings.xml 2011-12-21 14:40:33 +0000 | |||
143 | +++ src/runtime/spec/strings/strings.xml 2011-12-23 20:34:27 +0000 | |||
144 | @@ -729,4 +729,35 @@ | |||
145 | 729 | 729 | ||
146 | 730 | </zorba:iterator> | 730 | </zorba:iterator> |
147 | 731 | 731 | ||
148 | 732 | <!-- | ||
149 | 733 | /******************************************************************************* | ||
150 | 734 | * string:tokenize | ||
151 | 735 | ********************************************************************************/ | ||
152 | 736 | --> | ||
153 | 737 | <zorba:iterator name="StringSplitIterator"> | ||
154 | 738 | |||
155 | 739 | <zorba:description author="Matthias Brantner"> | ||
156 | 740 | string:split | ||
157 | 741 | </zorba:description> | ||
158 | 742 | |||
159 | 743 | <zorba:function> | ||
160 | 744 | <zorba:signature localname="split" prefix="fn-zorba-string"> | ||
161 | 745 | <zorba:param>xs:string</zorba:param> | ||
162 | 746 | <zorba:param>xs:string</zorba:param> | ||
163 | 747 | <zorba:output>xs:string*</zorba:output> | ||
164 | 748 | </zorba:signature> | ||
165 | 749 | </zorba:function> | ||
166 | 750 | |||
167 | 751 | <zorba:state> | ||
168 | 752 | <zorba:member type="zstring" name="theSeparator" | ||
169 | 753 | brief="separator for the tokenization"/> | ||
170 | 754 | <zorba:member type="std::istream*" name="theIStream" | ||
171 | 755 | brief="the remaining string (if the input is streamable)"/> | ||
172 | 756 | <zorba:member type="zstring" name="theInput" | ||
173 | 757 | brief="the string to tokenize (if the input is not streamable)"/> | ||
174 | 758 | <zorba:member type="size_t" name="theNextStartPos" defaultValue="0"/> | ||
175 | 759 | </zorba:state> | ||
176 | 760 | |||
177 | 761 | </zorba:iterator> | ||
178 | 762 | |||
179 | 732 | </zorba:iterators> | 763 | </zorba:iterators> |
180 | 733 | 764 | ||
181 | === modified file 'src/runtime/strings/pregenerated/strings.cpp' | |||
182 | --- src/runtime/strings/pregenerated/strings.cpp 2011-12-21 14:40:33 +0000 | |||
183 | +++ src/runtime/strings/pregenerated/strings.cpp 2011-12-23 20:34:27 +0000 | |||
184 | @@ -830,6 +830,48 @@ | |||
185 | 830 | // </StringIsStreamableIterator> | 830 | // </StringIsStreamableIterator> |
186 | 831 | 831 | ||
187 | 832 | 832 | ||
188 | 833 | // <StringSplitIterator> | ||
189 | 834 | const char* StringSplitIterator::class_name_str = "StringSplitIterator"; | ||
190 | 835 | StringSplitIterator::class_factory<StringSplitIterator> | ||
191 | 836 | StringSplitIterator::g_class_factory; | ||
192 | 837 | |||
193 | 838 | const serialization::ClassVersion | ||
194 | 839 | StringSplitIterator::class_versions[] ={{ 1, 0x000905, false}}; | ||
195 | 840 | |||
196 | 841 | const int StringSplitIterator::class_versions_count = | ||
197 | 842 | sizeof(StringSplitIterator::class_versions)/sizeof(struct serialization::ClassVersion); | ||
198 | 843 | |||
199 | 844 | void StringSplitIterator::accept(PlanIterVisitor& v) const { | ||
200 | 845 | v.beginVisit(*this); | ||
201 | 846 | |||
202 | 847 | std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin(); | ||
203 | 848 | std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end(); | ||
204 | 849 | for ( ; lIter != lEnd; ++lIter ){ | ||
205 | 850 | (*lIter)->accept(v); | ||
206 | 851 | } | ||
207 | 852 | |||
208 | 853 | v.endVisit(*this); | ||
209 | 854 | } | ||
210 | 855 | |||
211 | 856 | StringSplitIterator::~StringSplitIterator() {} | ||
212 | 857 | |||
213 | 858 | StringSplitIteratorState::StringSplitIteratorState() {} | ||
214 | 859 | |||
215 | 860 | StringSplitIteratorState::~StringSplitIteratorState() {} | ||
216 | 861 | |||
217 | 862 | |||
218 | 863 | void StringSplitIteratorState::init(PlanState& planState) { | ||
219 | 864 | PlanIteratorState::init(planState); | ||
220 | 865 | theNextStartPos = 0; | ||
221 | 866 | } | ||
222 | 867 | |||
223 | 868 | void StringSplitIteratorState::reset(PlanState& planState) { | ||
224 | 869 | PlanIteratorState::reset(planState); | ||
225 | 870 | theNextStartPos = 0; | ||
226 | 871 | } | ||
227 | 872 | // </StringSplitIterator> | ||
228 | 873 | |||
229 | 874 | |||
230 | 833 | 875 | ||
231 | 834 | } | 876 | } |
232 | 835 | 877 | ||
233 | 836 | 878 | ||
234 | === modified file 'src/runtime/strings/pregenerated/strings.h' | |||
235 | --- src/runtime/strings/pregenerated/strings.h 2011-12-21 14:40:33 +0000 | |||
236 | +++ src/runtime/strings/pregenerated/strings.h 2011-12-23 20:34:27 +0000 | |||
237 | @@ -1075,6 +1075,58 @@ | |||
238 | 1075 | }; | 1075 | }; |
239 | 1076 | 1076 | ||
240 | 1077 | 1077 | ||
241 | 1078 | /** | ||
242 | 1079 | * | ||
243 | 1080 | * string:split | ||
244 | 1081 | * | ||
245 | 1082 | * Author: Matthias Brantner | ||
246 | 1083 | */ | ||
247 | 1084 | class StringSplitIteratorState : public PlanIteratorState | ||
248 | 1085 | { | ||
249 | 1086 | public: | ||
250 | 1087 | zstring theSeparator; //separator for the tokenization | ||
251 | 1088 | std::istream* theIStream; //the remaining string (if the input is streamable) | ||
252 | 1089 | zstring theInput; //the string to tokenize (if the input is not streamable) | ||
253 | 1090 | size_t theNextStartPos; // | ||
254 | 1091 | |||
255 | 1092 | StringSplitIteratorState(); | ||
256 | 1093 | |||
257 | 1094 | ~StringSplitIteratorState(); | ||
258 | 1095 | |||
259 | 1096 | void init(PlanState&); | ||
260 | 1097 | void reset(PlanState&); | ||
261 | 1098 | }; | ||
262 | 1099 | |||
263 | 1100 | class StringSplitIterator : public NaryBaseIterator<StringSplitIterator, StringSplitIteratorState> | ||
264 | 1101 | { | ||
265 | 1102 | public: | ||
266 | 1103 | SERIALIZABLE_CLASS(StringSplitIterator); | ||
267 | 1104 | |||
268 | 1105 | SERIALIZABLE_CLASS_CONSTRUCTOR2T(StringSplitIterator, | ||
269 | 1106 | NaryBaseIterator<StringSplitIterator, StringSplitIteratorState>); | ||
270 | 1107 | |||
271 | 1108 | void serialize( ::zorba::serialization::Archiver& ar) | ||
272 | 1109 | { | ||
273 | 1110 | serialize_baseclass(ar, | ||
274 | 1111 | (NaryBaseIterator<StringSplitIterator, StringSplitIteratorState>*)this); | ||
275 | 1112 | } | ||
276 | 1113 | |||
277 | 1114 | StringSplitIterator( | ||
278 | 1115 | static_context* sctx, | ||
279 | 1116 | const QueryLoc& loc, | ||
280 | 1117 | std::vector<PlanIter_t>& children) | ||
281 | 1118 | : | ||
282 | 1119 | NaryBaseIterator<StringSplitIterator, StringSplitIteratorState>(sctx, loc, children) | ||
283 | 1120 | {} | ||
284 | 1121 | |||
285 | 1122 | virtual ~StringSplitIterator(); | ||
286 | 1123 | |||
287 | 1124 | void accept(PlanIterVisitor& v) const; | ||
288 | 1125 | |||
289 | 1126 | bool nextImpl(store::Item_t& result, PlanState& aPlanState) const; | ||
290 | 1127 | }; | ||
291 | 1128 | |||
292 | 1129 | |||
293 | 1078 | } | 1130 | } |
294 | 1079 | #endif | 1131 | #endif |
295 | 1080 | /* | 1132 | /* |
296 | 1081 | 1133 | ||
297 | === modified file 'src/runtime/strings/strings_impl.cpp' | |||
298 | --- src/runtime/strings/strings_impl.cpp 2011-12-23 06:41:43 +0000 | |||
299 | +++ src/runtime/strings/strings_impl.cpp 2011-12-23 20:34:27 +0000 | |||
300 | @@ -140,6 +140,7 @@ | |||
301 | 140 | p = ec; | 140 | p = ec; |
302 | 141 | 141 | ||
303 | 142 | if ( utf8::read( *state->theStream, ec ) == utf8::npos ) | 142 | if ( utf8::read( *state->theStream, ec ) == utf8::npos ) |
304 | 143 | { | ||
305 | 143 | if ( state->theStream->good() ) { | 144 | if ( state->theStream->good() ) { |
306 | 144 | // | 145 | // |
307 | 145 | // If read() failed but the stream state is good, it means that an | 146 | // If read() failed but the stream state is good, it means that an |
308 | @@ -165,6 +166,7 @@ | |||
309 | 165 | zerr::ZOSE0003_STREAM_READ_FAILURE, ERROR_LOC( loc ) | 166 | zerr::ZOSE0003_STREAM_READ_FAILURE, ERROR_LOC( loc ) |
310 | 166 | ); | 167 | ); |
311 | 167 | } | 168 | } |
312 | 169 | } | ||
313 | 168 | state->theResult.clear(); | 170 | state->theResult.clear(); |
314 | 169 | state->theResult.push_back( utf8::next_char( p ) ); | 171 | state->theResult.push_back( utf8::next_char( p ) ); |
315 | 170 | 172 | ||
316 | @@ -2284,5 +2286,133 @@ | |||
317 | 2284 | STACK_END(state); | 2286 | STACK_END(state); |
318 | 2285 | } | 2287 | } |
319 | 2286 | 2288 | ||
320 | 2289 | /** | ||
321 | 2290 | *______________________________________________________________________ | ||
322 | 2291 | * | ||
323 | 2292 | * http://www.zorba-xquery.com/modules/string | ||
324 | 2293 | * string:split | ||
325 | 2294 | */ | ||
326 | 2295 | bool StringSplitIterator::nextImpl( | ||
327 | 2296 | store::Item_t& result, | ||
328 | 2297 | PlanState& planState) const | ||
329 | 2298 | { | ||
330 | 2299 | store::Item_t item; | ||
331 | 2300 | size_t lNewPos = 0; | ||
332 | 2301 | zstring lToken; | ||
333 | 2302 | zstring lPartialMatch; | ||
334 | 2303 | |||
335 | 2304 | StringSplitIteratorState* state; | ||
336 | 2305 | DEFAULT_STACK_INIT(StringSplitIteratorState, state, planState); | ||
337 | 2306 | |||
338 | 2307 | // init phase, get input string and tokens | ||
339 | 2308 | consumeNext(item, theChildren[0].getp(), planState); | ||
340 | 2309 | |||
341 | 2310 | if (item->isStreamable()) | ||
342 | 2311 | { | ||
343 | 2312 | state->theIStream = &item->getStream(); | ||
344 | 2313 | } | ||
345 | 2314 | else | ||
346 | 2315 | { | ||
347 | 2316 | state->theIStream = 0; | ||
348 | 2317 | item->getStringValue2(state->theInput); | ||
349 | 2318 | } | ||
350 | 2319 | |||
351 | 2320 | consumeNext(item, theChildren[1].getp(), planState); | ||
352 | 2321 | |||
353 | 2322 | item->getStringValue2(state->theSeparator); | ||
354 | 2323 | |||
355 | 2324 | // working phase, do the tokenization | ||
356 | 2325 | if (state->theIStream) | ||
357 | 2326 | { | ||
358 | 2327 | while ( !state->theIStream->eof() ) | ||
359 | 2328 | { | ||
360 | 2329 | utf8::encoded_char_type ec; | ||
361 | 2330 | memset( ec, '\0' , sizeof(ec) ); | ||
362 | 2331 | utf8::storage_type *p; | ||
363 | 2332 | p = ec; | ||
364 | 2333 | |||
365 | 2334 | if ( utf8::read( *state->theIStream, ec ) != utf8::npos ) | ||
366 | 2335 | { | ||
367 | 2336 | if (state->theSeparator.compare(lNewPos, 1, ec) == 0) | ||
368 | 2337 | { | ||
369 | 2338 | if (++lNewPos == state->theSeparator.length()) | ||
370 | 2339 | { | ||
371 | 2340 | GENV_ITEMFACTORY->createString(result, lToken); | ||
372 | 2341 | STACK_PUSH(true, state); | ||
373 | 2342 | } | ||
374 | 2343 | else | ||
375 | 2344 | { | ||
376 | 2345 | lPartialMatch.append(ec); | ||
377 | 2346 | } | ||
378 | 2347 | } | ||
379 | 2348 | else | ||
380 | 2349 | { | ||
381 | 2350 | lToken.append(lPartialMatch); | ||
382 | 2351 | lToken.append(ec); | ||
383 | 2352 | } | ||
384 | 2353 | } | ||
385 | 2354 | else | ||
386 | 2355 | { | ||
387 | 2356 | if (state->theIStream->good()) | ||
388 | 2357 | { | ||
389 | 2358 | char buf[ 6 /* bytes at most */ * 5 /* chars per byte */ ], *b = buf; | ||
390 | 2359 | bool first = true; | ||
391 | 2360 | for ( ; *p; ++p ) { | ||
392 | 2361 | if ( first ) | ||
393 | 2362 | first = false; | ||
394 | 2363 | else | ||
395 | 2364 | *b++ = ','; | ||
396 | 2365 | ::strcpy( b, "0x" ); b += 2; | ||
397 | 2366 | ::sprintf( b, "%0hhX", *p ); b += 2; | ||
398 | 2367 | } | ||
399 | 2368 | throw XQUERY_EXCEPTION( | ||
400 | 2369 | zerr::ZXQD0006_INVALID_UTF8_BYTE_SEQUENCE, | ||
401 | 2370 | ERROR_PARAMS( buf ), | ||
402 | 2371 | ERROR_LOC( loc ) | ||
403 | 2372 | ); | ||
404 | 2373 | } | ||
405 | 2374 | if (!lToken.empty()) | ||
406 | 2375 | { | ||
407 | 2376 | GENV_ITEMFACTORY->createString(result, lToken); | ||
408 | 2377 | STACK_PUSH(true, state); | ||
409 | 2378 | } | ||
410 | 2379 | break; | ||
411 | 2380 | } | ||
412 | 2381 | } | ||
413 | 2382 | } | ||
414 | 2383 | else | ||
415 | 2384 | { | ||
416 | 2385 | while (true) | ||
417 | 2386 | { | ||
418 | 2387 | if (state->theNextStartPos == zstring::npos) | ||
419 | 2388 | { | ||
420 | 2389 | break; | ||
421 | 2390 | } | ||
422 | 2391 | |||
423 | 2392 | lNewPos = | ||
424 | 2393 | state->theInput.find(state->theSeparator, state->theNextStartPos); | ||
425 | 2394 | if (lNewPos != zstring::npos) | ||
426 | 2395 | { | ||
427 | 2396 | zstring lSubStr = state->theInput.substr( | ||
428 | 2397 | state->theNextStartPos, | ||
429 | 2398 | lNewPos - state->theNextStartPos); | ||
430 | 2399 | GENV_ITEMFACTORY->createString(result, lSubStr); | ||
431 | 2400 | state->theNextStartPos = | ||
432 | 2401 | lNewPos==state->theInput.length() - state->theSeparator.length() | ||
433 | 2402 | ? zstring::npos | ||
434 | 2403 | : lNewPos + state->theSeparator.length(); | ||
435 | 2404 | } | ||
436 | 2405 | else | ||
437 | 2406 | { | ||
438 | 2407 | zstring lSubStr = state->theInput.substr(state->theNextStartPos); | ||
439 | 2408 | GENV_ITEMFACTORY->createString(result, lSubStr); | ||
440 | 2409 | state->theNextStartPos = zstring::npos; | ||
441 | 2410 | } | ||
442 | 2411 | STACK_PUSH(true, state); | ||
443 | 2412 | } | ||
444 | 2413 | } | ||
445 | 2414 | |||
446 | 2415 | STACK_END(state); | ||
447 | 2416 | } | ||
448 | 2287 | } // namespace zorba | 2417 | } // namespace zorba |
449 | 2288 | /* vim:set et sw=2 ts=2: */ | 2418 | /* vim:set et sw=2 ts=2: */ |
450 | 2289 | 2419 | ||
451 | === modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h' | |||
452 | --- src/runtime/visitors/pregenerated/planiter_visitor.h 2011-12-21 14:40:33 +0000 | |||
453 | +++ src/runtime/visitors/pregenerated/planiter_visitor.h 2011-12-23 20:34:27 +0000 | |||
454 | @@ -582,6 +582,8 @@ | |||
455 | 582 | 582 | ||
456 | 583 | class StringIsStreamableIterator; | 583 | class StringIsStreamableIterator; |
457 | 584 | 584 | ||
458 | 585 | class StringSplitIterator; | ||
459 | 586 | |||
460 | 585 | class XQDocIterator; | 587 | class XQDocIterator; |
461 | 586 | 588 | ||
462 | 587 | class XQDocContentIterator; | 589 | class XQDocContentIterator; |
463 | @@ -1423,6 +1425,9 @@ | |||
464 | 1423 | virtual void beginVisit ( const StringIsStreamableIterator& ) = 0; | 1425 | virtual void beginVisit ( const StringIsStreamableIterator& ) = 0; |
465 | 1424 | virtual void endVisit ( const StringIsStreamableIterator& ) = 0; | 1426 | virtual void endVisit ( const StringIsStreamableIterator& ) = 0; |
466 | 1425 | 1427 | ||
467 | 1428 | virtual void beginVisit ( const StringSplitIterator& ) = 0; | ||
468 | 1429 | virtual void endVisit ( const StringSplitIterator& ) = 0; | ||
469 | 1430 | |||
470 | 1426 | virtual void beginVisit ( const XQDocIterator& ) = 0; | 1431 | virtual void beginVisit ( const XQDocIterator& ) = 0; |
471 | 1427 | virtual void endVisit ( const XQDocIterator& ) = 0; | 1432 | virtual void endVisit ( const XQDocIterator& ) = 0; |
472 | 1428 | 1433 | ||
473 | 1429 | 1434 | ||
474 | === modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp' | |||
475 | --- src/runtime/visitors/pregenerated/printer_visitor.cpp 2011-12-21 14:40:33 +0000 | |||
476 | +++ src/runtime/visitors/pregenerated/printer_visitor.cpp 2011-12-23 20:34:27 +0000 | |||
477 | @@ -3961,6 +3961,20 @@ | |||
478 | 3961 | // </StringIsStreamableIterator> | 3961 | // </StringIsStreamableIterator> |
479 | 3962 | 3962 | ||
480 | 3963 | 3963 | ||
481 | 3964 | // <StringSplitIterator> | ||
482 | 3965 | void PrinterVisitor::beginVisit ( const StringSplitIterator& a) { | ||
483 | 3966 | thePrinter.startBeginVisit("StringSplitIterator", ++theId); | ||
484 | 3967 | printCommons( &a, theId ); | ||
485 | 3968 | thePrinter.endBeginVisit( theId ); | ||
486 | 3969 | } | ||
487 | 3970 | |||
488 | 3971 | void PrinterVisitor::endVisit ( const StringSplitIterator& ) { | ||
489 | 3972 | thePrinter.startEndVisit(); | ||
490 | 3973 | thePrinter.endEndVisit(); | ||
491 | 3974 | } | ||
492 | 3975 | // </StringSplitIterator> | ||
493 | 3976 | |||
494 | 3977 | |||
495 | 3964 | // <XQDocIterator> | 3978 | // <XQDocIterator> |
496 | 3965 | void PrinterVisitor::beginVisit ( const XQDocIterator& a) { | 3979 | void PrinterVisitor::beginVisit ( const XQDocIterator& a) { |
497 | 3966 | thePrinter.startBeginVisit("XQDocIterator", ++theId); | 3980 | thePrinter.startBeginVisit("XQDocIterator", ++theId); |
498 | 3967 | 3981 | ||
499 | === modified file 'src/runtime/visitors/pregenerated/printer_visitor.h' | |||
500 | --- src/runtime/visitors/pregenerated/printer_visitor.h 2011-12-21 14:40:33 +0000 | |||
501 | +++ src/runtime/visitors/pregenerated/printer_visitor.h 2011-12-23 20:34:27 +0000 | |||
502 | @@ -876,6 +876,9 @@ | |||
503 | 876 | void beginVisit( const StringIsStreamableIterator& ); | 876 | void beginVisit( const StringIsStreamableIterator& ); |
504 | 877 | void endVisit ( const StringIsStreamableIterator& ); | 877 | void endVisit ( const StringIsStreamableIterator& ); |
505 | 878 | 878 | ||
506 | 879 | void beginVisit( const StringSplitIterator& ); | ||
507 | 880 | void endVisit ( const StringSplitIterator& ); | ||
508 | 881 | |||
509 | 879 | void beginVisit( const XQDocIterator& ); | 882 | void beginVisit( const XQDocIterator& ); |
510 | 880 | void endVisit ( const XQDocIterator& ); | 883 | void endVisit ( const XQDocIterator& ); |
511 | 881 | 884 | ||
512 | 882 | 885 | ||
513 | === added file 'test/rbkt/ExpQueryResults/zorba/string/tokenize01.xml.res' | |||
514 | --- test/rbkt/ExpQueryResults/zorba/string/tokenize01.xml.res 1970-01-01 00:00:00 +0000 | |||
515 | +++ test/rbkt/ExpQueryResults/zorba/string/tokenize01.xml.res 2011-12-23 20:34:27 +0000 | |||
516 | @@ -0,0 +1,1 @@ | |||
517 | 1 | a d a d | ||
518 | 0 | 2 | ||
519 | === added file 'test/rbkt/ExpQueryResults/zorba/string/tokenize02.xml.res' | |||
520 | --- test/rbkt/ExpQueryResults/zorba/string/tokenize02.xml.res 1970-01-01 00:00:00 +0000 | |||
521 | +++ test/rbkt/ExpQueryResults/zorba/string/tokenize02.xml.res 2011-12-23 20:34:27 +0000 | |||
522 | @@ -0,0 +1,1 @@ | |||
523 | 1 | a a | ||
524 | 0 | 2 | ||
525 | === added file 'test/rbkt/ExpQueryResults/zorba/string/tokenize03.xml.res' | |||
526 | --- test/rbkt/ExpQueryResults/zorba/string/tokenize03.xml.res 1970-01-01 00:00:00 +0000 | |||
527 | +++ test/rbkt/ExpQueryResults/zorba/string/tokenize03.xml.res 2011-12-23 20:34:27 +0000 | |||
528 | @@ -0,0 +1,1 @@ | |||
529 | 1 | d d | ||
530 | 0 | 2 | ||
531 | === added file 'test/rbkt/ExpQueryResults/zorba/string/tokenize04.xml.res' | |||
532 | --- test/rbkt/ExpQueryResults/zorba/string/tokenize04.xml.res 1970-01-01 00:00:00 +0000 | |||
533 | +++ test/rbkt/ExpQueryResults/zorba/string/tokenize04.xml.res 2011-12-23 20:34:27 +0000 | |||
534 | @@ -0,0 +1,1 @@ | |||
535 | 1 | abcd abcd | ||
536 | 0 | 2 | ||
537 | === added file 'test/rbkt/Queries/zorba/string/token01.txt' | |||
538 | --- test/rbkt/Queries/zorba/string/token01.txt 1970-01-01 00:00:00 +0000 | |||
539 | +++ test/rbkt/Queries/zorba/string/token01.txt 2011-12-23 20:34:27 +0000 | |||
540 | @@ -0,0 +1,1 @@ | |||
541 | 1 | abcd | ||
542 | 0 | \ No newline at end of file | 2 | \ No newline at end of file |
543 | 1 | 3 | ||
544 | === added file 'test/rbkt/Queries/zorba/string/token02.txt' | |||
545 | --- test/rbkt/Queries/zorba/string/token02.txt 1970-01-01 00:00:00 +0000 | |||
546 | +++ test/rbkt/Queries/zorba/string/token02.txt 2011-12-23 20:34:27 +0000 | |||
547 | @@ -0,0 +1,1 @@ | |||
548 | 1 | abc | ||
549 | 0 | \ No newline at end of file | 2 | \ No newline at end of file |
550 | 1 | 3 | ||
551 | === added file 'test/rbkt/Queries/zorba/string/token03.txt' | |||
552 | --- test/rbkt/Queries/zorba/string/token03.txt 1970-01-01 00:00:00 +0000 | |||
553 | +++ test/rbkt/Queries/zorba/string/token03.txt 2011-12-23 20:34:27 +0000 | |||
554 | @@ -0,0 +1,1 @@ | |||
555 | 1 | bcd | ||
556 | 0 | \ No newline at end of file | 2 | \ No newline at end of file |
557 | 1 | 3 | ||
558 | === added file 'test/rbkt/Queries/zorba/string/token04.txt' | |||
559 | --- test/rbkt/Queries/zorba/string/token04.txt 1970-01-01 00:00:00 +0000 | |||
560 | +++ test/rbkt/Queries/zorba/string/token04.txt 2011-12-23 20:34:27 +0000 | |||
561 | @@ -0,0 +1,1 @@ | |||
562 | 1 | abcd | ||
563 | 0 | \ No newline at end of file | 2 | \ No newline at end of file |
564 | 1 | 3 | ||
565 | === added file 'test/rbkt/Queries/zorba/string/tokenize01.xq' | |||
566 | --- test/rbkt/Queries/zorba/string/tokenize01.xq 1970-01-01 00:00:00 +0000 | |||
567 | +++ test/rbkt/Queries/zorba/string/tokenize01.xq 2011-12-23 20:34:27 +0000 | |||
568 | @@ -0,0 +1,5 @@ | |||
569 | 1 | import module namespace f = "http://expath.org/ns/file"; | ||
570 | 2 | import module namespace s = "http://www.zorba-xquery.com/modules/string"; | ||
571 | 3 | |||
572 | 4 | s:split(f:read-text(fn:resolve-uri("token01.txt")), "bc"), | ||
573 | 5 | s:split(s:materialize(f:read-text(fn:resolve-uri("token01.txt"))), "bc") | ||
574 | 0 | 6 | ||
575 | === added file 'test/rbkt/Queries/zorba/string/tokenize02.xq' | |||
576 | --- test/rbkt/Queries/zorba/string/tokenize02.xq 1970-01-01 00:00:00 +0000 | |||
577 | +++ test/rbkt/Queries/zorba/string/tokenize02.xq 2011-12-23 20:34:27 +0000 | |||
578 | @@ -0,0 +1,5 @@ | |||
579 | 1 | import module namespace f = "http://expath.org/ns/file"; | ||
580 | 2 | import module namespace s = "http://www.zorba-xquery.com/modules/string"; | ||
581 | 3 | |||
582 | 4 | s:split(f:read-text(fn:resolve-uri("token02.txt")), "bc"), | ||
583 | 5 | s:split(s:materialize(f:read-text(fn:resolve-uri("token02.txt"))), "bc") | ||
584 | 0 | 6 | ||
585 | === added file 'test/rbkt/Queries/zorba/string/tokenize03.xq' | |||
586 | --- test/rbkt/Queries/zorba/string/tokenize03.xq 1970-01-01 00:00:00 +0000 | |||
587 | +++ test/rbkt/Queries/zorba/string/tokenize03.xq 2011-12-23 20:34:27 +0000 | |||
588 | @@ -0,0 +1,5 @@ | |||
589 | 1 | import module namespace f = "http://expath.org/ns/file"; | ||
590 | 2 | import module namespace s = "http://www.zorba-xquery.com/modules/string"; | ||
591 | 3 | |||
592 | 4 | s:split(f:read-text(fn:resolve-uri("token03.txt")), "bc"), | ||
593 | 5 | s:split(s:materialize(f:read-text(fn:resolve-uri("token03.txt"))), "bc") | ||
594 | 0 | 6 | ||
595 | === added file 'test/rbkt/Queries/zorba/string/tokenize04.xq' | |||
596 | --- test/rbkt/Queries/zorba/string/tokenize04.xq 1970-01-01 00:00:00 +0000 | |||
597 | +++ test/rbkt/Queries/zorba/string/tokenize04.xq 2011-12-23 20:34:27 +0000 | |||
598 | @@ -0,0 +1,5 @@ | |||
599 | 1 | import module namespace f = "http://expath.org/ns/file"; | ||
600 | 2 | import module namespace s = "http://www.zorba-xquery.com/modules/string"; | ||
601 | 3 | |||
602 | 4 | s:split(f:read-text(fn:resolve-uri("token04.txt")), "f"), | ||
603 | 5 | s:split(s:materialize(f:read-text(fn:resolve-uri("token04.txt"))), "f") |
Validation queue starting for merge proposal.
Log at: http://zorbatest.lambda.nu:8080/remotequeue/tokenize-2011-12-21T21-46-05.289Z/log.html