Merge lp:~paul-lucas/zorba/bug-898075 into lp:zorba

Proposed by Paul J. Lucas
Status: Merged
Approved by: Matthias Brantner
Approved revision: 10581
Merged at revision: 10584
Proposed branch: lp:~paul-lucas/zorba/bug-898075
Merge into: lp:zorba
Diff against target: 279 lines (+132/-17)
10 files modified
include/zorba/pregenerated/diagnostic_list.h (+2/-0)
modules/com/zorba-xquery/www/modules/pregenerated/errors.xq (+4/-0)
src/diagnostics/diagnostic_en.xml (+4/-0)
src/diagnostics/pregenerated/diagnostic_list.cpp (+3/-0)
src/diagnostics/pregenerated/dict_en.cpp (+1/-0)
src/runtime/spec/strings/strings.xml (+2/-1)
src/runtime/strings/pregenerated/strings.h (+1/-0)
src/runtime/strings/strings_impl.cpp (+71/-16)
src/util/utf8_util.cpp (+17/-0)
src/util/utf8_util_base.h (+27/-0)
To merge this branch: bzr merge lp:~paul-lucas/zorba/bug-898075
Reviewer            Review Type   Date Requested   Status
William Candillon                                  Approve
Paul J. Lucas                                      Approve
Matthias Brantner                                  Pending
Review via email: mp+85410@code.launchpad.net

Commit message

Applied William's patch; patched William's patch to handle UTF-8 properly.

Description of the change

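Applied William's patch, which lets the string-to-codepoints iterator read from a streamable string item, and amended it to handle UTF-8 properly: a new utf8::read() reads one complete character at a time from the stream, and an invalid byte sequence now raises the new error ZXQD0006 (INVALID_UTF8_BYTE_SEQUENCE).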
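
Handling UTF-8 properly when reading from a stream means decoding one whole character at a time rather than one byte at a time. The following is a minimal standalone sketch of that idea, not the Zorba implementation: `sequence_length` and `stream_to_codepoints` are hypothetical names, and the validation is deliberately simplified (it rejects bad lead bytes, bad continuation bytes, and truncated input, but does not check overlong three- or four-byte forms or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Zorba's API): expected length of a UTF-8 sequence
// given its first byte, or 0 if the byte cannot start a sequence.
inline std::size_t sequence_length(unsigned char b) {
  if (b < 0x80) return 1;                // 0xxxxxxx: ASCII
  if (b >= 0xC2 && b <= 0xDF) return 2;  // 110xxxxx (0xC0/0xC1 are overlong)
  if (b >= 0xE0 && b <= 0xEF) return 3;  // 1110xxxx
  if (b >= 0xF0 && b <= 0xF4) return 4;  // 11110xxx, up to U+10FFFF
  return 0;                              // continuation byte or invalid lead
}

// Hypothetical helper: decodes a stream into code points one character at a
// time, so the input never has to be materialized as a single string.
// Throws std::runtime_error on a malformed or truncated sequence.
inline std::vector<std::uint32_t> stream_to_codepoints(std::istream& in) {
  typedef std::istream::traits_type traits;
  std::vector<std::uint32_t> out;
  std::istream::int_type c;
  // Test the result of get() instead of eof(): eof() only becomes true
  // after a read has already hit the end of the stream.
  while ((c = in.get()) != traits::eof()) {
    const unsigned char lead = static_cast<unsigned char>(c);
    const std::size_t len = sequence_length(lead);
    if (len == 0) throw std::runtime_error("invalid UTF-8 start byte");
    assert(len <= 4);
    // Strip the length-marker bits from the lead byte.
    std::uint32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
    for (std::size_t n = 1; n < len; ++n) {
      c = in.get();
      if (c == traits::eof() || (c & 0xC0) != 0x80)
        throw std::runtime_error("invalid or truncated UTF-8 sequence");
      cp = (cp << 6) | static_cast<std::uint32_t>(c & 0x3F);
    }
    out.push_back(cp);
  }
  return out;
}
```

Feeding it a `std::istringstream` holding the bytes `61 C3 A9` yields the two code points U+0061 and U+00E9; feeding it `C3 28` throws, because `0x28` is not a continuation byte.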

Paul J. Lucas (paul-lucas) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~paul-lucas/zorba/bug-898075 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job bug-898075-2011-12-12T23-21-17.547Z is finished. The
  final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

William Candillon (wcandillon) wrote :

I tested with a streaming and non streaming string, worked great.

review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job bug-898075-2011-12-15T19-46-14.188Z is finished. The final status was:

All tests succeeded!

Preview Diff

=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
--- include/zorba/pregenerated/diagnostic_list.h 2011-11-15 08:23:20 +0000
+++ include/zorba/pregenerated/diagnostic_list.h 2011-12-12 23:19:26 +0000
@@ -458,6 +458,8 @@

 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQD0005_INVALID_KEY_FOR_MAP;

+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQD0006_INVALID_UTF8_BYTE_SEQUENCE;
+
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZAPI0002_XQUERY_COMPILATION_FAILED;

 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZAPI0003_XQUERY_NOT_COMPILED;

=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-11-15 08:23:20 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-12-12 23:19:26 +0000
@@ -217,6 +217,10 @@

 (:~
 :)
+declare variable $zerr:ZXQD0006 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQD0006");
+
+(:~
+:)
 declare variable $zerr:ZAPI0002 as xs:QName := fn:QName($zerr:NS, "zerr:ZAPI0002");

 (:~

=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml 2011-12-07 20:46:23 +0000
+++ src/diagnostics/diagnostic_en.xml 2011-12-12 23:19:26 +0000
@@ -1722,6 +1722,10 @@
 <value>key with type $1 not subtype or castable to target type $2 of map ($3)</value>
 </diagnostic>

+ <diagnostic code="ZXQD0006" name="INVALID_UTF8_BYTE_SEQUENCE">
+ <value>"$1": invalid UTF-8 byte sequence</value>
+ </diagnostic>
+
 <!--////////// Zorba API Errors ////////////////////////////////////////-->

 <diagnostic code="ZAPI0002" name="XQUERY_COMPILATION_FAILED">

=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
--- src/diagnostics/pregenerated/diagnostic_list.cpp 2011-11-15 08:23:20 +0000
+++ src/diagnostics/pregenerated/diagnostic_list.cpp 2011-12-12 23:19:26 +0000
@@ -666,6 +666,9 @@
 ZorbaErrorCode ZXQD0005_INVALID_KEY_FOR_MAP( "ZXQD0005" );


+ZorbaErrorCode ZXQD0006_INVALID_UTF8_BYTE_SEQUENCE( "ZXQD0006" );
+
+
 ZorbaErrorCode ZAPI0002_XQUERY_COMPILATION_FAILED( "ZAPI0002" );



=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp 2011-12-01 16:19:52 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp 2011-12-12 23:19:26 +0000
@@ -365,6 +365,7 @@
 { "ZXQD0003", "inconsistent options to the parse-xml-fragment() function: $1" },
 { "ZXQD0004", "invalid parameter: $1" },
 { "ZXQD0005", "key with type $1 not subtype or castable to target type $2 of map ($3)" },
+ { "ZXQD0006", "\"$1\": invalid UTF-8 byte sequence" },
 { "ZXQP0000", "no error" },
 { "ZXQP0001", "dynamic runtime error${: 1}" },
 { "ZXQP0002", "\"$1\": assertion failed" },

=== modified file 'src/runtime/spec/strings/strings.xml'
--- src/runtime/spec/strings/strings.xml 2011-12-01 11:02:25 +0000
+++ src/runtime/spec/strings/strings.xml 2011-12-12 23:19:26 +0000
@@ -57,7 +57,8 @@
 <zorba:member type="xs_unsignedInt" name="theIterator"
 brief="the current iterator"/>
 <zorba:member type="checked_vector&lt;xs_unsignedInt&gt;" name="theResult"
- brief="the resulting vector"/>
+ brief="the resulting vector"/>
+ <zorba:member type="std::istream*" name="theStream" />
 </zorba:state>

 </zorba:iterator>

=== modified file 'src/runtime/strings/pregenerated/strings.h'
--- src/runtime/strings/pregenerated/strings.h 2011-12-01 11:02:25 +0000
+++ src/runtime/strings/pregenerated/strings.h 2011-12-12 23:19:26 +0000
@@ -82,6 +82,7 @@
 public:
 xs_unsignedInt theIterator; //the current iterator
 checked_vector<xs_unsignedInt> theResult; //the resulting vector
+ std::istream* theStream; //

 StringToCodepointsIteratorState();


=== modified file 'src/runtime/strings/strings_impl.cpp'
--- src/runtime/strings/strings_impl.cpp 2011-12-01 16:19:52 +0000
+++ src/runtime/strings/strings_impl.cpp 2011-12-12 23:19:26 +0000
@@ -120,22 +120,76 @@

 if (consumeNext(item, theChildren [0].getp(), planState ))
 {
- item->getStringValue2(inputStr);
-
- if (!inputStr.empty())
- {
- utf8::to_codepoints(inputStr, &state->theResult);
-
- while (state->theIterator < state->theResult.size())
- {
- GENV_ITEMFACTORY->createInteger(
- result,
- Integer(state->theResult[state->theIterator])
- );
-
- STACK_PUSH(true, state );
- state->theIterator = state->theIterator + 1;
- }
+ if(!item->isStreamable())
+ {
+ item->getStringValue2(inputStr);
+ }
+ else
+ {
+ state->theStream = &item->getStream();
+ }
+ }
+
+ if ( state->theStream )
+ {
+ while ( !state->theStream->eof() )
+ {
+ utf8::encoded_char_type ec;
+ ::bzero( ec, sizeof( ec ) );
+ utf8::storage_type *p;
+ p = ec;
+
+ if ( utf8::read( *state->theStream, ec ) == utf8::npos )
+ if ( state->theStream->good() ) {
+ //
+ // If read() failed but the stream state is good, it means that an
+ // invalid byte was encountered.
+ //
+ char buf[ 6 /* bytes at most */ * 5 /* chars per byte */ ], *b = buf;
+ bool first = true;
+ for ( ; *p; ++p ) {
+ if ( first )
+ first = false;
+ else
+ *b++ = ',';
+ ::strcpy( b, "0x" ); b += 2;
+ ::sprintf( b, "%0hhX", *p ); b += 2;
+ }
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQD0006_INVALID_UTF8_BYTE_SEQUENCE,
+ ERROR_PARAMS( buf ),
+ ERROR_LOC( loc )
+ );
+ } else {
+ throw XQUERY_EXCEPTION(
+ zerr::ZOSE0003_STREAM_READ_FAILURE, ERROR_LOC( loc )
+ );
+ }
+ state->theResult.clear();
+ state->theResult.push_back( utf8::next_char( p ) );
+
+ GENV_ITEMFACTORY->createInteger(
+ result,
+ Integer(state->theResult[0])
+ );
+
+ STACK_PUSH(true, state );
+ state->theIterator = state->theIterator + 1;
+ }
+ }
+ else if (!inputStr.empty())
+ {
+ utf8::to_codepoints(inputStr, &state->theResult);
+
+ while (state->theIterator < state->theResult.size())
+ {
+ GENV_ITEMFACTORY->createInteger(
+ result,
+ Integer(state->theResult[state->theIterator])
+ );
+
+ STACK_PUSH(true, state );
+ state->theIterator = state->theIterator + 1;
 }
 }
 STACK_END (state);
@@ -146,6 +200,7 @@
 {
 PlanIteratorState::init(planState);
 theIterator = 0;
+ theStream = 0;
 theResult.clear();
 }


=== modified file 'src/util/utf8_util.cpp'
--- src/util/utf8_util.cpp 2011-07-17 00:10:56 +0000
+++ src/util/utf8_util.cpp 2011-12-12 23:19:26 +0000
@@ -22,6 +22,7 @@
 #include "cxx_util.h"
 #include "utf8_util.h"

+using namespace std;
 #ifndef ZORBA_NO_UNICODE
 U_NAMESPACE_USE
 #endif /* ZORBA_NO_UNICODE */
@@ -152,6 +153,22 @@
 return len;
 }

+size_type read( istream &i, storage_type **ps ) {
+ char c = i.get();
+ if ( !i.good() || !is_start_byte( c ) )
+ return npos;
+ storage_type *&p = *ps;
+ *p++ = c;
+ size_type const len = char_length( c );
+ for ( size_type n = 1; n < len; ++n ) {
+ c = i.get();
+ if ( !i.good() || !is_continuation_byte( c ) )
+ return npos;
+ *p++ = c;
+ }
+ return len;
+}
+
 #ifndef ZORBA_NO_UNICODE

 bool to_string( unicode::char_type const *in, unicode::size_type in_len,

=== modified file 'src/util/utf8_util_base.h'
--- src/util/utf8_util_base.h 2011-12-01 16:19:52 +0000
+++ src/util/utf8_util_base.h 2011-12-12 23:19:26 +0000
@@ -18,6 +18,7 @@
 #define ZORBA_UTF8_UTIL_BASE_H

 #include <cstddef>
+#include <iostream>
 #include <iterator>
 #include <stdexcept>

@@ -164,6 +165,32 @@
 template<class OctetIterator>
 unicode::code_point prev_char( OctetIterator &i );

+/**
+ * Reads bytes from an istream until an entire UTF-8 character has been read.
+ *
+ * @param i The istream to read from.
+ * @param ps A pointer to a pointer to what will be the first byte of a UTF-8
+ * byte sequence. The pointer is advanced to one byte past the newly read
+ * character.
+ * @return Returns the number of bytes comprising the UTF-8 character (which
+ * equals the number of bytes read) or \c npos if either EOF was reached or the
+ * bytes read are an invalid UTF-8 byte sequence.
+ */
+size_type read( std::istream &i, storage_type **ps );
+
+/**
+ * Reads bytes from an istream until an entire UTF-8 character has been read.
+ *
+ * @param i The istream to read from.
+ * @param p A pointer to what will be the first byte of a UTF-8 byte sequence.
+ * @return Returns the number of bytes comprising the UTF-8 character (which
+ * equals the number of bytes read) or \c npos if either EOF was reached or the
+ * bytes read are an invalid UTF-8 byte sequence.
+ */
+inline size_type read( std::istream &i, storage_type *p ) {
+ return read( i, &p );
+}
+
 ////////// Character access ///////////////////////////////////////////////////

 /**
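
The ZXQD0006 message takes the offending bytes as its `$1` parameter, which the change to strings_impl.cpp builds as a comma-separated list of `0x`-prefixed hex values. A minimal sketch of that formatting step follows, assuming a hypothetical helper named `format_bytes` that is not part of Zorba. It differs from the diff in two ways that seemed safer for an illustration: it takes an explicit length instead of stopping at a zero byte, and it uses `%02X` so that every byte prints as exactly two digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not Zorba's API): renders a byte sequence as
// "0xC3,0x28" for use in an error message.
inline std::string format_bytes(const unsigned char* bytes, std::size_t len) {
  assert(bytes != nullptr || len == 0);
  std::string out;
  char buf[5];  // "0x" + two hex digits + terminating NUL
  for (std::size_t i = 0; i < len; ++i) {
    if (i > 0) out += ',';
    // %02X pads to two digits; snprintf never writes past sizeof buf.
    std::snprintf(buf, sizeof buf, "0x%02X", static_cast<unsigned>(bytes[i]));
    out += buf;
  }
  return out;
}
```

Returning a `std::string` avoids sizing a fixed output buffer by hand; whether that fits the surrounding Zorba code (which passes a `char` buffer to `ERROR_PARAMS`) is an open assumption.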
