Merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba

Proposed by Paul J. Lucas
Status: Merged
Approved by: Matthias Brantner
Approved revision: 10660
Merged at revision: 10663
Proposed branch: lp:~zorba-coders/zorba/feature-transcode_streambuf
Merge into: lp:zorba
Diff against target: 2967 lines (+1874/-555)
37 files modified
ChangeLog (+4/-0)
include/zorba/internal/proxy.h (+48/-0)
include/zorba/pregenerated/diagnostic_list.h (+4/-0)
include/zorba/transcode_stream.h (+213/-0)
modules/ExternalModules.conf (+1/-1)
modules/com/zorba-xquery/www/modules/http-client.xq (+2/-2)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp (+337/-338)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h (+164/-143)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp (+71/-21)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h (+10/-6)
modules/com/zorba-xquery/www/modules/pregenerated/errors.xq (+8/-0)
modules/org/expath/ns/file.xq.src/file.cpp (+25/-10)
modules/org/expath/ns/file.xq.src/file_function.cpp (+0/-5)
modules/org/expath/ns/file.xq.src/file_function.h (+5/-9)
modules/org/expath/ns/file.xq.src/file_module.cpp (+2/-5)
modules/org/expath/ns/file.xq.src/file_module.h (+13/-6)
src/api/CMakeLists.txt (+1/-0)
src/api/transcode_streambuf.cpp (+102/-0)
src/diagnostics/diagnostic_en.xml (+8/-0)
src/diagnostics/pregenerated/diagnostic_list.cpp (+6/-0)
src/diagnostics/pregenerated/dict_en.cpp (+2/-0)
src/unit_tests/CMakeLists.txt (+4/-6)
src/unit_tests/test_icu_streambuf.cpp (+151/-0)
src/unit_tests/unit_test_list.h (+5/-0)
src/unit_tests/unit_tests.cpp (+3/-0)
src/util/CMakeLists.txt (+6/-1)
src/util/icu_streambuf.cpp (+300/-0)
src/util/icu_streambuf.h (+140/-0)
src/util/passthru_streambuf.cpp (+105/-0)
src/util/passthru_streambuf.h (+76/-0)
src/util/transcode_streambuf.h (+47/-0)
test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.txt (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.xq (+3/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.spec (+1/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.xq (+3/-0)
test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq (+2/-2)
To merge this branch: bzr merge lp:~zorba-coders/zorba/feature-transcode_streambuf
Reviewer            Review Type   Date Requested   Status
Matthias Brantner                                  Approve
Paul J. Lucas                                      Approve
Review via email: mp+93327@code.launchpad.net

This proposal supersedes a proposal from 2012-02-08.

Commit message

- Added transcode_streambuf
- file:read-text now respects encodings
- http:send-request now respects encodings

Description of the change

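Added transcode_streambuf, a std::streambuf that transcodes between arbitrary character encodings and UTF-8 on the fly. file:read-text (fixes bug #867159) and http:send-request now use it to respect the encoding of the data they read.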
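
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of a transcoding streambuf in the spirit of this change. It is not the code in this branch: the class latin1_to_utf8_buf and the helper latin1_to_utf8 are hypothetical names, and the sketch only reads ISO-8859-1 input and does not support writing or seeking. The branch's implementation appears to rely on ICU instead (see src/util/icu_streambuf.cpp in the file list), which is what would let it handle other encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a transcoding streambuf: decodes ISO-8859-1
// bytes pulled from the original streambuf into UTF-8. Read-only.
class latin1_to_utf8_buf : public std::streambuf {
public:
  explicit latin1_to_utf8_buf( std::streambuf *orig ) : orig_( orig ) {
    assert( orig_ != nullptr );
    setg( out_, out_, out_ );   // empty get area: first read calls underflow()
  }

  // Lets the caller restore the stream's original streambuf afterwards.
  std::streambuf* orig_streambuf() const { return orig_; }

protected:
  int_type underflow() override {
    if ( gptr() < egptr() )
      return traits_type::to_int_type( *gptr() );

    // Pull exactly one source byte from the wrapped streambuf.
    int_type const c = orig_->sbumpc();
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::eof();

    unsigned char const byte =
      static_cast<unsigned char>( traits_type::to_char_type( c ) );
    std::size_t n = 0;
    if ( byte < 0x80 ) {
      out_[ n++ ] = static_cast<char>( byte );          // ASCII: unchanged
    } else {
      // U+0080..U+00FF encode as two UTF-8 bytes: 110000xx 10xxxxxx.
      out_[ n++ ] = static_cast<char>( 0xC0 | (byte >> 6) );
      out_[ n++ ] = static_cast<char>( 0x80 | (byte & 0x3F) );
    }
    setg( out_, out_, out_ + n );
    return traits_type::to_int_type( *gptr() );
  }

private:
  std::streambuf *orig_;
  char out_[2];                 // holds the UTF-8 form of one source byte
};

// Reads all of `latin1` through the filter and returns the UTF-8 bytes.
std::string latin1_to_utf8( std::string const &latin1 ) {
  std::istringstream iss( latin1 );
  latin1_to_utf8_buf tbuf( iss.rdbuf() );   // must outlive the stream below
  std::istream is( &tbuf );
  std::string out;
  char c;
  while ( is.get( c ) )
    out.push_back( c );
  return out;
}
```

The sketch converts one source byte per underflow() call to keep the logic visible; a real implementation would convert in larger chunks. With the API added by this branch, the equivalent setup would presumably be transcode::streambuf tbuf( "ISO-8859-1", is.rdbuf() ), as shown in the documentation comment of include/zorba/transcode_stream.h.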

Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal
review: Approve
Matthias Brantner (matthias-brantner) : Posted in a previous version of this proposal
review: Approve
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 (message):
  Validation queue job feature-transcode_streambuf-2012-02-08T19-21-05.882Z
  is finished. The final status was:

  3 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal
review: Approve
Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

The attempt to merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 (message):
  Validation queue job feature-transcode_streambuf-2012-02-15T16-29-00.272Z
  is finished. The final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

Attempt to merge into lp:zorba failed due to conflicts:

text conflict in src/unit_tests/unit_test_list.h
text conflict in src/unit_tests/unit_tests.cpp

Zorba Build Bot (zorba-buildbot) wrote : Posted in a previous version of this proposal

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Paul J. Lucas (paul-lucas) :
review: Approve
Matthias Brantner (matthias-brantner) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job feature-transcode_streambuf-2012-02-16T03-54-11Z is finished. The final status was:

All tests succeeded!

Preview Diff

1=== modified file 'ChangeLog'
2--- ChangeLog 2012-02-16 00:52:25 +0000
3+++ ChangeLog 2012-02-16 02:19:18 +0000
4@@ -34,6 +34,10 @@
5 * zerr is not predeclared anymore to be http://www.zorba-xquery.com/errors
6 * Add new XQuery interface for the PHP bindings.
7 * Added API method Item::getNamespaceBindings().
8+ * Added a transcoding streambuffer to the API which allows transcoding arbitrary encodings
9+ from and to UTF-8
10+ * file:read-text is able to handle arbitrary encodings (fixes bug #867159)
11+ * http:send-request is able to handle arbitrary encodings
12 * Fixed bug #917981 (disallow declaring same module twice).
13 * Added API method StaticContext::getNamespaceBindings() (see bug #905035)
14 * Deprecated StaticContext:getNamespaceURIByPrefix()
15
16=== added file 'include/zorba/internal/proxy.h'
17--- include/zorba/internal/proxy.h 1970-01-01 00:00:00 +0000
18+++ include/zorba/internal/proxy.h 2012-02-16 02:19:18 +0000
19@@ -0,0 +1,48 @@
20+/*
21+ * Copyright 2006-2008 The FLWOR Foundation.
22+ *
23+ * Licensed under the Apache License, Version 2.0 (the "License");
24+ * you may not use this file except in compliance with the License.
25+ * You may obtain a copy of the License at
26+ *
27+ * http://www.apache.org/licenses/LICENSE-2.0
28+ *
29+ * Unless required by applicable law or agreed to in writing, software
30+ * distributed under the License is distributed on an "AS IS" BASIS,
31+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
32+ * See the License for the specific language governing permissions and
33+ * limitations under the License.
34+ */
35+
36+#ifndef ZORBA_INTERNAL_PROXY_H
37+#define ZORBA_INTERNAL_PROXY_H
38+
39+namespace zorba {
40+namespace internal {
41+namespace ztd {
42+
43+///////////////////////////////////////////////////////////////////////////////
44+
45+/**
46+ * \internal
47+ * A %proxy<T> is-a \c T that also contains a T* -- a pointer to the original.
48+ */
49+template<class OriginalType>
50+class proxy : public OriginalType {
51+public:
52+ proxy( OriginalType *p ) : original_( p ) { }
53+
54+ OriginalType* original() const {
55+ return original_;
56+ }
57+private:
58+ OriginalType *original_;
59+};
60+
61+///////////////////////////////////////////////////////////////////////////////
62+
63+} // namespace ztd
64+} // namespace internal
65+} // namespace zorba
66+#endif /* ZORBA_INTERNAL_PROXY_H */
67+/* vim:set et sw=2 ts=2: */
68
69=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
70--- include/zorba/pregenerated/diagnostic_list.h 2012-01-26 01:35:11 +0000
71+++ include/zorba/pregenerated/diagnostic_list.h 2012-02-16 02:19:18 +0000
72@@ -392,6 +392,8 @@
73
74 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0005_NOT_ENABLED;
75
76+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0006_UNKNOWN_ENCODING;
77+
78 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0007_FUNCTION_SIGNATURE_NOT_EQUAL;
79
80 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0008_FUNCTION_IMPL_NOT_FOUND;
81@@ -684,6 +686,8 @@
82
83 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZOSE0005_DLL_LOAD_FAILED;
84
85+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZOSE0006_TRANSCODING_ERROR;
86+
87 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZSTR0001_INDEX_ALREADY_EXISTS;
88
89 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZSTR0002_INDEX_DOES_NOT_EXIST;
90
91=== added file 'include/zorba/transcode_stream.h'
92--- include/zorba/transcode_stream.h 1970-01-01 00:00:00 +0000
93+++ include/zorba/transcode_stream.h 2012-02-16 02:19:18 +0000
94@@ -0,0 +1,213 @@
95+/*
96+ * Copyright 2006-2008 The FLWOR Foundation.
97+ *
98+ * Licensed under the Apache License, Version 2.0 (the "License");
99+ * you may not use this file except in compliance with the License.
100+ * You may obtain a copy of the License at
101+ *
102+ * http://www.apache.org/licenses/LICENSE-2.0
103+ *
104+ * Unless required by applicable law or agreed to in writing, software
105+ * distributed under the License is distributed on an "AS IS" BASIS,
106+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
107+ * See the License for the specific language governing permissions and
108+ * limitations under the License.
109+ */
110+
111+#ifndef ZORBA_TRANSCODE_STREAM_API_H
112+#define ZORBA_TRANSCODE_STREAM_API_H
113+
114+#include <stdexcept>
115+#include <streambuf>
116+#include <string>
117+
118+#include <zorba/config.h>
119+#include <zorba/internal/proxy.h>
120+#include <zorba/internal/unique_ptr.h>
121+
122+namespace zorba {
123+
124+typedef internal::ztd::proxy<std::streambuf> proxy_streambuf;
125+
126+namespace transcode {
127+
128+///////////////////////////////////////////////////////////////////////////////
129+
130+/**
131+ * A %transcode::streambuf is-a std::streambuf for transcoding character
132+ * encodings from/to UTF-8 on-the-fly.
133+ *
134+ * To use it, replace a stream's streambuf:
135+ * \code
136+ * istream is;
137+ * // ...
138+ * transcode::streambuf tbuf( "ISO-8859-1", is.rdbuf() );
139+ * is.ios::rdbuf( &tbuf );
140+ * \endcode
141+ * Note that the %transcode::streambuf must exist for as long as it's being used
142+ * by the stream. If you are replacing the streabuf for a stream you did not
143+ * create, you should set it back to the original streambuf:
144+ * \code
145+ * void f( ostream &os ) {
146+ * transcode::streambuf tbuf( "ISO-8859-1", os.rdbuf() );
147+ * try {
148+ * os.ios::rdbuf( &tbuf );
149+ * // ...
150+ * }
151+ * catch ( ... ) {
152+ * os.ios::rdbuf( tbuf.orig_streambuf() );
153+ * throw;
154+ * }
155+ * }
156+ * \endcode
157+ *
158+ * While %transcode::streambuf does support seeking, the positions are relative
159+ * to the original byte stream.
160+ */
161+class ZORBA_DLL_PUBLIC streambuf : public std::streambuf {
162+public:
163+ /**
164+ * Constructs a %transcode::streambuf.
165+ *
166+ * @param charset The name of the character encoding to convert from/to.
167+ * @param orig The original streambuf to read/write from/to.
168+ * @throws std::invalid_argument if either \a charset is not supported or
169+ * \a orig is null.
170+ */
171+ streambuf( char const *charset, std::streambuf *orig );
172+
173+ /**
174+ * Destructs a %transcode::streambuf.
175+ */
176+ ~streambuf();
177+
178+ /**
179+ * Gets the original streambuf.
180+ *
181+ * @return said streambuf.
182+ */
183+ std::streambuf* orig_streambuf() const {
184+ return proxy_buf_->original();
185+ }
186+
187+protected:
188+ void imbue( std::locale const& );
189+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
190+ pos_type seekpos( pos_type, std::ios_base::openmode );
191+ std::streambuf* setbuf( char_type*, std::streamsize );
192+ std::streamsize showmanyc();
193+ int sync();
194+ int_type overflow( int_type );
195+ int_type pbackfail( int_type );
196+ int_type uflow();
197+ int_type underflow();
198+ std::streamsize xsgetn( char_type*, std::streamsize );
199+ std::streamsize xsputn( char_type const*, std::streamsize );
200+
201+private:
202+ std::unique_ptr<proxy_streambuf> proxy_buf_;
203+
204+ // forbid
205+ streambuf( streambuf const& );
206+ streambuf& operator=( streambuf const& );
207+};
208+
209+///////////////////////////////////////////////////////////////////////////////
210+
211+/**
212+ * A %transcode::stream is used to wrap a C++ standard I/O stream with a
213+ * transcode::streambuf so that transcoding and the management of the streambuf
214+ * happens automatically.
215+ *
216+ * @tparam StreamType The I/O stream class type to wrap. It must be a concrete
217+ * stream class.
218+ */
219+template<class StreamType>
220+class stream : public StreamType {
221+public:
222+ /**
223+ * Constructs a %transcode::stream.
224+ *
225+ * @param charset The name of the character encoding to convert from/to.
226+ * @throws std::invalid_argument if \a charset is not supported.
227+ */
228+ stream( char const *charset ) :
229+ tbuf_( charset, this->rdbuf() )
230+ {
231+ init();
232+ }
233+
234+ /**
235+ * Constructs a %stream.
236+ *
237+ * @tparam StreamArgType The type of the first argument of \a StreamType's
238+ * constructor.
239+ * @param charset The name of the character encoding to convert from/to.
240+ * @param stream_arg The argument to pass as the first argument to
241+ * \a StreamType's constructor.
242+ * @throws std::invalid_argument if \a charset is not supported.
243+ */
244+ template<typename StreamArgType>
245+ stream( char const *charset, StreamArgType stream_arg ) :
246+ StreamType( stream_arg ),
247+ tbuf_( charset, this->rdbuf() )
248+ {
249+ init();
250+ }
251+
252+ /**
253+ * Constructs a %transcode::stream.
254+ *
255+ * @tparam StreamArgType The type of the first argument of \a StreamType's
256+ * constructor.
257+ * @param charset The name of the character encoding to convert from/to.
258+ * @param stream_arg The argument to pass as the first argument to
259+ * \a StreamType's constructor.
260+ * @param mode The open-mode to pass to \a StreamType's constructor.
261+ * @throws std::invalid_argument if \a charset is not supported.
262+ */
263+ template<typename StreamArgType>
264+ stream( char const *charset, StreamArgType stream_arg,
265+ std::ios_base::openmode mode ) :
266+ StreamType( stream_arg, mode ),
267+ tbuf_( charset, this->rdbuf() )
268+ {
269+ init();
270+ }
271+
272+private:
273+ streambuf tbuf_;
274+
275+ void init() {
276+ this->std::ios::rdbuf( &tbuf_ );
277+ }
278+};
279+
280+///////////////////////////////////////////////////////////////////////////////
281+
282+/**
283+ * Checks whether it would be necessary to transcode from the given character
284+ * encoding to UTF-8.
285+ *
286+ * @param charset The name of the character encoding to check.
287+ * @return \c true only if it would be necessary to transcode from the given
288+ * character encoding to UTF-8.
289+ */
290+ZORBA_DLL_PUBLIC
291+bool is_necessary( char const *charset );
292+
293+/**
294+ * Checks whether the given character set is supported for transcoding.
295+ *
296+ * @param charset The name of the character encoding to check.
297+ * @return \c true only if the character encoding is supported.
298+ */
299+ZORBA_DLL_PUBLIC
300+bool is_supported( char const *charset );
301+
302+///////////////////////////////////////////////////////////////////////////////
303+
304+} // namespace transcode
305+} // namespace zorba
306+#endif /* ZORBA_TRANSCODE_STREAM_API_H */
307+/* vim:set et sw=2 ts=2: */
308
309=== modified file 'modules/ExternalModules.conf'
310--- modules/ExternalModules.conf 2012-02-16 00:52:25 +0000
311+++ modules/ExternalModules.conf 2012-02-16 02:19:18 +0000
312@@ -32,7 +32,7 @@
313 email bzr lp:zorba/email-module zorba-2.1
314 excel bzr lp:zorba/excel-module zorba-2.1
315 geo bzr lp:zorba/geo-module zorba-2.1
316-http-client bzr lp:zorba/http-client-module 1.0
317+http-client bzr lp:zorba/http-client-module
318 image bzr lp:zorba/image-module zorba-2.1
319 languages bzr lp:zorba/languages-module zorba-2.1
320 oauth bzr lp:zorba/oauth-module zorba-2.1
321
322=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq'
323--- modules/com/zorba-xquery/www/modules/http-client.xq 2011-08-26 23:36:24 +0000
324+++ modules/com/zorba-xquery/www/modules/http-client.xq 2012-02-16 02:19:18 +0000
325@@ -354,7 +354,7 @@
326 :)
327 declare %ann:nondeterministic function http:get-node($href as xs:string) as item()+
328 {
329- http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/xml"/>}, (), ())
330+ http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/xml; charset=utf-8"/>}, (), ())
331 };
332
333 (:~
334@@ -374,7 +374,7 @@
335 :)
336 declare %ann:nondeterministic function http:get-text($href as xs:string) as item()+
337 {
338- http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/plain"/>}, (), ())
339+ http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/plain; charset=utf-8"/>}, (), ())
340 };
341
342 (:~
343
344=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp'
345--- modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp 2011-07-29 08:12:36 +0000
346+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp 2012-02-16 02:19:18 +0000
347@@ -21,6 +21,7 @@
348 #include <iostream>
349 #include <cassert>
350 #ifndef WIN32
351+#include <cerrno>
352 #include <sys/time.h>
353 #endif /* WIN32 */
354
355@@ -32,349 +33,347 @@
356 using namespace std;
357
358 namespace zorba {
359- namespace curl {
360-
361- ///////////////////////////////////////////////////////////////////////////////
362-
363+namespace curl {
364+
365+///////////////////////////////////////////////////////////////////////////////
366+
367 #define ZORBA_CURL_ASSERT(expr) \
368-do { \
369-if ( CURLcode const code##__LINE__ = (expr) ) \
370-throw exception( #expr, "", code##__LINE__ ); \
371-} while (0)
372-
373+ do { \
374+ if ( CURLcode const code##__LINE__ = (expr) ) \
375+ throw exception( #expr, "", code##__LINE__ ); \
376+ } while (0)
377+
378 #define ZORBA_CURLM_ASSERT(expr) \
379-do { \
380-if ( CURLMcode const code##__LINE__ = (expr) ) \
381-if ( code##__LINE__ != CURLM_CALL_MULTI_PERFORM ) \
382-throw exception( #expr, "", code##__LINE__ ); \
383-} while (0)
384-
385- exception::exception( char const *function, char const *uri, char const *msg ) :
386- std::exception(), theMessage(msg)
387- {
388- }
389-
390- exception::exception( char const *function, char const *uri, CURLcode code ) :
391- std::exception(), theMessage(curl_easy_strerror(code))
392- {
393- }
394-
395- exception::exception( char const *function, char const *uri, CURLMcode code ) :
396- std::exception(), theMessage(curl_multi_strerror(code))
397- {
398- }
399-
400- const char* exception::what() const throw() {
401- return theMessage;
402- }
403-
404-
405- ///////////////////////////////////////////////////////////////////////////////
406-
407- CURL* create( char const *uri, write_fn_t fn, void *data ) {
408- //
409- // Having cURL initialization wrapped by a class and using a singleton static
410- // instance guarantees that cURL is initialized exactly once before use and
411- // and also is cleaned-up at program termination (when destructors for static
412- // objects are called).
413- //
414- struct curl_initializer {
415- curl_initializer() {
416- ZORBA_CURL_ASSERT( curl_global_init( CURL_GLOBAL_ALL ) );
417- }
418- ~curl_initializer() {
419- curl_global_cleanup();
420- }
421- };
422- static curl_initializer initializer;
423-
424- CURL *const curl = curl_easy_init();
425- if ( !curl )
426- throw exception( "curl_easy_init()", uri, "" );
427-
428- try {
429- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_URL, uri ) );
430- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, data ) );
431- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, fn ) );
432-
433- // Tells cURL to follow redirects. CURLOPT_MAXREDIRS is by default set to -1
434- // thus cURL will do an infinite number of redirects.
435- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_FOLLOWLOCATION, 1 ) );
436-
437+ do { \
438+ if ( CURLMcode const code##__LINE__ = (expr) ) \
439+ if ( code##__LINE__ != CURLM_CALL_MULTI_PERFORM ) \
440+ throw exception( #expr, "", code##__LINE__ ); \
441+ } while (0)
442+
443+exception::exception( char const *function, char const *uri, char const *msg ) :
444+ std::exception(), msg_( msg )
445+{
446+}
447+
448+exception::exception( char const *function, char const *uri, CURLcode code ) :
449+ std::exception(),
450+ msg_( curl_easy_strerror( code ) )
451+{
452+}
453+
454+exception::exception( char const *function, char const *uri, CURLMcode code ) :
455+ std::exception(),
456+ msg_( curl_multi_strerror( code ) )
457+{
458+}
459+
460+exception::~exception() throw() {
461+ // out-of-line since it's virtual
462+}
463+
464+const char* exception::what() const throw() {
465+ return msg_.c_str();
466+}
467+
468+///////////////////////////////////////////////////////////////////////////////
469+
470+CURL* create( char const *uri, write_fn_t fn, void *data ) {
471+ //
472+ // Having cURL initialization wrapped by a class and using a singleton static
473+ // instance guarantees that cURL is initialized exactly once before use and
474+ // and also is cleaned-up at program termination (when destructors for static
475+ // objects are called).
476+ //
477+ struct curl_initializer {
478+ curl_initializer() {
479+ ZORBA_CURL_ASSERT( curl_global_init( CURL_GLOBAL_ALL ) );
480+ }
481+ ~curl_initializer() {
482+ curl_global_cleanup();
483+ }
484+ };
485+ static curl_initializer initializer;
486+
487+ CURL *const curl = curl_easy_init();
488+ if ( !curl )
489+ throw exception( "curl_easy_init()", uri, "" );
490+
491+ try {
492+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_URL, uri ) );
493+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, data ) );
494+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, fn ) );
495+
496+ // Tells cURL to follow redirects. CURLOPT_MAXREDIRS is by default set to -1
497+ // thus cURL will do an infinite number of redirects.
498+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_FOLLOWLOCATION, 1 ) );
499+
500 #ifndef ZORBA_VERIFY_PEER_SSL_CERTIFICATE
501- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_SSL_VERIFYPEER, 0 ) );
502- //
503- // CURLOPT_SSL_VERIFYHOST is left default, value 2, meaning verify that the
504- // Common Name or Subject Alternate Name field in the certificate matches
505- // the name of the server.
506- //
507- // Tested with https://www.npr.org/rss/rss.php?id=1001
508- // About using SSL certs in curl: http://curl.haxx.se/docs/sslcerts.html
509+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_SSL_VERIFYPEER, 0 ) );
510+ //
511+ // CURLOPT_SSL_VERIFYHOST is left default, value 2, meaning verify that the
512+ // Common Name or Subject Alternate Name field in the certificate matches
513+ // the name of the server.
514+ //
515+ // Tested with https://www.npr.org/rss/rss.php?id=1001
516+ // About using SSL certs in curl: http://curl.haxx.se/docs/sslcerts.html
517 #else
518 # ifdef WIN32
519- // set the root CA certificates file path
520- if ( GENV.g_curl_root_CA_certificates_path[0] )
521- ZORBA_CURL_ASSERT(
522- curl_easy_setopt(
523- curl, CURLOPT_CAINFO, GENV.g_curl_root_CA_certificates_path
524- )
525- );
526+ // set the root CA certificates file path
527+ if ( GENV.g_curl_root_CA_certificates_path[0] )
528+ ZORBA_CURL_ASSERT(
529+ curl_easy_setopt(
530+ curl, CURLOPT_CAINFO, GENV.g_curl_root_CA_certificates_path
531+ )
532+ );
533 # endif /* WIN32 */
534 #endif /* ZORBA_VERIFY_PEER_SSL_CERTIFICATE */
535-
536- //
537- // Some servers don't like requests that are made without a user-agent
538- // field, so we provide one.
539- //
540- ZORBA_CURL_ASSERT(
541- curl_easy_setopt( curl, CURLOPT_USERAGENT, "libcurl-agent/1.0" )
542- );
543-
544- return curl;
545- }
546- catch ( ... ) {
547- destroy( curl );
548- throw;
549- }
550- }
551-
552- void destroy( CURL *curl ) {
553- if ( curl ) {
554- curl_easy_reset( curl );
555- curl_easy_cleanup( curl );
556- }
557- }
558-
559- ///////////////////////////////////////////////////////////////////////////////
560-
561- streambuf::streambuf() : theInformer(0), theOwnInformer(false) {
562-#ifdef WIN32
563- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
564- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
565- std::cerr << "creating the socket failed" << std::endl;
566- }
567-#endif
568- init();
569- }
570-
571- streambuf::streambuf( char const *uri ) : theInformer(0), theOwnInformer(false) {
572-#ifdef WIN32
573- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
574- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
575- std::cerr << "creating the socket failed" << std::endl;
576- }
577-#endif
578- init();
579- open( uri );
580- }
581-
582- int streambuf::multi_perform() {
583- underflow();
584- CURLMsg* msg;
585- int msgInQueue;
586- int error = 0;
587- while ((msg = curl_multi_info_read(curlm_, &msgInQueue))) {
588- if (msg->msg == CURLMSG_DONE) {
589- error = msg->data.result;
590- }
591- }
592- return error;
593- }
594-
595- streambuf::streambuf( CURL* aCurl) : theInformer(0), theOwnInformer(false) {
596-#ifdef WIN32
597- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
598- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
599- std::cerr << "creating the socket failed" << std::endl;
600- }
601-#endif
602- init();
603- curl_ = aCurl;
604- ZORBA_CURL_ASSERT( curl_easy_setopt( aCurl, CURLOPT_WRITEDATA, this ) );
605- ZORBA_CURL_ASSERT( curl_easy_setopt( aCurl, CURLOPT_WRITEFUNCTION, curl_write_callback ) );
606-
607- init_curlm();
608- }
609-
610- streambuf::~streambuf() {
611- free( buf_ );
612- close();
613-#ifdef WIN32
614- closesocket(theDummySocket);
615-#endif
616- // If we have been assigned memory ownership of theInformer, delete it now.
617- if (theOwnInformer)
618- delete theInformer;
619- }
620-
621- void streambuf::close() {
622- if ( curl_ ) {
623- if ( curlm_ ) {
624- curl_multi_remove_handle( curlm_, curl_ );
625- curl_multi_cleanup( curlm_ );
626- curlm_ = 0;
627- }
628- destroy( curl_ );
629- curl_ = 0;
630- }
631- }
632-
633- void streambuf::curl_read() {
634- buf_len_ = 0;
635- while ( curl_running_ && !buf_len_ ) {
636- fd_set fd_read, fd_write, fd_except;
637- FD_ZERO( &fd_read );
638- FD_ZERO( &fd_write );
639- FD_ZERO( &fd_except );
640- int max_fd = -1;
641-#ifdef WIN32
642- // Windows does not like a call to select where all arguments are 0. So
643- // we just add a dummy socket to make the call to select happy.
644- FD_SET (theDummySocket, &fd_read);
645-#endif
646- ZORBA_CURLM_ASSERT(
647- curl_multi_fdset( curlm_, &fd_read, &fd_write, &fd_except, &max_fd )
648- );
649-
650- //
651- // Note that the fopen.c sample code is unnecessary at best or wrong at
652- // worst; see: http://curl.haxx.se/mail/lib-2011-05/0011.html
653- //
654- timeval timeout;
655- long curl_timeout_ms;
656- ZORBA_CURLM_ASSERT( curl_multi_timeout( curlm_, &curl_timeout_ms ) );
657- if ( curl_timeout_ms > 0 ) {
658- timeout.tv_sec = curl_timeout_ms / 1000;
659- timeout.tv_usec = curl_timeout_ms % 1000 * 1000;
660- } else {
661- //
662- // From curl_multi_timeout(3):
663- //
664- // Note: if libcurl returns a -1 timeout here, it just means that
665- // libcurl currently has no stored timeout value. You must not wait
666- // too long (more than a few seconds perhaps) before you call
667- // curl_multi_perform() again.
668- //
669- // So we just pick some not-too-long default.
670- //
671- timeout.tv_sec = 1;
672- timeout.tv_usec = 0;
673- }
674-
675- switch ( select( max_fd + 1, &fd_read, &fd_write, &fd_except, &timeout ) ) {
676- case -1: // select error
677-#ifdef WIN32
678- std::cout << "Error = " << WSAGetLastError() << std::endl;
679-#endif
680- throw exception( "select()", "" );
681- case 0: // timeout
682- // no break;
683- default:
684- CURLMcode code;
685- do {
686- code = curl_multi_perform( curlm_, &curl_running_ );
687- } while ( code == CURLM_CALL_MULTI_PERFORM );
688- ZORBA_CURLM_ASSERT( code );
689- }
690- }
691- if (theInformer) {
692- theInformer->afterRead();
693- }
694- }
695-
696- size_t streambuf::curl_write_callback( void *ptr, size_t size, size_t nmemb,
697- void *data ) {
698- size *= nmemb;
699- streambuf *const that = static_cast<streambuf*>( data );
700-
701- std::streamoff buf_free = that->buf_capacity_ - that->buf_len_;
702- if (that->theInformer) {
703- that->theInformer->beforeRead();
704- }
705- if ( size > buf_free ) {
706- std::streamoff new_capacity = that->buf_capacity_ + size - buf_free;
707- if ( void *const new_buf = realloc( that->buf_, static_cast<size_t>(new_capacity) ) ) {
708- that->buf_ = static_cast<char*>( new_buf );
709- that->buf_capacity_ = new_capacity;
710- } else
711- throw exception( "realloc()", "" );
712- }
713- ::memcpy( that->buf_ + that->buf_len_, ptr, size );
714- that->buf_len_ += size;
715- return size;
716- }
717-
718- void streambuf::init() {
719- buf_ = 0;
720- buf_capacity_ = 0;
721- buf_len_ = 0;
722- curl_ = 0;
723- curlm_ = 0;
724- curl_running_ = 0;
725- }
726-
727- void streambuf::init_curlm() {
728- //
729- // Lie about cURL running initially so the while-loop in curl_read() will run
730- // at least once.
731- //
732- curl_running_ = 1;
733-
734- //
735- // Set the "get" pointer to the end (gptr() == egptr()) so a call to
736- // underflow() and initial data read will be triggered.
737- //
738- buf_len_ = buf_capacity_;
739- setg( buf_, buf_ + buf_len_, buf_ + buf_capacity_ );
740-
741- //
742- // Clean-up has to be done here with try/catch (as opposed to relying on the
743- // destructor) because open() can be called from the constructor. If an
744- // exception is thrown, the constructor will not have completed, hence the
745- // object will not have been fully constructed; therefore the destructor will
746- // not be called.
747- //
748- try {
749- if ( !(curlm_ = curl_multi_init()) )
750- throw exception( "curl_multi_init()", "" );
751- try {
752- ZORBA_CURLM_ASSERT( curl_multi_add_handle( curlm_, curl_ ) );
753- }
754- catch ( ... ) {
755- curl_multi_cleanup( curlm_ );
756- curlm_ = 0;
757- throw;
758- }
759- }
760- catch ( ... ) {
761- destroy( curl_ );
762- curl_ = 0;
763- throw;
764- }
765- }
766-
767- void streambuf::open( char const *uri ) {
768- curl_ = create( uri, curl_write_callback, this );
769-
770- init_curlm();
771- }
772-
773- streamsize streambuf::showmanyc() {
774- return egptr() - gptr();
775- }
776-
777- streambuf::int_type streambuf::underflow() {
778- while ( true ) {
779- if ( gptr() < egptr() )
780- return traits_type::to_int_type( *gptr() );
781- curl_read();
782- if ( !buf_len_ )
783- return traits_type::eof();
784- setg( buf_, buf_, buf_ + buf_len_ );
785- }
786- }
787-
788- ///////////////////////////////////////////////////////////////////////////////
789-
790- } // namespace curl
791+
792+ //
793+ // Some servers don't like requests that are made without a user-agent
794+ // field, so we provide one.
795+ //
796+ ZORBA_CURL_ASSERT(
797+ curl_easy_setopt( curl, CURLOPT_USERAGENT, "libcurl-agent/1.0" )
798+ );
799+
800+ return curl;
801+ }
802+ catch ( ... ) {
803+ destroy( curl );
804+ throw;
805+ }
806+}
807+
808+void destroy( CURL *curl ) {
809+ if ( curl ) {
810+ curl_easy_reset( curl );
811+ curl_easy_cleanup( curl );
812+ }
813+}
814+
815+///////////////////////////////////////////////////////////////////////////////
816+
817+streambuf::streambuf() {
818+ init();
819+}
820+
821+streambuf::streambuf( char const *uri ) {
822+ init();
823+ open( uri );
824+}
825+
826+streambuf::streambuf( CURL *curl ) {
827+ init();
828+ curl_ = curl;
829+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, this ) );
830+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, curl_write_callback ) );
831+ init_curlm();
832+}
833+
834+streambuf::~streambuf() {
835+ free( buf_ );
836+ close();
837+#ifdef WIN32
838+ closesocket( dummy_socket_ );
839+#endif
840+ // If we have been assigned memory ownership of theInformer, delete it now.
841+ if ( theOwnInformer )
842+ delete theInformer;
843+}
844+
845+void streambuf::close() {
846+ if ( curl_ ) {
847+ if ( curlm_ ) {
848+ curl_multi_remove_handle( curlm_, curl_ );
849+ curl_multi_cleanup( curlm_ );
850+ curlm_ = 0;
851+ }
852+ destroy( curl_ );
853+ curl_ = 0;
854+ }
855+}
856+
857+void streambuf::curl_read() {
858+ buf_len_ = 0;
859+ while ( curl_running_ && !buf_len_ ) {
860+ fd_set fd_read, fd_write, fd_except;
861+ FD_ZERO( &fd_read );
862+ FD_ZERO( &fd_write );
863+ FD_ZERO( &fd_except );
864+ int max_fd = -1;
865+#ifdef WIN32
866+ //
867+ // Windows does not like a call to select where all arguments are 0, so we
868+ // just add a dummy socket to make the call to select happy.
869+ //
870+ FD_SET( dummy_socket_, &fd_read );
871+#endif /* WIN32 */
872+ ZORBA_CURLM_ASSERT(
873+ curl_multi_fdset( curlm_, &fd_read, &fd_write, &fd_except, &max_fd )
874+ );
875+
876+ //
877+ // Note that the fopen.c sample code is unnecessary at best or wrong at
878+ // worst; see: http://curl.haxx.se/mail/lib-2011-05/0011.html
879+ //
880+ timeval timeout;
881+ long curl_timeout_ms;
882+ ZORBA_CURLM_ASSERT( curl_multi_timeout( curlm_, &curl_timeout_ms ) );
883+ if ( curl_timeout_ms > 0 ) {
884+ timeout.tv_sec = curl_timeout_ms / 1000;
885+ timeout.tv_usec = curl_timeout_ms % 1000 * 1000;
886+ } else {
887+ //
888+ // From curl_multi_timeout(3):
889+ //
890+ // Note: if libcurl returns a -1 timeout here, it just means that
891+ // libcurl currently has no stored timeout value. You must not wait
892+ // too long (more than a few seconds perhaps) before you call
893+ // curl_multi_perform() again.
894+ //
895+ // So we just pick some not-too-long default.
896+ //
897+ timeout.tv_sec = 1;
898+ timeout.tv_usec = 0;
899+ }
900+
901+ switch ( select( max_fd + 1, &fd_read, &fd_write, &fd_except, &timeout ) ) {
902+ case -1: // select error
903+#ifdef WIN32
904+ char err_buf[8];
905+ sprintf( err_buf, "%d", WSAGetLastError() );
906+ throw exception( "select()", "", err_buf );
907+#else
908+ throw exception( "select()", "", strerror( errno ) );
909+#endif
910+ case 0: // timeout
911+ // no break;
912+ default:
913+ CURLMcode code;
914+ do {
915+ code = curl_multi_perform( curlm_, &curl_running_ );
916+ } while ( code == CURLM_CALL_MULTI_PERFORM );
917+ ZORBA_CURLM_ASSERT( code );
918+ }
919+ }
920+ if ( theInformer )
921+ theInformer->afterRead();
922+}
923+
924+size_t streambuf::curl_write_callback( void *ptr, size_t size, size_t nmemb,
925+ void *data ) {
926+ size *= nmemb;
927+ streambuf *const that = static_cast<streambuf*>( data );
928+
929+ if ( that->theInformer )
930+ that->theInformer->beforeRead();
931+
932+ size_t const buf_free = that->buf_capacity_ - that->buf_len_;
933+ if ( size > buf_free ) {
934+ streamoff new_capacity = that->buf_capacity_ + size - buf_free;
935+ if ( void *const new_buf =
936+ realloc( that->buf_, static_cast<size_t>( new_capacity ) ) ) {
937+ that->buf_ = static_cast<char*>( new_buf );
938+ that->buf_capacity_ = new_capacity;
939+ } else
940+ throw exception( "realloc()", "" );
941+ }
942+ ::memcpy( that->buf_ + that->buf_len_, ptr, size );
943+ that->buf_len_ += size;
944+ return size;
945+}
946+
947+void streambuf::init() {
948+ buf_ = 0;
949+ buf_capacity_ = 0;
950+ buf_len_ = 0;
951+ curl_ = 0;
952+ curlm_ = 0;
953+ curl_running_ = 0;
954+ theInformer = 0;
955+ theOwnInformer = false;
956+#ifdef WIN32
957+ dummy_socket_ = socket( AF_INET, SOCK_DGRAM, 0 );
958+ if ( dummy_socket_ == CURL_SOCKET_BAD || dummy_socket_ == INVALID_SOCKET )
959+ throw exception( "socket()", "" );
960+#endif /* WIN32 */
961+}
962+
963+void streambuf::init_curlm() {
964+ //
965+ // Lie about cURL running initially so the while-loop in curl_read() will run
966+ // at least once.
967+ //
968+ curl_running_ = 1;
969+
970+ //
971+ // Set the "get" pointer to the end (gptr() == egptr()) so a call to
972+ // underflow() and initial data read will be triggered.
973+ //
974+ buf_len_ = buf_capacity_;
975+ setg( buf_, buf_ + buf_len_, buf_ + buf_capacity_ );
976+
977+ //
978+ // Clean-up has to be done here with try/catch (as opposed to relying on the
979+ // destructor) because open() can be called from the constructor. If an
980+ // exception is thrown, the constructor will not have completed, hence the
981+ // object will not have been fully constructed; therefore the destructor will
982+ // not be called.
983+ //
984+ try {
985+ if ( !(curlm_ = curl_multi_init()) )
986+ throw exception( "curl_multi_init()", "" );
987+ try {
988+ ZORBA_CURLM_ASSERT( curl_multi_add_handle( curlm_, curl_ ) );
989+ }
990+ catch ( ... ) {
991+ curl_multi_cleanup( curlm_ );
992+ curlm_ = 0;
993+ throw;
994+ }
995+ }
996+ catch ( ... ) {
997+ destroy( curl_ );
998+ curl_ = 0;
999+ throw;
1000+ }
1001+}
1002+
1003+int streambuf::multi_perform() {
1004+ underflow();
1005+ CURLMsg *msg;
1006+ int msgInQueue;
1007+ int error = 0;
1008+ while ( (msg = curl_multi_info_read( curlm_, &msgInQueue )) ) {
1009+ if ( msg->msg == CURLMSG_DONE )
1010+ error = msg->data.result;
1011+ }
1012+ return error;
1013+}
1014+
1015+void streambuf::open( char const *uri ) {
1016+ curl_ = create( uri, curl_write_callback, this );
1017+
1018+ init_curlm();
1019+}
1020+
1021+streamsize streambuf::showmanyc() {
1022+ return egptr() - gptr();
1023+}
1024+
1025+streambuf::int_type streambuf::underflow() {
1026+ while ( true ) {
1027+ if ( gptr() < egptr() )
1028+ return traits_type::to_int_type( *gptr() );
1029+ curl_read();
1030+ if ( !buf_len_ )
1031+ return traits_type::eof();
1032+ setg( buf_, buf_, buf_ + buf_len_ );
1033+ }
1034+}
1035+
1036+///////////////////////////////////////////////////////////////////////////////
1037+
1038+} // namespace curl
1039 } // namespace zorba
1040+/* vim:set et sw=2 ts=2: */
1041
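Note on the underflow() refill pattern used above: a minimal sketch, assuming an in-memory list of chunks as a stand-in for curl_read(). The names chunk_streambuf, refill() and read_all() are hypothetical and not part of this branch. It only illustrates how returning to the top of the loop after setg() lets empty reads be skipped until data or end-of-input arrives.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for curl::streambuf: data arrives in chunks, and
// underflow() refills the get area whenever it has been exhausted.
class chunk_streambuf : public std::streambuf {
public:
  explicit chunk_streambuf( std::vector<std::string> const &chunks ) :
    chunks_( chunks ), next_( 0 ) { }

protected:
  int_type underflow() {
    while ( true ) {
      // Data still available in the get area: peek at it without consuming.
      if ( gptr() < egptr() )
        return traits_type::to_int_type( *gptr() );
      // Get area exhausted: try to fetch more (plays the role of curl_read()).
      if ( !refill() )
        return traits_type::eof();
      setg( &buf_[0], &buf_[0], &buf_[0] + buf_.size() );
    }
  }

private:
  // Copies the next non-empty chunk into buf_; returns false at end of input.
  bool refill() {
    buf_.clear();
    while ( buf_.empty() && next_ < chunks_.size() )
      buf_ = chunks_[ next_++ ];
    return !buf_.empty();
  }

  std::vector<std::string> chunks_;
  std::size_t next_;
  std::string buf_;
};

// Reads everything the streambuf produces through an ordinary std::istream.
std::string read_all( std::vector<std::string> const &chunks ) {
  chunk_streambuf sb( chunks );
  std::istream in( &sb );
  std::string out;
  char c;
  while ( in.get( c ) )
    out += c;
  return out;
}
```

As in the branch, the get pointers start out equal, so the first read goes straight to the refill step.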
1042=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h'
1043--- modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h 2011-07-29 08:12:36 +0000
1044+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h 2012-02-16 02:19:18 +0000
1045@@ -19,154 +19,175 @@
1046
1047 #include <zorba/config.h>
1048
1049+#include <exception>
1050 #include <istream>
1051-#include <exception>
1052 #include <streambuf>
1053+#include <string>
1054 #include <curl/curl.h>
1055
1056 namespace zorba {
1057-
1058- namespace http_client {
1059- class InformDataRead;
1060- }
1061-
1062- namespace curl {
1063-
1064- class exception : public std::exception {
1065- public:
1066- exception( char const *function, char const *uri, char const *msg = 0 );
1067- exception( char const *function, char const *uri, CURLcode code );
1068- exception( char const *function, char const *uri, CURLMcode code );
1069- public:
1070- virtual const char* what() const throw();
1071- private:
1072- const char* theMessage;
1073- };
1074-
1075-
1076-
1077- ////////// create & destroy ///////////////////////////////////////////////////
1078-
1079- /**
1080- * The signature type of cURL's write function callback.
1081- */
1082- typedef size_t (*write_fn_t)( void*, size_t, size_t, void* );
1083-
1084- /**
1085- * Creates a new, initialized cURL instance.
1086- *
1087- * @throws exception upon failure.
1088- */
1089- CURL* create( char const *uri, write_fn_t fn, void *data );
1090-
1091- /**
1092- * Destroys a cURL instance.
1093- *
1094- * @param instance A cURL instance. If \c NULL, does nothing.
1095- */
1096- void destroy( CURL *instance );
1097-
1098- ////////// streambuf //////////////////////////////////////////////////////////
1099-
1100- /**
1101- * A curl::streambuf is-a std::streambuf for streaming the contents of URI
1102- * using cURL. However, do not use this class directly. Use uri::streambuf
1103- * instead.
1104- */
1105- class streambuf : public std::streambuf {
1106- public:
1107- /**
1108- * Constructs a %streambuf.
1109- */
1110- streambuf();
1111-
1112- /**
1113- * Constructs a %streambuf and opens a connection to the server hosting the
1114- * given URI for subsequent streaming.
1115- *
1116- * @param uri The URI to stream.
1117- */
1118- streambuf( char const *uri );
1119-
1120- /**
1121- * In case we already have a curl object, which was set up somewhere else, we
1122- * take it here as an arument. This takes ownership over the object.
1123- */
1124- streambuf( CURL* aCurl );
1125-
1126- /**
1127- * Destroys a %streambuf.
1128- */
1129- ~streambuf();
1130-
1131- /**
1132- * Opens a connection to the server hosting the given URI for subsequent
1133- * streaming.
1134- *
1135- * @param uri The URI to stream.
1136- * @throws exception upon failure.
1137- */
1138- void open( char const *uri );
1139-
1140- /**
1141- * Tests whether the buffer is open.
1142- *
1143- * @return Returns \c true only if the buffer is open.
1144- */
1145- bool is_open() const {
1146- return !!curl_;
1147- }
1148-
1149- /**
1150- * Closes this %streambuf.
1151- */
1152- void close();
1153-
1154- /**
1155- * Provide a InformDataRead that will get callbacks about read events.
1156- */
1157- void setInformer(::zorba::http_client::InformDataRead* aInformer) { theInformer = aInformer; }
1158-
1159- /**
1160- * Specify whether this streambuf has memory ownership over the
1161- * InformDataRead it has been passed. You can use this if, for example,
1162- * the lifetime of the streambuf will extend past the lifetime of the
1163- * object which created the InformDataRead.
1164- */
1165- void setOwnInformer(bool aOwnInformer) { theOwnInformer = aOwnInformer; }
1166-
1167- int multi_perform();
1168-
1169- protected:
1170- // inherited
1171- std::streamsize showmanyc();
1172- int_type underflow();
1173-
1174- private:
1175- void curl_read();
1176- static size_t curl_write_callback( void*, size_t, size_t, void* );
1177-
1178- void init();
1179- void init_curlm();
1180-
1181- char *buf_;
1182- std::streamsize buf_capacity_;
1183- std::streamoff buf_len_;
1184-
1185- CURL *curl_;
1186- CURLM *curlm_;
1187- int curl_running_;
1188- ::zorba::http_client::InformDataRead* theInformer;
1189- bool theOwnInformer;
1190-
1191- // forbid
1192- streambuf( streambuf const& );
1193- streambuf& operator=( streambuf const& );
1194+
1195+namespace http_client {
1196+ class InformDataRead;
1197+}
1198+
1199+namespace curl {
1200+
1201+///////////////////////////////////////////////////////////////////////////////
1202+
1203+class exception : public std::exception {
1204+public:
1205+ exception( char const *function, char const *uri, char const *msg = 0 );
1206+ exception( char const *function, char const *uri, CURLcode code );
1207+ exception( char const *function, char const *uri, CURLMcode code );
1208+ ~exception() throw();
1209+
1210+ virtual const char* what() const throw();
1211+
1212+private:
1213+ std::string msg_;
1214+};
1215+
1216+////////// create & destroy ///////////////////////////////////////////////////
1217+
1218+/**
1219+ * The signature type of cURL's write function callback.
1220+ */
1221+typedef size_t (*write_fn_t)( void*, size_t, size_t, void* );
1222+
1223+/**
1224+ * Creates a new, initialized cURL instance.
1225+ *
1226+ * @throws exception upon failure.
1227+ */
1228+CURL* create( char const *uri, write_fn_t fn, void *data );
1229+
1230+/**
1231+ * Destroys a cURL instance.
1232+ *
1233+ * @param instance A cURL instance. If \c NULL, does nothing.
1234+ */
1235+void destroy( CURL *instance );
1236+
1237+////////// streambuf //////////////////////////////////////////////////////////
1238+
1239+/**
1240+ * A curl::streambuf is-a std::streambuf for streaming the contents of a URI
1241+ * using cURL. However, do not use this class directly. Use uri::streambuf
1242+ * instead.
1243+ */
1244+class streambuf : public std::streambuf {
1245+public:
1246+ /**
1247+ * Constructs a %streambuf.
1248+ */
1249+ streambuf();
1250+
1251+ /**
1252+ * Constructs a %streambuf and opens a connection to the server hosting the
1253+ * given URI for subsequent streaming.
1254+ *
1255+ * @param uri The URI to stream.
1256+ */
1257+ streambuf( char const *uri );
1258+
1259+ /**
1260+ * Constructs a %streambuf using an existing CURL object.
1261+ *
1262+ * @param curl The CURL object to use. This %streambuf takes ownership of
1263+ * it.
1264+ */
1265+ streambuf( CURL *curl );
1266+
1267+ /**
1268+ * Destroys a %streambuf.
1269+ */
1270+ ~streambuf();
1271+
1272+ /**
1273+ * Opens a connection to the server hosting the given URI for subsequent
1274+ * streaming.
1275+ *
1276+ * @param uri The URI to stream.
1277+ * @throws exception upon failure.
1278+ */
1279+ void open( char const *uri );
1280+
1281+ /**
1282+ * Tests whether the buffer is open.
1283+ *
1284+ * @return Returns \c true only if the buffer is open.
1285+ */
1286+ bool is_open() const {
1287+ return !!curl_;
1288+ }
1289+
1290+ /**
1291+ * Closes this %streambuf.
1292+ */
1293+ void close();
1294+
1295+ /**
1296+ * Gets the CURL object in use.
1297+ *
1298+ * @return Returns said CURL object.
1299+ */
1300+ CURL* curl() const {
1301+ return curl_;
1302+ }
1303+
1304+ /**
1305+ * Provide an InformDataRead that will get callbacks about read events.
1306+ */
1307+ void setInformer( http_client::InformDataRead *aInformer ) {
1308+ theInformer = aInformer;
1309+ }
1310+
1311+ /**
1312+ * Specify whether this streambuf has memory ownership over the
1313+ * InformDataRead it has been passed. You can use this if, for example,
1314+ * the lifetime of the streambuf will extend past the lifetime of the
1315+ * object which created the InformDataRead.
1316+ */
1317+ void setOwnInformer( bool aOwnInformer ) {
1318+ theOwnInformer = aOwnInformer;
1319+ }
1320+
1321+ int multi_perform();
1322+
1323+protected:
1324+ // inherited
1325+ std::streamsize showmanyc();
1326+ int_type underflow();
1327+
1328+private:
1329+ void curl_read();
1330+ static size_t curl_write_callback( void*, size_t, size_t, void* );
1331+
1332+ void init();
1333+ void init_curlm();
1334+
1335+ char *buf_;
1336+ std::streamsize buf_capacity_;
1337+ std::streamoff buf_len_;
1338+
1339+ CURL *curl_;
1340+ CURLM *curlm_;
1341+ int curl_running_;
1342+ http_client::InformDataRead *theInformer;
1343+ bool theOwnInformer;
1344+
1345+ // forbid
1346+ streambuf( streambuf const& );
1347+ streambuf& operator=( streambuf const& );
1348 #ifdef WIN32
1349- SOCKET theDummySocket;
1350-#endif
1351- };
1352-
1353- } // namespace curl
1354+ SOCKET dummy_socket_;
1355+#endif /* WIN32 */
1356+};
1357+
1358+///////////////////////////////////////////////////////////////////////////////
1359+
1360+} // namespace curl
1361 } // namespace zorba
1362 #endif /* ZORBA_CURL_UTIL_H */
1363+/* vim:set et sw=2 ts=2: */
1364
1365=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp'
1366--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp 2011-07-29 08:12:36 +0000
1367+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp 2012-02-16 02:19:18 +0000
1368@@ -26,12 +26,44 @@
1369 #include <zorba/error.h>
1370 #include <zorba/xquery_exception.h>
1371 #include <zorba/xquery_functions.h>
1372+#include <zorba/transcode_stream.h>
1373
1374 #include "http_response_parser.h"
1375 #include "http_request_handler.h"
1376 #include "curl_stream_buffer.h"
1377
1378-namespace zorba { namespace http_client {
1379+namespace zorba {
1380+
1381+static void parse_content_type( std::string const &s, std::string *mime_type,
1382+ std::string *charset ) {
1383+ std::string::size_type pos = s.find( ';' );
1384+ *mime_type = s.substr( 0, pos );
1385+
1386+ if ( pos != std::string::npos ) {
1387+ //
1388+ // Parse: charset="?XXXXX"?[ (comment)]
1389+ //
1390+ if ( (pos = s.find( '=' )) != std::string::npos ) {
1391+ std::string t = s.substr( pos + 1 );
1392+ if ( !t.empty() ) {
1393+ if ( t[0] == '"' ) {
1394+ t.erase( 0, 1 );
1395+ if ( (pos = t.find( '"' )) != std::string::npos )
1396+ t.erase( pos );
1397+ } else {
1398+ if ( (pos = t.find( ' ' )) != std::string::npos )
1399+ t.erase( pos );
1400+ }
1401+ *charset = t;
1402+ }
1403+ }
1404+ } else {
1405+ // The HTTP/1.1 spec says that the default charset is ISO-8859-1.
1406+ *charset = "ISO-8859-1";
1407+ }
1408+}
1409+
1410+namespace http_client {
1411
1412 HttpResponseParser::HttpResponseParser(RequestHandler& aHandler, CURL* aCurl,
1413 ErrorThrower& aErrorThrower,
1414@@ -60,19 +92,30 @@
1415 if (lCode)
1416 return lCode;
1417 if (!theStatusOnly) {
1418- std::auto_ptr<std::istream> lStream(new std::istream(theStreamBuffer));
1419+
1420+ if (!theOverridenContentType.empty()) {
1421+ parse_content_type(
1422+ theOverridenContentType, &theCurrentContentType, &theCurrentCharset
1423+ );
1424+ }
1425+
1426+ std::auto_ptr<std::istream> lStream;
1427+ if ( transcode::is_necessary( theCurrentCharset.c_str() ) ) {
1428+ lStream.reset(
1429+ new transcode::stream<std::istream>(
1430+ theCurrentCharset.c_str(), theStreamBuffer
1431+ )
1432+ );
1433+ } else
1434+ lStream.reset(new std::istream(theStreamBuffer));
1435+
1436 Item lItem;
1437- if (theOverridenContentType != "") {
1438- theCurrentContentType = theOverridenContentType;
1439- }
1440 if (theCurrentContentType == "text/xml" ||
1441 theCurrentContentType == "application/xml" ||
1442 theCurrentContentType == "text/xml-external-parsed-entity" ||
1443 theCurrentContentType == "application/xml-external-parsed-entity" ||
1444 theCurrentContentType.find("+xml") == theCurrentContentType.size()-4) {
1445 lItem = createXmlItem(*lStream.get());
1446- } else if (theCurrentContentType.find("text/html") == 0) {
1447- lItem = createTextItem(lStream.release());
1448 } else if (theCurrentContentType.find("text/") == 0) {
1449 lItem = createTextItem(lStream.release());
1450 } else {
1451@@ -106,8 +149,8 @@
1452 }
1453 theInsideRead = true;
1454 theHandler.beginResponse(theStatus, theMessage);
1455- std::vector<std::pair<std::string, std::string> >::iterator lIter;
1456- for (lIter = theHeaders.begin(); lIter != theHeaders.end(); ++lIter) {
1457+ for ( headers_type::const_iterator
1458+ lIter = theHeaders.begin(); lIter != theHeaders.end(); ++lIter) {
1459 theHandler.header(lIter->first, lIter->second);
1460 }
1461 if (!theStatusOnly)
1462@@ -120,23 +163,20 @@
1463
1464 void HttpResponseParser::registerHandler()
1465 {
1466- curl_easy_setopt(theCurl, CURLOPT_HEADERFUNCTION,
1467- &HttpResponseParser::headerfunction);
1468+ curl_easy_setopt(theCurl, CURLOPT_HEADERFUNCTION, &curl_headerfunction);
1469 curl_easy_setopt(theCurl, CURLOPT_HEADERDATA, this);
1470 }
1471
1472- size_t HttpResponseParser::headerfunction(void *ptr,
1473- size_t size,
1474- size_t nmemb,
1475- void *stream)
1476+ size_t HttpResponseParser::curl_headerfunction( void *ptr, size_t size,
1477+ size_t nmemb, void *data )
1478 {
1479 size_t lSize = size*nmemb;
1480 size_t lResult = lSize;
1481- HttpResponseParser* lParser = static_cast<HttpResponseParser*>(stream);
1482+ HttpResponseParser* lParser = static_cast<HttpResponseParser*>(data);
1483 if (lParser->theInsideRead) {
1484 lParser->theHandler.endBody();
1485+ lParser->theInsideRead = false;
1486 }
1487- lParser->theInsideRead = false;
1488 const char* lDataChar = (const char*) ptr;
1489 while (lSize != 0 && (lDataChar[lSize - 1] == 10
1490 || lDataChar[lSize - 1] == 13)) {
1491@@ -173,7 +213,9 @@
1492 }
1493 String lNameS = fn::lower_case( lName );
1494 if (lNameS == "content-type") {
1495- lParser->theCurrentContentType = lValue.substr(0, lValue.find(';'));
1496+ parse_content_type(
1497+ lValue, &lParser->theCurrentContentType, &lParser->theCurrentCharset
1498+ );
1499 } else if (lNameS == "content-id") {
1500 lParser->theId = lValue;
1501 } else if (lNameS == "content-description") {
1502@@ -184,7 +226,7 @@
1503 return lResult;
1504 }
1505
1506- void HttpResponseParser::parseStatusAndMessage(std::string aHeader)
1507+ void HttpResponseParser::parseStatusAndMessage(std::string const &aHeader)
1508 {
1509 std::string::size_type lPos = aHeader.find(' ');
1510 assert(lPos != std::string::npos);
1511@@ -215,7 +257,12 @@
1512 static void streamReleaser(std::istream* aStream)
1513 {
1514 // This istream contains our curl stream buffer, so we have to delete it too
1515- delete aStream->rdbuf();
1516+ std::streambuf *const sbuf = aStream->rdbuf();
1517+ if ( transcode::streambuf *tbuf =
1518+ dynamic_cast<transcode::streambuf*>( sbuf ) )
1519+ delete tbuf->orig_streambuf();
1520+ else
1521+ delete sbuf;
1522 delete aStream;
1523 }
1524
1525@@ -265,4 +312,7 @@
1526 return Item();
1527 }
1528 }
1529-}}
1530+
1531+} // namespace http_client
1532+} // namespace zorba
1533+/* vim:set et sw=2 ts=2: */
1534
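Note on the Content-Type parsing added above: a standalone sketch of the same logic, assuming only the standard library. It is declared void here because the callers shown do not use a result. It is meant as an illustration of the behaviour, not as the code that was merged.

```cpp
#include <cassert>
#include <string>

// Splits a Content-Type value into its MIME type and charset.
// Handles: type/subtype; charset="?XXXXX"?[ (comment)]
static void parse_content_type( std::string const &s, std::string *mime_type,
                                std::string *charset ) {
  std::string::size_type pos = s.find( ';' );
  *mime_type = s.substr( 0, pos );      // everything before ';' (or all of s)

  if ( pos != std::string::npos ) {
    if ( (pos = s.find( '=' )) != std::string::npos ) {
      std::string t = s.substr( pos + 1 );
      if ( !t.empty() ) {
        if ( t[0] == '"' ) {
          // Quoted value: drop the opening quote, cut at the closing one.
          t.erase( 0, 1 );
          if ( (pos = t.find( '"' )) != std::string::npos )
            t.erase( pos );
        } else {
          // Unquoted value: cut at the first space (start of a comment).
          if ( (pos = t.find( ' ' )) != std::string::npos )
            t.erase( pos );
        }
        *charset = t;
      }
    }
  } else {
    // No parameters at all: fall back to the HTTP/1.1 default charset.
    *charset = "ISO-8859-1";
  }
}
```

One consequence of this shape: if a ';' is present but no '=' follows, the charset argument is left unchanged, so callers may want to initialize it beforehand.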
1535=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h'
1536--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h 2011-07-29 08:12:36 +0000
1537+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h 2012-02-16 02:19:18 +0000
1538@@ -31,6 +31,7 @@
1539 namespace curl {
1540 class streambuf;
1541 }
1542+
1543 namespace http_client {
1544 class RequestHandler;
1545
1546@@ -40,7 +41,9 @@
1547 CURL* theCurl;
1548 ErrorThrower& theErrorThrower;
1549 std::string theCurrentContentType;
1550- std::vector<std::pair<std::string, std::string> > theHeaders;
1551+ std::string theCurrentCharset;
1552+ typedef std::vector<std::pair<std::string, std::string> > headers_type;
1553+ headers_type theHeaders;
1554 int theStatus;
1555 std::string theMessage;
1556 zorba::curl::streambuf* theStreamBuffer;
1557@@ -74,15 +77,16 @@
1558 virtual void afterRead();
1559 private:
1560 void registerHandler();
1561- void parseStatusAndMessage(std::string aHeader);
1562+ void parseStatusAndMessage(std::string const &aHeader);
1563 Item createXmlItem(std::istream& aStream);
1564 Item createHtmlItem(std::istream& aStream);
1565 Item createTextItem(std::istream* aStream);
1566 Item createBase64Item(std::istream& aStream);
1567- public: //Handler
1568- static size_t headerfunction( void *ptr, size_t size, size_t nmemb,
1569- void *stream);
1570+
1571+ static size_t curl_headerfunction( void*, size_t, size_t, void* );
1572 };
1573-}} // namespace zorba, http_client
1574+
1575+} // namespace http_client
1576+} // namespace zorba
1577
1578 #endif //HTTP_RESPONSE_PARSER_H
1579
1580=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
1581--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-01-26 01:35:11 +0000
1582+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-02-16 02:19:18 +0000
1583@@ -81,6 +81,10 @@
1584
1585 (:~
1586 :)
1587+declare variable $zerr:ZXQP0006 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP0006");
1588+
1589+(:~
1590+:)
1591 declare variable $zerr:ZXQP0007 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP0007");
1592
1593 (:~
1594@@ -664,6 +668,10 @@
1595
1596 (:~
1597 :)
1598+declare variable $zerr:ZOSE0006 as xs:QName := fn:QName($zerr:NS, "zerr:ZOSE0006");
1599+
1600+(:~
1601+:)
1602 declare variable $zerr:ZSTR0001 as xs:QName := fn:QName($zerr:NS, "zerr:ZSTR0001");
1603
1604 (:~
1605
1606=== modified file 'modules/org/expath/ns/file.xq.src/file.cpp'
1607--- modules/org/expath/ns/file.xq.src/file.cpp 2011-07-22 08:12:31 +0000
1608+++ modules/org/expath/ns/file.xq.src/file.cpp 2012-02-16 02:19:18 +0000
1609@@ -28,6 +28,7 @@
1610 #include <zorba/singleton_item_sequence.h>
1611 #include <zorba/util/path.h>
1612 #include <zorba/user_exception.h>
1613+#include <zorba/transcode_stream.h>
1614
1615 #include "file_module.h"
1616
1617@@ -188,6 +189,7 @@
1618 {
1619 String lFileStr = getFilePathString(aArgs, 0);
1620 File_t lFile = File::createFile(lFileStr.c_str());
1621+ String lEncoding("UTF-8");
1622
1623 // preconditions
1624 if (!lFile->exists()) {
1625@@ -198,18 +200,30 @@
1626 }
1627
1628 if (aArgs.size() == 2) {
1629- // since Zorba currently only supports UTF-8 we only call this function
1630- // to reject any other encoding requested bu the user
1631- getEncodingArg(aArgs, 1);
1632+ lEncoding = getEncodingArg(aArgs, 1);
1633 }
1634
1635- std::auto_ptr<StreamableItemSequence> lSeq(new StreamableItemSequence());
1636- lFile->openInputStream(*lSeq->theStream, false, true);
1637-
1638- lSeq->theItem = theModule->getItemFactory()->createStreamableString(
1639- *lSeq->theStream, &StreamableItemSequence::streamReleaser);
1640-
1641- return ItemSequence_t(lSeq.release());
1642+ zorba::Item lResult;
1643+ std::unique_ptr<std::ifstream> lInStream;
1644+ if ( transcode::is_necessary( lEncoding.c_str() ) )
1645+ {
1646+ try {
1647+ lInStream.reset( new transcode::stream<std::ifstream>(lEncoding.c_str()) );
1648+ } catch (std::invalid_argument const& e)
1649+ {
1650+ raiseFileError("FOFL0006", "Unsupported encoding", lEncoding.c_str());
1651+ }
1652+ }
1653+ else
1654+ {
1655+ lInStream.reset( new std::ifstream() );
1656+ }
1657+ lFile->openInputStream(*lInStream.get(), false, true);
1658+ lResult = theModule->getItemFactory()->createStreamableString(
1659+ *lInStream.release(), &FileModule::streamReleaser
1660+ );
1661+ return ItemSequence_t(new SingletonItemSequence(lResult));
1662+
1663 }
1664
1665 //*****************************************************************************
1666@@ -722,3 +736,4 @@
1667 extern "C" DLL_EXPORT zorba::ExternalModule* createModule() {
1668 return new zorba::filemodule::FileModule();
1669 }
1670+/* vim:set et sw=2 ts=2: */
1671
1672=== modified file 'modules/org/expath/ns/file.xq.src/file_function.cpp'
1673--- modules/org/expath/ns/file.xq.src/file_function.cpp 2011-07-13 01:56:45 +0000
1674+++ modules/org/expath/ns/file.xq.src/file_function.cpp 2012-02-16 02:19:18 +0000
1675@@ -141,11 +141,6 @@
1676 arg_iter->close();
1677 }
1678
1679- if (!(lEncoding == "UTF-8" || lEncoding == "UTF8")) {
1680- // the rest are not supported encodings
1681- raiseFileError("FOFL0006", "Unsupported encoding", lEncoding.c_str());
1682- }
1683-
1684 return lEncoding;
1685 }
1686
1687
1688=== modified file 'modules/org/expath/ns/file.xq.src/file_function.h'
1689--- modules/org/expath/ns/file.xq.src/file_function.h 2011-07-22 08:12:31 +0000
1690+++ modules/org/expath/ns/file.xq.src/file_function.h 2012-02-16 02:19:18 +0000
1691@@ -25,7 +25,9 @@
1692
1693 #include <fstream>
1694
1695-namespace zorba { namespace filemodule {
1696+namespace zorba {
1697+
1698+ namespace filemodule {
1699
1700 class FileModule;
1701
1702@@ -136,18 +138,12 @@
1703 next(Item& aResult);
1704 };
1705
1706- Item theItem;
1707- std::ifstream* theStream;
1708+ Item theItem;
1709+ std::ifstream* theStream;
1710
1711 StreamableItemSequence()
1712 : theStream(new std::ifstream()) {}
1713
1714- static void
1715- streamReleaser(std::istream* stream)
1716- {
1717- delete stream;
1718- }
1719-
1720 Iterator_t getIterator()
1721 {
1722 return new InternalIterator(this);
1723
1724=== modified file 'modules/org/expath/ns/file.xq.src/file_module.cpp'
1725--- modules/org/expath/ns/file.xq.src/file_module.cpp 2011-06-08 18:37:56 +0000
1726+++ modules/org/expath/ns/file.xq.src/file_module.cpp 2012-02-16 02:19:18 +0000
1727@@ -17,11 +17,10 @@
1728 #include "file.h"
1729 #include "file_module.h"
1730 #include "file_function.h"
1731+#include <cassert>
1732
1733 namespace zorba { namespace filemodule {
1734
1735- ItemFactory* FileModule::theFactory = 0;
1736-
1737 const char* FileModule::theNamespace = "http://expath.org/ns/file";
1738
1739
1740@@ -39,9 +38,7 @@
1741 {
1742 ExternalFunction*& lFunc = theFunctions[aLocalname];
1743 if (!lFunc) {
1744- if (1 == 0) {
1745-
1746- } else if (aLocalname == "create-directory") {
1747+ if (aLocalname == "create-directory") {
1748 lFunc = new CreateDirectoryFunction(this);
1749 } else if (aLocalname == "delete-file-impl") {
1750 lFunc = new DeleteFileImplFunction(this);
1751
1752=== modified file 'modules/org/expath/ns/file.xq.src/file_module.h'
1753--- modules/org/expath/ns/file.xq.src/file_module.h 2011-06-08 18:37:56 +0000
1754+++ modules/org/expath/ns/file.xq.src/file_module.h 2012-02-16 02:19:18 +0000
1755@@ -27,7 +27,7 @@
1756 class FileModule : public ExternalModule
1757 {
1758 private:
1759- static ItemFactory* theFactory;
1760+ mutable ItemFactory* theFactory;
1761
1762 public:
1763 static const char* theNamespace;
1764@@ -43,10 +43,17 @@
1765 };
1766
1767 typedef std::map<String, ExternalFunction*, ltstr> FuncMap_t;
1768-
1769 FuncMap_t theFunctions;
1770-
1771+
1772 public:
1773+ static void
1774+ streamReleaser(std::istream* stream)
1775+ {
1776+ delete stream;
1777+ }
1778+
1779+ FileModule() : theFactory(0) {}
1780+
1781 virtual ~FileModule();
1782
1783 virtual String
1784@@ -58,10 +65,10 @@
1785 virtual void
1786 destroy();
1787
1788- static ItemFactory*
1789- getItemFactory()
1790+ ItemFactory*
1791+ getItemFactory() const
1792 {
1793- if(!theFactory)
1794+ if (!theFactory)
1795 {
1796 theFactory = Zorba::getInstance(0)->getItemFactory();
1797 }
1798
1799=== modified file 'src/api/CMakeLists.txt'
1800--- src/api/CMakeLists.txt 2011-08-31 13:17:59 +0000
1801+++ src/api/CMakeLists.txt 2012-02-16 02:19:18 +0000
1802@@ -55,6 +55,7 @@
1803 zorba_functions.cpp
1804 annotationimpl.cpp
1805 auditimpl.cpp
1806+ transcode_streambuf.cpp
1807 )
1808
1809 IF (NOT ZORBA_NO_FULL_TEXT)
1810
1811=== added file 'src/api/transcode_streambuf.cpp'
1812--- src/api/transcode_streambuf.cpp 1970-01-01 00:00:00 +0000
1813+++ src/api/transcode_streambuf.cpp 2012-02-16 02:19:18 +0000
1814@@ -0,0 +1,102 @@
1815+/*
1816+ * Copyright 2006-2008 The FLWOR Foundation.
1817+ *
1818+ * Licensed under the Apache License, Version 2.0 (the "License");
1819+ * you may not use this file except in compliance with the License.
1820+ * You may obtain a copy of the License at
1821+ *
1822+ * http://www.apache.org/licenses/LICENSE-2.0
1823+ *
1824+ * Unless required by applicable law or agreed to in writing, software
1825+ * distributed under the License is distributed on an "AS IS" BASIS,
1826+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1827+ * See the License for the specific language governing permissions and
1828+ * limitations under the License.
1829+ */
1830+
1831+#include <zorba/transcode_stream.h>
1832+
1833+#include "util/transcode_streambuf.h"
1834+
1835+using namespace std;
1836+
1837+namespace zorba {
1838+namespace transcode {
1839+
1840+///////////////////////////////////////////////////////////////////////////////
1841+
1842+streambuf::streambuf( char const *charset, std::streambuf *orig ) :
1843+ proxy_buf_( new internal::transcode::streambuf( charset, orig ) )
1844+{
1845+}
1846+
1847+streambuf::~streambuf() {
1848+ // out-of-line since it's virtual
1849+}
1850+
1851+void streambuf::imbue( std::locale const &loc ) {
1852+ proxy_buf_->pubimbue( loc );
1853+}
1854+
1855+streambuf::pos_type streambuf::seekoff( off_type o, ios_base::seekdir d,
1856+ ios_base::openmode m ) {
1857+ return proxy_buf_->pubseekoff( o, d, m );
1858+}
1859+
1860+streambuf::pos_type streambuf::seekpos( pos_type p, ios_base::openmode m ) {
1861+ return proxy_buf_->pubseekpos( p, m );
1862+}
1863+
1864+std::streambuf* streambuf::setbuf( char_type *p, streamsize s ) {
1865+ proxy_buf_->pubsetbuf( p, s );
1866+ return this;
1867+}
1868+
1869+streamsize streambuf::showmanyc() {
1870+ return proxy_buf_->in_avail();
1871+}
1872+
1873+int streambuf::sync() {
1874+ return proxy_buf_->pubsync();
1875+}
1876+
1877+streambuf::int_type streambuf::overflow( int_type c ) {
1878+ return proxy_buf_->sputc( c );
1879+}
1880+
1881+streambuf::int_type streambuf::pbackfail( int_type c ) {
1882+ return proxy_buf_->sputbackc( traits_type::to_char_type( c ) );
1883+}
1884+
1885+streambuf::int_type streambuf::uflow() {
1886+ return proxy_buf_->sbumpc();
1887+}
1888+
1889+streambuf::int_type streambuf::underflow() {
1890+ return proxy_buf_->sgetc();
1891+}
1892+
1893+streamsize streambuf::xsgetn( char_type *to, streamsize size ) {
1894+ return proxy_buf_->sgetn( to, size );
1895+}
1896+
1897+streamsize streambuf::xsputn( char_type const *from,
1898+ streamsize size ) {
1899+ return proxy_buf_->sputn( from, size );
1900+}
1901+
1902+///////////////////////////////////////////////////////////////////////////////
1903+
1904+bool is_necessary( char const *charset ) {
1905+ return internal::transcode::streambuf::is_necessary( charset );
1906+}
1907+
1908+bool is_supported( char const *charset ) {
1909+ return internal::transcode::streambuf::is_supported( charset );
1910+}
1911+
1912+///////////////////////////////////////////////////////////////////////////////
1913+
1914+} // namespace transcode
1915+} // namespace zorba
1916+/* vim:set et sw=2 ts=2: */
1917
1918=== modified file 'src/diagnostics/diagnostic_en.xml'
1919--- src/diagnostics/diagnostic_en.xml 2012-02-16 00:52:25 +0000
1920+++ src/diagnostics/diagnostic_en.xml 2012-02-16 02:19:18 +0000
1921@@ -1581,6 +1581,10 @@
1922 <value>"$1": feature not enabled</value>
1923 </diagnostic>
1924
1925+ <diagnostic code="ZXQP0006" name="UNKNOWN_ENCODING">
1926+ <value>"$1": unknown character encoding</value>
1927+ </diagnostic>
1928+
1929 <diagnostic code="ZXQP0007" name="FUNCTION_SIGNATURE_NOT_EQUAL">
1930 <value>"$1": function signature does not match declaration</value>
1931 </diagnostic>
1932@@ -2193,6 +2197,10 @@
1933 <value>"$1": error loading dynamic library${: 2}</value>
1934 </diagnostic>
1935
1936+ <diagnostic code="ZOSE0006" name="TRANSCODING_ERROR">
1937+ <value>stream transcoding error ($1)</value>
1938+ </diagnostic>
1939+
1940 <!--////////// Zorba Store Errors //////////////////////////////////////-->
1941
1942 <diagnostic code="ZSTR0001" name="INDEX_ALREADY_EXISTS">
1943
1944=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
1945--- src/diagnostics/pregenerated/diagnostic_list.cpp 2012-01-26 01:35:11 +0000
1946+++ src/diagnostics/pregenerated/diagnostic_list.cpp 2012-02-16 02:19:18 +0000
1947@@ -568,6 +568,9 @@
1948 ZorbaErrorCode ZXQP0005_NOT_ENABLED( "ZXQP0005" );
1949
1950
1951+ZorbaErrorCode ZXQP0006_UNKNOWN_ENCODING( "ZXQP0006" );
1952+
1953+
1954 ZorbaErrorCode ZXQP0007_FUNCTION_SIGNATURE_NOT_EQUAL( "ZXQP0007" );
1955
1956
1957@@ -1004,6 +1007,9 @@
1958 ZorbaErrorCode ZOSE0005_DLL_LOAD_FAILED( "ZOSE0005" );
1959
1960
1961+ZorbaErrorCode ZOSE0006_TRANSCODING_ERROR( "ZOSE0006" );
1962+
1963+
1964 ZorbaErrorCode ZSTR0001_INDEX_ALREADY_EXISTS( "ZSTR0001" );
1965
1966
1967
1968=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
1969--- src/diagnostics/pregenerated/dict_en.cpp 2012-02-16 00:52:25 +0000
1970+++ src/diagnostics/pregenerated/dict_en.cpp 2012-02-16 02:19:18 +0000
1971@@ -354,6 +354,7 @@
1972 { "ZOSE0003", "stream read failure" },
1973 { "ZOSE0004", "${\"1\": }I/O error${: 2}" },
1974 { "ZOSE0005", "\"$1\": error loading dynamic library${: 2}" },
1975+ { "ZOSE0006", "stream transcoding error ($1)" },
1976 { "ZSTR0001", "\"$1\": index already exists" },
1977 { "ZSTR0002", "\"$1\": index does not exist" },
1978 { "ZSTR0003", "\"$1\": partial key insertion into index \"$2\"" },
1979@@ -392,6 +393,7 @@
1980 { "ZXQP0003", "internal error${: 1}" },
1981 { "ZXQP0004", "not yet implemented: $1" },
1982 { "ZXQP0005", "\"$1\": feature not enabled" },
1983+ { "ZXQP0006", "\"$1\": unknown character encoding" },
1984 { "ZXQP0007", "\"$1\": function signature does not match declaration" },
1985 { "ZXQP0008", "\"$1\": function implementation not found" },
1986 { "ZXQP0009", "\"$1\": function referred to by this local-name has the local-name \"$2\" instead" },
1987
1988=== modified file 'src/unit_tests/CMakeLists.txt'
1989--- src/unit_tests/CMakeLists.txt 2012-02-02 16:38:39 +0000
1990+++ src/unit_tests/CMakeLists.txt 2012-02-16 02:19:18 +0000
1991@@ -11,7 +11,6 @@
1992 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1993 # See the License for the specific language governing permissions and
1994 # limitations under the License.
1995-
1996
1997 SET(UNIT_TEST_SRCS
1998 string_instantiate.cpp
1999@@ -30,10 +29,9 @@
2000 tokenizer.cpp)
2001 ENDIF (NOT ZORBA_NO_FULL_TEXT)
2002
2003-IF(ZORBA_WITH_DEBUGGER)
2004- LIST(APPEND UNIT_TEST_SRCS
2005-# test_debugger_protocol.cpp
2006- )
2007-ENDIF(ZORBA_WITH_DEBUGGER)
2008+IF (NOT ZORBA_NO_UNICODE)
2009+ LIST (APPEND UNIT_TEST_SRCS
2010+ test_icu_streambuf.cpp)
2011+ENDIF (NOT ZORBA_NO_UNICODE)
2012
2013 # vim:set et sw=2 tw=2:
2014
2015=== added file 'src/unit_tests/test_icu_streambuf.cpp'
2016--- src/unit_tests/test_icu_streambuf.cpp 1970-01-01 00:00:00 +0000
2017+++ src/unit_tests/test_icu_streambuf.cpp 2012-02-16 02:19:18 +0000
2018@@ -0,0 +1,151 @@
2019+/*
2020+ * Copyright 2006-2008 The FLWOR Foundation.
2021+ *
2022+ * Licensed under the Apache License, Version 2.0 (the "License");
2023+ * you may not use this file except in compliance with the License.
2024+ * You may obtain a copy of the License at
2025+ *
2026+ * http://www.apache.org/licenses/LICENSE-2.0
2027+ *
2028+ * Unless required by applicable law or agreed to in writing, software
2029+ * distributed under the License is distributed on an "AS IS" BASIS,
2030+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2031+ * See the License for the specific language governing permissions and
2032+ * limitations under the License.
2033+ */
2034+
2035+#include <fstream>
2036+#include <iostream>
2037+#include <sstream>
2038+
2039+#include "util/transcode_streambuf.h"
2040+
2041+using namespace std;
2042+using namespace zorba;
2043+
2044+#define COPYRIGHT_ISO "\xA9"
2045+#define COPYRIGHT_UTF8 "\xC2\xA9"
2046+
2047+#define ONE_THIRD_UTF8 "\xE2\x85\x93"
2048+#define ONE_THIRD_UTF16BE "\x21\x53"
2049+
2050+struct test {
2051+ char const *ext_charset;
2052+ char const *ext_str;
2053+ int ext_len;
2054+ char const *utf8_str;
2055+};
2056+
2057+static test const tests[] = {
2058+ /* 0 */ { "ISO-8859-1", "Copyright " COPYRIGHT_ISO " 2011", 0, "Copyright " COPYRIGHT_UTF8 " 2011" },
2059+ /* 1 */ { "UTF-16BE", ONE_THIRD_UTF16BE "\0 \0c\0u\0p", 10, ONE_THIRD_UTF8 " cup" },
2060+ { 0, 0, 0, 0 }
2061+};
2062+
2063+static string make_ext_str( test const *t ) {
2064+ if ( t->ext_len )
2065+ return string( t->ext_str, t->ext_len );
2066+ return string( t->ext_str );
2067+}
2068+
2069+///////////////////////////////////////////////////////////////////////////////
2070+
2071+static int failures;
2072+
2073+static bool assert_true( int no, char const *expr, int line, bool result ) {
2074+ if ( !result ) {
2075+ cout << '#' << no << " FAILED, line " << line << ": " << expr << endl;
2076+ ++failures;
2077+ }
2078+ return result;
2079+}
2080+
2081+static void print_exception( int no, char const *expr, int line,
2082+ std::exception const &e ) {
2083+ assert_true( no, expr, line, false );
2084+ cout << "+ exception: " << e.what() << endl;
2085+}
2086+
2087+#define ASSERT_TRUE( NO, EXPR ) assert_true( NO, #EXPR, __LINE__, !!(EXPR) )
2088+
2089+#define ASSERT_TRUE_AND_NO_EXCEPTION( NO, EXPR ) \
2090+ try { ASSERT_TRUE( NO, EXPR ); } \
2091+ catch ( std::exception const &e ) { print_exception( NO, #EXPR, __LINE__, e ); }
2092+
2093+///////////////////////////////////////////////////////////////////////////////
2094+
2095+static bool test_getline( test const *t ) {
2096+ string const ext_str( make_ext_str( t ) );
2097+ istringstream iss( ext_str );
2098+ icu_streambuf xbuf( t->ext_charset, iss.rdbuf() );
2099+ iss.ios::rdbuf( &xbuf );
2100+
2101+ char utf8_buf[ 1024 ];
2102+ iss.getline( utf8_buf, sizeof utf8_buf );
2103+ if ( iss.gcount() ) {
2104+ string const utf8_str( utf8_buf );
2105+ return utf8_str == t->utf8_str;
2106+ }
2107+ return false;
2108+}
2109+
2110+static bool test_read( test const *t ) {
2111+ string const ext_str( make_ext_str( t ) );
2112+ istringstream iss( ext_str );
2113+ icu_streambuf xbuf( t->ext_charset, iss.rdbuf() );
2114+ iss.ios::rdbuf( &xbuf );
2115+
2116+ char utf8_buf[ 1024 ];
2117+ iss.read( utf8_buf, sizeof utf8_buf );
2118+ if ( iss.gcount() ) {
2119+ string const utf8_str( utf8_buf, iss.gcount() );
2120+ return utf8_str == t->utf8_str;
2121+ }
2122+ return false;
2123+}
2124+
2125+static bool test_insertion( test const *t ) {
2126+ ostringstream oss;
2127+ icu_streambuf xbuf( t->ext_charset, oss.rdbuf() );
2128+ oss.ios::rdbuf( &xbuf );
2129+
2130+ oss << t->utf8_str << flush;
2131+ string const ext_str( oss.str() );
2132+
2133+ string const expected_ext_str( make_ext_str( t ) );
2134+ return ext_str == expected_ext_str;
2135+}
2136+
2137+static bool test_put( test const *t ) {
2138+ ostringstream oss;
2139+ icu_streambuf xbuf( t->ext_charset, oss.rdbuf() );
2140+ oss.ios::rdbuf( &xbuf );
2141+
2142+ for ( char const *c = t->utf8_str; *c; ++c )
2143+ oss.put( *c );
2144+ string const ext_str( oss.str() );
2145+
2146+ string const expected_ext_str( make_ext_str( t ) );
2147+ return ext_str == expected_ext_str;
2148+}
2149+
2150+///////////////////////////////////////////////////////////////////////////////
2151+
2152+namespace zorba {
2153+namespace UnitTests {
2154+
2155+int test_icu_streambuf( int, char*[] ) {
2156+ int test_no = 0;
2157+ for ( test const *t = tests; t->utf8_str; ++t, ++test_no ) {
2158+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_getline( t ) );
2159+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_read( t ) );
2160+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_insertion( t ) );
2161+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_put( t ) );
2162+ }
2163+ cout << failures << " test(s) failed\n";
2164+ return failures ? 1 : 0;
2165+}
2166+
2167+} // namespace UnitTests
2168+} // namespace zorba
2169+/* vim:set et sw=2 ts=2: */
2170
2171=== modified file 'src/unit_tests/unit_test_list.h'
2172--- src/unit_tests/unit_test_list.h 2012-02-02 16:38:39 +0000
2173+++ src/unit_tests/unit_test_list.h 2012-02-16 02:19:18 +0000
2174@@ -17,6 +17,8 @@
2175 #ifndef ZORBA_UNIT_TEST_LIST_H
2176 #define ZORBA_UNIT_TEST_LIST_H
2177
2178+#include <iostream>
2179+
2180 #include <zorba/config.h>
2181
2182 namespace zorba {
2183@@ -34,6 +36,9 @@
2184 /**
2185 * ADD NEW UNIT TESTS HERE
2186 */
2187+#ifndef ZORBA_NO_UNICODE
2188+ int test_icu_streambuf( int, char*[] );
2189+#endif /* ZORBA_NO_UNICODE */
2190 int json_parser( int, char*[] );
2191
2192 void initializeTestList();
2193
2194=== modified file 'src/unit_tests/unit_tests.cpp'
2195--- src/unit_tests/unit_tests.cpp 2012-02-02 16:38:39 +0000
2196+++ src/unit_tests/unit_tests.cpp 2012-02-16 02:19:18 +0000
2197@@ -39,6 +39,9 @@
2198 void initializeTestList() {
2199 libunittests["string"] = test_string;
2200 libunittests["uri"] = runUriTest;
2201+#ifndef ZORBA_NO_UNICODE
2202+ libunittests["icu_streambuf"] = test_icu_streambuf;
2203+#endif /* ZORBA_NO_UNICODE */
2204 libunittests["json_parser"] = json_parser;
2205 libunittests["unique_ptr"] = test_unique_ptr;
2206 #ifndef ZORBA_NO_FULL_TEXT
2207
2208=== modified file 'src/util/CMakeLists.txt'
2209--- src/util/CMakeLists.txt 2011-12-20 18:29:15 +0000
2210+++ src/util/CMakeLists.txt 2012-02-16 02:19:18 +0000
2211@@ -41,7 +41,12 @@
2212 ENDIF(ZORBA_WITH_FILE_ACCESS)
2213
2214 IF(ZORBA_NO_UNICODE)
2215- LIST(APPEND UTIL_SRCS regex_ascii.cpp)
2216+ LIST(APPEND UTIL_SRCS
2217+ regex_ascii.cpp
2218+ passthru_streambuf.cpp)
2219+ELSE(ZORBA_NO_UNICODE)
2220+ LIST(APPEND UTIL_SRCS
2221+ icu_streambuf.cpp)
2222 ENDIF(ZORBA_NO_UNICODE)
2223
2224 HEADER_GROUP_SUBFOLDER(UTIL_SRCS fx)
2225
2226=== added file 'src/util/icu_streambuf.cpp'
2227--- src/util/icu_streambuf.cpp 1970-01-01 00:00:00 +0000
2228+++ src/util/icu_streambuf.cpp 2012-02-16 02:19:18 +0000
2229@@ -0,0 +1,300 @@
2230+/*
2231+ * Copyright 2006-2008 The FLWOR Foundation.
2232+ *
2233+ * Licensed under the Apache License, Version 2.0 (the "License");
2234+ * you may not use this file except in compliance with the License.
2235+ * You may obtain a copy of the License at
2236+ *
2237+ * http://www.apache.org/licenses/LICENSE-2.0
2238+ *
2239+ * Unless required by applicable law or agreed to in writing, software
2240+ * distributed under the License is distributed on an "AS IS" BASIS,
2241+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2242+ * See the License for the specific language governing permissions and
2243+ * limitations under the License.
2244+ */
2245+
2246+#define ZORBA_DEBUG_ICU_STREAMBUF 0
2247+
2248+#if ZORBA_DEBUG_ICU_STREAMBUF
2249+# include <stdio.h>
2250+#endif
2251+
2252+#include <algorithm>
2253+#include <cassert>
2254+
2255+#include <zorba/diagnostic_list.h>
2256+
2257+#include "diagnostics/assert.h"
2258+#include "diagnostics/diagnostic.h"
2259+#include "diagnostics/zorba_exception.h"
2260+#include "util/cxx_util.h"
2261+#include "util/string_util.h"
2262+#include "util/utf8_util.h"
2263+
2264+#include "icu_streambuf.h"
2265+
2266+using namespace std;
2267+
2268+namespace zorba {
2269+
2270+int const Small_External_Buf_Size = 6;
2271+int const Large_External_Buf_Size = 4096;
2272+
2273+///////////////////////////////////////////////////////////////////////////////
2274+
2275+inline void icu_streambuf::buf_type_base::reset() {
2276+ pivot_source_ = pivot_target_ = pivot_buf_;
2277+}
2278+
2279+inline void icu_streambuf::resetg() {
2280+ setg(
2281+ g_.utf8_char_, g_.utf8_char_ + sizeof g_.utf8_char_,
2282+ g_.utf8_char_ + sizeof g_.utf8_char_
2283+ );
2284+}
2285+
2286+icu_streambuf::icu_streambuf( char const *charset, streambuf *orig ) :
2287+ proxy_streambuf( orig ),
2288+ no_conv_( !is_necessary( charset ) ),
2289+ external_conv_( no_conv_ ? nullptr : create_conv( charset ) ),
2290+ utf8_conv_( no_conv_ ? nullptr : create_conv( "UTF-8" ) )
2291+{
2292+ if ( !orig )
2293+ throw invalid_argument( "null streambuf" );
2294+ resetg();
2295+}
2296+
2297+icu_streambuf::~icu_streambuf() {
2298+ if ( external_conv_ )
2299+ ucnv_close( external_conv_ );
2300+ if ( utf8_conv_ )
2301+ ucnv_close( utf8_conv_ );
2302+}
2303+
2304+void icu_streambuf::clear() {
2305+ if ( !no_conv_ ) {
2306+ ucnv_reset( external_conv_ );
2307+ ucnv_reset( utf8_conv_ );
2308+ g_.reset();
2309+ p_.reset();
2310+ resetg();
2311+ }
2312+}
2313+
2314+UConverter* icu_streambuf::create_conv( char const *charset ) {
2315+ UErrorCode err = U_ZERO_ERROR;
2316+ UConverter *const conv = ucnv_open( charset, &err );
2317+ ucnv_setFromUCallBack(
2318+ conv, UCNV_FROM_U_CALLBACK_STOP, nullptr, nullptr, nullptr, &err
2319+ );
2320+ ucnv_setToUCallBack(
2321+ conv, UCNV_TO_U_CALLBACK_STOP, nullptr, nullptr, nullptr, &err
2322+ );
2323+ if ( !conv || U_FAILURE( err ) ) {
2324+ if ( conv )
2325+ ucnv_close( conv );
2326+ throw invalid_argument( charset );
2327+ }
2328+ return conv;
2329+}
2330+
2331+bool icu_streambuf::is_necessary( char const *charset ) {
2332+ //
2333+ // Checking for "US-ASCII" explicitly isn't necessary since ICU knows about
2334+ // aliases.
2335+ //
2336+ return ucnv_compareNames( charset, "ASCII" )
2337+ && ucnv_compareNames( charset, "UTF-8" );
2338+}
2339+
2340+bool icu_streambuf::is_supported( char const *charset ) {
2341+ try {
2342+ ucnv_close( create_conv( charset ) );
2343+ return true;
2344+ }
2345+ catch ( invalid_argument const& ) {
2346+ return false;
2347+ }
2348+}
2349+
2350+icu_streambuf::pos_type icu_streambuf::seekoff( off_type o, ios_base::seekdir d,
2351+ ios_base::openmode m ) {
2352+ clear();
2353+ return original()->pubseekoff( o, d, m );
2354+}
2355+
2356+icu_streambuf::pos_type icu_streambuf::seekpos( pos_type p,
2357+ ios_base::openmode m ) {
2358+ clear();
2359+ return original()->pubseekpos( p, m );
2360+}
2361+
2362+streambuf* icu_streambuf::setbuf( char_type *p, streamsize s ) {
2363+ original()->pubsetbuf( p, s );
2364+ return this;
2365+}
2366+
2367+int icu_streambuf::sync() {
2368+ return original()->pubsync();
2369+}
2370+
2371+icu_streambuf::int_type icu_streambuf::overflow( int_type c ) {
2372+#if ZORBA_DEBUG_ICU_STREAMBUF
2373+ printf( "overflow()\n" );
2374+#endif
2375+ if ( no_conv_ )
2376+ return original()->sputc( c );
2377+
2378+ if ( traits_type::eq_int_type( c, traits_type::eof() ) )
2379+ return traits_type::eof();
2380+
2381+ char_type const utf8_byte = traits_type::to_char_type( c );
2382+ char_type const *from = &utf8_byte;
2383+ char ebuf[ Small_External_Buf_Size ], *to = ebuf;
2384+
2385+ bool const ok = to_external( &from, from + 1, &to, to + sizeof ebuf );
2386+ assert( ok );
2387+ if ( streamsize const n = to - ebuf ) {
2388+ original()->sputn( ebuf, n );
2389+ p_.reset();
2390+ }
2391+
2392+ return c;
2393+}
2394+
2395+bool icu_streambuf::to_external( char_type const **from,
2396+ char_type const *from_end, char **to,
2397+ char const *to_end, bool flush ) {
2398+ UErrorCode err = U_ZERO_ERROR;
2399+ ucnv_convertEx(
2400+ external_conv_, utf8_conv_, to, to_end, from, from_end,
2401+ p_.pivot_buf_, &p_.pivot_source_, &p_.pivot_target_,
2402+ p_.pivot_buf_ + sizeof p_.pivot_buf_,
2403+ /*reset*/ false, flush, &err
2404+ );
2405+ if ( err == U_TRUNCATED_CHAR_FOUND || err == U_BUFFER_OVERFLOW_ERROR )
2406+ return false;
2407+ if ( U_FAILURE( err ) )
2408+ throw ZORBA_EXCEPTION(
2409+ zerr::ZOSE0006_TRANSCODING_ERROR, ERROR_PARAMS( u_errorName( err ) )
2410+ );
2411+ return true;
2412+}
2413+
2414+bool icu_streambuf::to_utf8( char const **from, char const *from_end,
2415+ char_type **to, char_type const *to_end,
2416+ bool flush ) {
2417+ UErrorCode err = U_ZERO_ERROR;
2418+ ucnv_convertEx(
2419+ utf8_conv_, external_conv_, to, to_end, from, from_end,
2420+ g_.pivot_buf_, &g_.pivot_source_, &g_.pivot_target_,
2421+ g_.pivot_buf_ + sizeof g_.pivot_buf_,
2422+ /*reset*/ false, flush, &err
2423+ );
2424+ if ( err == U_TRUNCATED_CHAR_FOUND || err == U_BUFFER_OVERFLOW_ERROR )
2425+ return false;
2426+ if ( U_FAILURE( err ) )
2427+ throw ZORBA_EXCEPTION(
2428+ zerr::ZOSE0006_TRANSCODING_ERROR, ERROR_PARAMS( u_errorName( err ) )
2429+ );
2430+ return true;
2431+}
2432+
2433+icu_streambuf::int_type icu_streambuf::underflow() {
2434+#if ZORBA_DEBUG_ICU_STREAMBUF
2435+ printf( "underflow()\n" );
2436+#endif
2437+ if ( no_conv_ )
2438+ return original()->sgetc();
2439+
2440+ if ( gptr() >= egptr() ) {
2441+ utf8::storage_type *to = g_.utf8_char_;
2442+ utf8::storage_type const *const to_end = to + sizeof g_.utf8_char_;
2443+
2444+ while ( true ) {
2445+ int_type const c = original()->sbumpc();
2446+ if ( traits_type::eq_int_type( c, traits_type::eof() ) )
2447+ return traits_type::eof();
2448+
2449+ char const ebyte = traits_type::to_char_type( c );
2450+ char const *from = &ebyte;
2451+
2452+ to_utf8( &from, from + 1, &to, to_end );
2453+ if ( to > g_.utf8_char_ ) {
2454+ setg( g_.utf8_char_, g_.utf8_char_, to );
2455+ g_.reset();
2456+ break;
2457+ }
2458+ }
2459+ }
2460+ return traits_type::to_int_type( *gptr() );
2461+}
2462+
2463+streamsize icu_streambuf::xsgetn( char_type *to, streamsize size ) {
2464+#if ZORBA_DEBUG_ICU_STREAMBUF
2465+ printf( "xsgetn()\n" );
2466+#endif
2467+ if ( no_conv_ )
2468+ return original()->sgetn( to, size );
2469+
2470+ streamsize return_size = 0;
2471+ char_type *const to_end = to + size;
2472+
2473+ if ( streamsize const gsize = egptr() - gptr() ) {
2474+ // must first get any chars in g_.utf8_char_
2475+ streamsize const n = min( gsize, size );
2476+ traits_type::copy( to, gptr(), n );
2477+ gbump( n );
2478+ to += n;
2479+ size -= n, return_size += n;
2480+ }
2481+
2482+ while ( size > 0 ) {
2483+ char ebuf[ Large_External_Buf_Size ];
2484+ streamsize const get = min( (streamsize)(sizeof ebuf), size );
2485+ if ( streamsize const got = original()->sgetn( ebuf, get ) ) {
2486+ char const *from = ebuf;
2487+ char_type const *const to_orig = to;
2488+ int_type const peek = original()->sgetc();
2489+ bool const flush = traits_type::eq_int_type( peek, traits_type::eof() );
2490+ to_utf8( &from, from + got, &to, to_end, flush );
2491+ streamsize const n = to - to_orig;
2492+ size -= n, return_size += n;
2493+ if ( flush )
2494+ break;
2495+ } else
2496+ break;
2497+ }
2498+ return return_size;
2499+}
2500+
2501+streamsize icu_streambuf::xsputn( char_type const *from, streamsize size ) {
2502+#if ZORBA_DEBUG_ICU_STREAMBUF
2503+ printf( "xsputn()\n" );
2504+#endif
2505+ if ( no_conv_ )
2506+ return original()->sputn( from, size );
2507+
2508+ streamsize return_size = 0;
2509+ char_type const *const from_end = from + size;
2510+ char ebuf[ Large_External_Buf_Size ], *to = ebuf;
2511+ char const *const to_end = to + sizeof ebuf;
2512+
2513+ while ( size > 0 ) {
2514+ char_type const *const from_orig = from;
2515+ to_external( &from, from_end, &to, to_end );
2516+ streamsize n = to - ebuf;
2517+ if ( n && !original()->sputn( ebuf, n ) )
2518+ break;
2519+ to = ebuf;
2520+ n = from - from_orig;
2521+ size -= n, return_size += n;
2522+ }
2523+ return return_size;
2524+}
2525+
2526+///////////////////////////////////////////////////////////////////////////////
2527+
2528+} // namespace zorba
2529+/* vim:set et sw=2 ts=2: */
2530
2531=== added file 'src/util/icu_streambuf.h'
2532--- src/util/icu_streambuf.h 1970-01-01 00:00:00 +0000
2533+++ src/util/icu_streambuf.h 2012-02-16 02:19:18 +0000
2534@@ -0,0 +1,140 @@
2535+/*
2536+ * Copyright 2006-2008 The FLWOR Foundation.
2537+ *
2538+ * Licensed under the Apache License, Version 2.0 (the "License");
2539+ * you may not use this file except in compliance with the License.
2540+ * You may obtain a copy of the License at
2541+ *
2542+ * http://www.apache.org/licenses/LICENSE-2.0
2543+ *
2544+ * Unless required by applicable law or agreed to in writing, software
2545+ * distributed under the License is distributed on an "AS IS" BASIS,
2546+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2547+ * See the License for the specific language governing permissions and
2548+ * limitations under the License.
2549+ */
2550+
2551+#ifndef ZORBA_ICU_STREAMBUF_H
2552+#define ZORBA_ICU_STREAMBUF_H
2553+
2554+#include <zorba/transcode_stream.h>
2555+
2556+#include "util/utf8_util.h"
2557+
2558+namespace zorba {
2559+
2560+///////////////////////////////////////////////////////////////////////////////
2561+
2562+/**
2563+ * An %icu_streambuf is-a std::streambuf for transcoding character encodings
2564+ * from/to UTF-8 on-the-fly.
2565+ *
2566+ * To use it, replace a stream's streambuf:
2567+ * \code
2568+ * istream is;
2569+ * // ...
2570+ * icu_streambuf xbuf( "ISO-8859-1", is.rdbuf() );
2571+ * is.ios::rdbuf( &xbuf );
2572+ * \endcode
2573+ * Note that the %icu_streambuf must exist for as long as it's being used by
2574+ * the stream. If you are replacing the streambuf for a stream you did not
2575+ * create, you should set it back to the original streambuf:
2576+ * \code
2577+ * void f( ostream &os ) {
2578+ * icu_streambuf xbuf( "ISO-8859-1", os.rdbuf() );
2579+ * try {
2580+ * os.ios::rdbuf( &xbuf );
2581+ * // ...
2582+ * }
2583+ * catch ( ... ) {
2584+ * os.ios::rdbuf( xbuf.original() );
2585+ * throw;
2586+ * }
2587+ * }
2588+ * \endcode
2589+ *
2590+ * While %icu_streambuf does support seeking, the positions are relative to the
2591+ * original byte stream.
2592+ */
2593+class icu_streambuf : public proxy_streambuf {
2594+public:
2595+ /**
2596+ * Constructs an %icu_streambuf.
2597+ *
2598+ * @param charset The name of the character encoding to convert from/to.
2599+ * @param orig The original streambuf to read/write from/to.
2600+ */
2601+ icu_streambuf( char const *charset, std::streambuf *orig );
2602+
2603+ /**
2604+ * Destructs an %icu_streambuf.
2605+ */
2606+ ~icu_streambuf();
2607+
2608+ /**
2609+ * Checks whether it would be necessary to transcode from the given character
2610+ * encoding to UTF-8.
2611+ *
2612+ * @param charset The name of the character encoding to check.
2613+ * @return \c true only if it would be necessary to transcode from the given
2614+ * character encoding to UTF-8.
2615+ */
2616+ static bool is_necessary( char const *charset );
2617+
2618+ /**
2619+ * Checks whether the given character set is supported for transcoding.
2620+ *
2621+ * @param charset The name of the character encoding to check.
2622+ * @return \c true only if the character encoding is supported.
2623+ */
2624+ static bool is_supported( char const *charset );
2625+
2626+protected:
2627+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
2628+ pos_type seekpos( pos_type, std::ios_base::openmode );
2629+ std::streambuf* setbuf( char_type*, std::streamsize );
2630+ int sync();
2631+ int_type overflow( int_type );
2632+ int_type underflow();
2633+ std::streamsize xsgetn( char_type*, std::streamsize );
2634+ std::streamsize xsputn( char_type const*, std::streamsize );
2635+
2636+private:
2637+ struct buf_type_base {
2638+ UChar pivot_buf_[ 4096 ], *pivot_source_, *pivot_target_;
2639+
2640+ buf_type_base() { reset(); }
2641+ void reset();
2642+ };
2643+
2644+ struct gbuf_type : buf_type_base {
2645+ utf8::encoded_char_type utf8_char_;
2646+ };
2647+ gbuf_type g_;
2648+
2649+ typedef buf_type_base pbuf_type;
2650+ pbuf_type p_;
2651+
2652+ bool const no_conv_; // true = no conversion needed
2653+ UConverter *const external_conv_, *const utf8_conv_;
2654+
2655+ void clear();
2656+ static UConverter* create_conv( char const *charset );
2657+ void resetg();
2658+
2659+ bool to_external( char_type const **from, char_type const *from_end,
2660+ char **to, char const *to_end, bool flush = false );
2661+
2662+ bool to_utf8( char const **from, char const *from_end, char_type **to,
2663+ char_type const *to_end, bool flush = false );
2664+
2665+ // forbid
2666+ icu_streambuf( icu_streambuf const& );
2667+ icu_streambuf& operator=( icu_streambuf const& );
2668+};
2669+
2670+///////////////////////////////////////////////////////////////////////////////
2671+
2672+} // namespace zorba
2673+#endif /* ZORBA_ICU_STREAMBUF_H */
2674+/* vim:set et sw=2 ts=2: */
2675
2676=== added file 'src/util/passthru_streambuf.cpp'
2677--- src/util/passthru_streambuf.cpp 1970-01-01 00:00:00 +0000
2678+++ src/util/passthru_streambuf.cpp 2012-02-16 02:19:18 +0000
2679@@ -0,0 +1,105 @@
2680+/*
2681+ * Copyright 2006-2008 The FLWOR Foundation.
2682+ *
2683+ * Licensed under the Apache License, Version 2.0 (the "License");
2684+ * you may not use this file except in compliance with the License.
2685+ * You may obtain a copy of the License at
2686+ *
2687+ * http://www.apache.org/licenses/LICENSE-2.0
2688+ *
2689+ * Unless required by applicable law or agreed to in writing, software
2690+ * distributed under the License is distributed on an "AS IS" BASIS,
2691+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2692+ * See the License for the specific language governing permissions and
2693+ * limitations under the License.
2694+ */
2695+
2696+#include "passthru_streambuf.h"
2697+
2698+using namespace std;
2699+
2700+namespace zorba {
2701+
2702+///////////////////////////////////////////////////////////////////////////////
2703+
2704+passthru_streambuf::passthru_streambuf( char const*, streambuf *orig ) :
2705+ proxy_streambuf( orig )
2706+{
2707+ if ( !orig )
2708+ throw invalid_argument( "null streambuf" );
2709+}
2710+
2711+passthru_streambuf::~passthru_streambuf() {
2712+ // out-of-line since it's virtual
2713+}
2714+
2715+void passthru_streambuf::imbue( std::locale const &loc ) {
2716+ original()->pubimbue( loc );
2717+}
2718+
2719+bool passthru_streambuf::is_necessary( char const *cc_charset ) {
2720+ zstring charset( cc_charset );
2721+ ascii::trim_whitespace( charset );
2722+ ascii::to_upper( charset );
2723+ return charset != "ASCII"
2724+ && charset != "US-ASCII"
2725+ && charset != "UTF-8";
2726+}
2727+
2728+bool passthru_streambuf::is_supported( char const *cc_charset ) {
2729+ return !is_necessary( cc_charset );
2730+}
2731+
2732+passthru_streambuf::pos_type
2733+passthru_streambuf::seekoff( off_type o, ios_base::seekdir d,
2734+ ios_base::openmode m ) {
2735+ return original()->pubseekoff( o, d, m );
2736+}
2737+
2738+passthru_streambuf::pos_type
2739+passthru_streambuf::seekpos( pos_type p, ios_base::openmode m ) {
2740+ return original()->pubseekpos( p, m );
2741+}
2742+
2743+streambuf* passthru_streambuf::setbuf( char_type *p, streamsize s ) {
2744+ original()->pubsetbuf( p, s );
2745+ return this;
2746+}
2747+
2748+streamsize passthru_streambuf::showmanyc() {
2749+ return original()->in_avail();
2750+}
2751+
2752+int passthru_streambuf::sync() {
2753+ return original()->pubsync();
2754+}
2755+
2756+passthru_streambuf::int_type passthru_streambuf::overflow( int_type c ) {
2757+ return original()->sputc( c );
2758+}
2759+
2760+passthru_streambuf::int_type passthru_streambuf::pbackfail( int_type c ) {
2761+ return original()->sputbackc( traits_type::to_char_type( c ) );
2762+}
2763+
2764+passthru_streambuf::int_type passthru_streambuf::uflow() {
2765+ return original()->sbumpc();
2766+}
2767+
2768+passthru_streambuf::int_type passthru_streambuf::underflow() {
2769+ return original()->sgetc();
2770+}
2771+
2772+streamsize passthru_streambuf::xsgetn( char_type *to, streamsize size ) {
2773+ return original()->sgetn( to, size );
2774+}
2775+
2776+streamsize passthru_streambuf::xsputn( char_type const *from,
2777+ streamsize size ) {
2778+ return original()->sputn( from, size );
2779+}
2780+
2781+///////////////////////////////////////////////////////////////////////////////
2782+
2783+} // namespace zorba
2784+/* vim:set et sw=2 ts=2: */
2785
2786=== added file 'src/util/passthru_streambuf.h'
2787--- src/util/passthru_streambuf.h 1970-01-01 00:00:00 +0000
2788+++ src/util/passthru_streambuf.h 2012-02-16 02:19:18 +0000
2789@@ -0,0 +1,76 @@
2790+/*
2791+ * Copyright 2006-2008 The FLWOR Foundation.
2792+ *
2793+ * Licensed under the Apache License, Version 2.0 (the "License");
2794+ * you may not use this file except in compliance with the License.
2795+ * You may obtain a copy of the License at
2796+ *
2797+ * http://www.apache.org/licenses/LICENSE-2.0
2798+ *
2799+ * Unless required by applicable law or agreed to in writing, software
2800+ * distributed under the License is distributed on an "AS IS" BASIS,
2801+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2802+ * See the License for the specific language governing permissions and
2803+ * limitations under the License.
2804+ */
2805+
2806+#ifndef ZORBA_PASSTHRU_STREAMBUF_H
2807+#define ZORBA_PASSTHRU_STREAMBUF_H
2808+
2809+#include <zorba/transcode_stream.h>
2810+
2811+namespace zorba {
2812+
2813+///////////////////////////////////////////////////////////////////////////////
2814+
2815+/**
2816+ * A %passthru_streambuf is-a std::streambuf that passes all I/O through unchanged.
2817+ */
2818+class passthru_streambuf : public proxy_streambuf {
2819+public:
2820+ /**
2821+ * Constructs a %passthru_streambuf.
2822+ *
2823+ * @param charset The name of the character encoding (ignored; no conversion).
2824+ * @param orig The original streambuf to read/write from/to.
2825+ */
2826+ passthru_streambuf( char const *charset, std::streambuf *orig );
2827+
2828+ /**
2829+ * Destructs a %passthru_streambuf.
2830+ */
2831+ ~passthru_streambuf();
2832+
2833+ /**
2834+ * Checks whether the given character set is supported for transcoding.
2835+ *
2836+ * @param charset The name of the character encoding to check.
2837+ * @return \c true only if the character encoding is supported.
2838+ */
2839+ static bool is_supported( char const *charset );
2840+
2841+protected:
2842+ void imbue( std::locale const& );
2843+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
2844+ pos_type seekpos( pos_type, std::ios_base::openmode );
2845+ std::streambuf* setbuf( char_type*, std::streamsize );
2846+ std::streamsize showmanyc();
2847+ int sync();
2848+ int_type overflow( int_type );
2849+ int_type pbackfail( int_type );
2850+ int_type uflow();
2851+ int_type underflow();
2852+ std::streamsize xsgetn( char_type*, std::streamsize );
2853+ std::streamsize xsputn( char_type const*, std::streamsize );
2854+
2855+private:
2856+ // forbid
2857+ passthru_streambuf( passthru_streambuf const& );
2858+ passthru_streambuf& operator=( passthru_streambuf const& );
2859+};
2860+
2861+///////////////////////////////////////////////////////////////////////////////
2862+
2863+} // namespace zorba
2864+#endif /* ZORBA_PASSTHRU_STREAMBUF_H */
2865+/* vim:set et sw=2 ts=2: */
2866
2867=== added file 'src/util/transcode_streambuf.h'
2868--- src/util/transcode_streambuf.h 1970-01-01 00:00:00 +0000
2869+++ src/util/transcode_streambuf.h 2012-02-16 02:19:18 +0000
2870@@ -0,0 +1,47 @@
2871+/*
2872+ * Copyright 2006-2008 The FLWOR Foundation.
2873+ *
2874+ * Licensed under the Apache License, Version 2.0 (the "License");
2875+ * you may not use this file except in compliance with the License.
2876+ * You may obtain a copy of the License at
2877+ *
2878+ * http://www.apache.org/licenses/LICENSE-2.0
2879+ *
2880+ * Unless required by applicable law or agreed to in writing, software
2881+ * distributed under the License is distributed on an "AS IS" BASIS,
2882+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2883+ * See the License for the specific language governing permissions and
2884+ * limitations under the License.
2885+ */
2886+
2887+#ifndef ZORBA_TRANSCODE_STREAMBUF_H
2888+#define ZORBA_TRANSCODE_STREAMBUF_H
2889+
2890+#include <zorba/config.h>
2891+
2892+///////////////////////////////////////////////////////////////////////////////
2893+
2894+#ifdef ZORBA_NO_UNICODE
2895+# include "passthru_streambuf.h"
2896+#else
2897+# include "icu_streambuf.h"
2898+#endif /* ZORBA_NO_UNICODE */
2899+
2900+namespace zorba {
2901+namespace internal {
2902+namespace transcode {
2903+
2904+#ifdef ZORBA_NO_UNICODE
2905+typedef passthru_streambuf streambuf;
2906+#else
2907+typedef icu_streambuf streambuf;
2908+#endif /* ZORBA_NO_UNICODE */
2909+
2910+} // namespace transcode
2911+} // namespace internal
2912+} // namespace zorba
2913+
2914+///////////////////////////////////////////////////////////////////////////////
2915+
2916+#endif /* ZORBA_TRANSCODE_STREAMBUF_H */
2917+/* vim:set et sw=2 ts=2: */
2918
2919=== added file 'test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res'
2920--- test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res 1970-01-01 00:00:00 +0000
2921+++ test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res 2012-02-16 02:19:18 +0000
2922@@ -0,0 +1,1 @@
2923+üäö
2924
2925=== added file 'test/rbkt/Queries/zorba/file/cp1252.txt'
2926--- test/rbkt/Queries/zorba/file/cp1252.txt 1970-01-01 00:00:00 +0000
2927+++ test/rbkt/Queries/zorba/file/cp1252.txt 2012-02-16 02:19:18 +0000
2928@@ -0,0 +1,1 @@
2929+üäö
2930
2931=== added file 'test/rbkt/Queries/zorba/file/cp1252.xq'
2932--- test/rbkt/Queries/zorba/file/cp1252.xq 1970-01-01 00:00:00 +0000
2933+++ test/rbkt/Queries/zorba/file/cp1252.xq 2012-02-16 02:19:18 +0000
2934@@ -0,0 +1,3 @@
2935+import module namespace f = "http://expath.org/ns/file";
2936+
2937+f:read-text(fn:resolve-uri("cp1252.txt"), "CP1252")
2938
2939=== added file 'test/rbkt/Queries/zorba/file/invalid_encoding.spec'
2940--- test/rbkt/Queries/zorba/file/invalid_encoding.spec 1970-01-01 00:00:00 +0000
2941+++ test/rbkt/Queries/zorba/file/invalid_encoding.spec 2012-02-16 02:19:18 +0000
2942@@ -0,0 +1,1 @@
2943+Error: http://expath.org/ns/file:FOFL0006
2944
2945=== added file 'test/rbkt/Queries/zorba/file/invalid_encoding.xq'
2946--- test/rbkt/Queries/zorba/file/invalid_encoding.xq 1970-01-01 00:00:00 +0000
2947+++ test/rbkt/Queries/zorba/file/invalid_encoding.xq 2012-02-16 02:19:18 +0000
2948@@ -0,0 +1,3 @@
2949+import module namespace f = "http://expath.org/ns/file";
2950+
2951+f:read-text(fn:resolve-uri("cp1252.txt"), "FOO")
2952
2953=== modified file 'test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq'
2954--- test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq 2011-08-23 07:11:31 +0000
2955+++ test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq 2012-02-16 02:19:18 +0000
2956@@ -7,9 +7,9 @@
2957 auth-method="Basic"
2958 send-authorization="true"
2959 username="zorba"
2960- password="blub"/>;
2961+ password="blub"
2962+ override-media-type="application/xml; charset=utf-8"/>;
2963
2964 variable $http-res := http:send-request($req, (), ());
2965
2966 $http-res[2]
2967-
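
For readers skimming the diff, the forwarding pattern behind passthru_streambuf (each virtual simply calls the matching public member of the original buffer, as in the xsgetn/xsputn bodies above) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in the patch: the names forwarding_streambuf, read_through, write_through, and self_check are hypothetical, the class derives from std::streambuf directly rather than from Zorba's proxy_streambuf, and it uses C++11 `override`.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for the patch's passthru_streambuf: every operation
// is forwarded to the wrapped buffer and no bytes are changed.
class forwarding_streambuf : public std::streambuf {
public:
  explicit forwarding_streambuf( std::streambuf *orig ) : orig_( orig ) { }

protected:
  // Peek at the next character without consuming it.
  int_type underflow() override { return orig_->sgetc(); }

  // Consume and return the next character.
  int_type uflow() override { return orig_->sbumpc(); }

  // Write a single character; an EOF argument means "nothing to write".
  int_type overflow( int_type c ) override {
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::not_eof( c );
    return orig_->sputc( traits_type::to_char_type( c ) );
  }

  // Bulk read/write, mirroring the bodies shown in the diff.
  std::streamsize xsgetn( char_type *to, std::streamsize size ) override {
    return orig_->sgetn( to, size );
  }
  std::streamsize xsputn( char_type const *from,
                          std::streamsize size ) override {
    return orig_->sputn( from, size );
  }

  int sync() override { return orig_->pubsync(); }

private:
  std::streambuf *orig_;

  // forbid copying, as the patch does
  forwarding_streambuf( forwarding_streambuf const& );
  forwarding_streambuf& operator=( forwarding_streambuf const& );
};

// Reads everything from an in-memory source through the forwarding buffer.
std::string read_through( std::string const &input ) {
  std::istringstream source( input );
  forwarding_streambuf buf( source.rdbuf() );
  return std::string( std::istreambuf_iterator<char>( &buf ),
                      std::istreambuf_iterator<char>() );
}

// Writes text to an in-memory sink through the forwarding buffer.
std::string write_through( std::string const &text ) {
  std::ostringstream sink;
  forwarding_streambuf buf( sink.rdbuf() );
  std::ostream out( &buf );
  out << text;
  out.flush();
  return sink.str();
}

// Usage check: bytes come out exactly as they went in.
inline void self_check() {
  assert( read_through( "abc\n" ) == "abc\n" );
  assert( write_through( "xyz" ) == "xyz" );
}
```

Because the sketch never sets up a get or put area of its own, every read and write reaches the wrapped buffer immediately. That keeps the wrapper stateless, at the cost of one virtual call per character on the non-bulk paths.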
