Merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba

Proposed by Matthias Brantner on 2012-02-08
Status: Superseded
Proposed branch: lp:~zorba-coders/zorba/feature-transcode_streambuf
Merge into: lp:zorba
Diff against target: 2967 lines (+1874/-555)
37 files modified
ChangeLog (+4/-0)
include/zorba/internal/proxy.h (+48/-0)
include/zorba/pregenerated/diagnostic_list.h (+4/-0)
include/zorba/transcode_stream.h (+213/-0)
modules/ExternalModules.conf (+1/-1)
modules/com/zorba-xquery/www/modules/http-client.xq (+2/-2)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp (+337/-338)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h (+164/-143)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp (+71/-21)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h (+10/-6)
modules/com/zorba-xquery/www/modules/pregenerated/errors.xq (+8/-0)
modules/org/expath/ns/file.xq.src/file.cpp (+25/-10)
modules/org/expath/ns/file.xq.src/file_function.cpp (+0/-5)
modules/org/expath/ns/file.xq.src/file_function.h (+5/-9)
modules/org/expath/ns/file.xq.src/file_module.cpp (+2/-5)
modules/org/expath/ns/file.xq.src/file_module.h (+13/-6)
src/api/CMakeLists.txt (+1/-0)
src/api/transcode_streambuf.cpp (+102/-0)
src/diagnostics/diagnostic_en.xml (+8/-0)
src/diagnostics/pregenerated/diagnostic_list.cpp (+6/-0)
src/diagnostics/pregenerated/dict_en.cpp (+2/-0)
src/unit_tests/CMakeLists.txt (+4/-6)
src/unit_tests/test_icu_streambuf.cpp (+151/-0)
src/unit_tests/unit_test_list.h (+5/-0)
src/unit_tests/unit_tests.cpp (+3/-0)
src/util/CMakeLists.txt (+6/-1)
src/util/icu_streambuf.cpp (+300/-0)
src/util/icu_streambuf.h (+140/-0)
src/util/passthru_streambuf.cpp (+105/-0)
src/util/passthru_streambuf.h (+76/-0)
src/util/transcode_streambuf.h (+47/-0)
test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.txt (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.xq (+3/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.spec (+1/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.xq (+3/-0)
test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq (+2/-2)
To merge this branch: bzr merge lp:~zorba-coders/zorba/feature-transcode_streambuf
Reviewer            Review Type   Date Requested   Status
Paul J. Lucas                     2012-02-08       Approve on 2012-02-15
Matthias Brantner                                  Approve on 2012-02-08
Review via email: mp+92113@code.launchpad.net

This proposal supersedes a proposal from 2012-02-08.

This proposal has been superseded by a proposal from 2012-02-16.

Commit message

Added transcode_streambuf.

Description of the change

Added transcode_streambuf.

Paul J. Lucas (paul-lucas) : Posted in a previous version of this proposal
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 (message):
  Validation queue job feature-transcode_streambuf-2012-02-08T19-21-05.882Z
  is finished. The final status was:

  3 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Paul J. Lucas (paul-lucas) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~zorba-coders/zorba/feature-transcode_streambuf into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 (message):
  Validation queue job feature-transcode_streambuf-2012-02-15T16-29-00.272Z
  is finished. The final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

Attempt to merge into lp:zorba failed due to conflicts:

text conflict in src/unit_tests/unit_test_list.h
text conflict in src/unit_tests/unit_tests.cpp

Zorba Build Bot (zorba-buildbot) wrote :

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Preview Diff

1=== modified file 'ChangeLog'
2--- ChangeLog 2012-02-16 00:52:25 +0000
3+++ ChangeLog 2012-02-16 02:09:20 +0000
4@@ -34,6 +34,10 @@
5 * zerr is not predeclared anymore to be http://www.zorba-xquery.com/errors
6 * Add new XQuery interface for the PHP bindings.
7 * Added API method Item::getNamespaceBindings().
8+ * Added a transcoding streambuffer to the API which allows transcoding arbitrary encodings
9+ from and to UTF-8
10+ * file:read-text is able to handle arbitrary encodings (fixes bug #867159)
11+ * http:send-request is able to handle arbitrary encodings
12 * Fixed bug #917981 (disallow declaring same module twice).
13 * Added API method StaticContext::getNamespaceBindings() (see bug #905035)
14 * Deprecated StaticContext:getNamespaceURIByPrefix()
15
16=== added file 'include/zorba/internal/proxy.h'
17--- include/zorba/internal/proxy.h 1970-01-01 00:00:00 +0000
18+++ include/zorba/internal/proxy.h 2012-02-16 02:09:20 +0000
19@@ -0,0 +1,48 @@
20+/*
21+ * Copyright 2006-2008 The FLWOR Foundation.
22+ *
23+ * Licensed under the Apache License, Version 2.0 (the "License");
24+ * you may not use this file except in compliance with the License.
25+ * You may obtain a copy of the License at
26+ *
27+ * http://www.apache.org/licenses/LICENSE-2.0
28+ *
29+ * Unless required by applicable law or agreed to in writing, software
30+ * distributed under the License is distributed on an "AS IS" BASIS,
31+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
32+ * See the License for the specific language governing permissions and
33+ * limitations under the License.
34+ */
35+
36+#ifndef ZORBA_INTERNAL_PROXY_H
37+#define ZORBA_INTERNAL_PROXY_H
38+
39+namespace zorba {
40+namespace internal {
41+namespace ztd {
42+
43+///////////////////////////////////////////////////////////////////////////////
44+
45+/**
46+ * \internal
47+ * A %proxy<T> is-a \c T that also contains a T* -- a pointer to the original.
48+ */
49+template<class OriginalType>
50+class proxy : public OriginalType {
51+public:
52+ proxy( OriginalType *p ) : original_( p ) { }
53+
54+ OriginalType* original() const {
55+ return original_;
56+ }
57+private:
58+ OriginalType *original_;
59+};
60+
61+///////////////////////////////////////////////////////////////////////////////
62+
63+} // namespace ztd
64+} // namespace internal
65+} // namespace zorba
66+#endif /* ZORBA_INTERNAL_PROXY_H */
67+/* vim:set et sw=2 ts=2: */
68
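Note on proxy.h: the sketch below illustrates how a class can build on proxy<T> to filter one streambuf through another. The proxy template is repeated so the sketch compiles on its own. upper_streambuf and read_upper are hypothetical names invented for this illustration and are not part of the branch; the real transcoding buffers in src/util may be structured differently.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Same shape as the proxy<T> added in include/zorba/internal/proxy.h,
// repeated here only so that the sketch compiles on its own.
template<class OriginalType>
class proxy : public OriginalType {
public:
  explicit proxy( OriginalType *p ) : original_( p ) { }
  OriginalType* original() const { return original_; }
private:
  OriginalType *original_;
};

// Hypothetical filter (not part of the branch): is-a std::streambuf that
// upper-cases ASCII letters read from the original streambuf.
class upper_streambuf : public proxy<std::streambuf> {
public:
  explicit upper_streambuf( std::streambuf *orig ) :
    proxy<std::streambuf>( orig )
  {
    assert( orig );
  }

protected:
  // Single-character get area: refill it from the original on demand.
  int_type underflow() {
    if ( gptr() < egptr() )
      return traits_type::to_int_type( *gptr() );
    int_type const c = original()->sbumpc();
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::eof();
    unsigned char const uc =
      static_cast<unsigned char>( traits_type::to_char_type( c ) );
    ch_ = static_cast<char>( std::toupper( uc ) );
    setg( &ch_, &ch_, &ch_ + 1 );
    return traits_type::to_int_type( ch_ );
  }

private:
  char ch_;
};

// Reads everything from the given stream through an upper_streambuf.
std::string read_upper( std::istream &in ) {
  upper_streambuf ubuf( in.rdbuf() );
  std::istream filtered( &ubuf );
  std::ostringstream out;
  out << filtered.rdbuf();
  return out.str();
}
```

The point of the is-a relationship is that the filter can be handed to any code expecting a std::streambuf*, while original() still gives access to the buffer it wraps.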
69=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
70--- include/zorba/pregenerated/diagnostic_list.h 2012-01-26 01:35:11 +0000
71+++ include/zorba/pregenerated/diagnostic_list.h 2012-02-16 02:09:20 +0000
72@@ -392,6 +392,8 @@
73
74 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0005_NOT_ENABLED;
75
76+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0006_UNKNOWN_ENCODING;
77+
78 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0007_FUNCTION_SIGNATURE_NOT_EQUAL;
79
80 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP0008_FUNCTION_IMPL_NOT_FOUND;
81@@ -684,6 +686,8 @@
82
83 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZOSE0005_DLL_LOAD_FAILED;
84
85+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZOSE0006_TRANSCODING_ERROR;
86+
87 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZSTR0001_INDEX_ALREADY_EXISTS;
88
89 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZSTR0002_INDEX_DOES_NOT_EXIST;
90
91=== added file 'include/zorba/transcode_stream.h'
92--- include/zorba/transcode_stream.h 1970-01-01 00:00:00 +0000
93+++ include/zorba/transcode_stream.h 2012-02-16 02:09:20 +0000
94@@ -0,0 +1,213 @@
95+/*
96+ * Copyright 2006-2008 The FLWOR Foundation.
97+ *
98+ * Licensed under the Apache License, Version 2.0 (the "License");
99+ * you may not use this file except in compliance with the License.
100+ * You may obtain a copy of the License at
101+ *
102+ * http://www.apache.org/licenses/LICENSE-2.0
103+ *
104+ * Unless required by applicable law or agreed to in writing, software
105+ * distributed under the License is distributed on an "AS IS" BASIS,
106+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
107+ * See the License for the specific language governing permissions and
108+ * limitations under the License.
109+ */
110+
111+#ifndef ZORBA_TRANSCODE_STREAM_API_H
112+#define ZORBA_TRANSCODE_STREAM_API_H
113+
114+#include <stdexcept>
115+#include <streambuf>
116+#include <string>
117+
118+#include <zorba/config.h>
119+#include <zorba/internal/proxy.h>
120+#include <zorba/internal/unique_ptr.h>
121+
122+namespace zorba {
123+
124+typedef internal::ztd::proxy<std::streambuf> proxy_streambuf;
125+
126+namespace transcode {
127+
128+///////////////////////////////////////////////////////////////////////////////
129+
130+/**
131+ * A %transcode::streambuf is-a std::streambuf for transcoding character
132+ * encodings from/to UTF-8 on-the-fly.
133+ *
134+ * To use it, replace a stream's streambuf:
135+ * \code
136+ * istream is;
137+ * // ...
138+ * transcode::streambuf tbuf( "ISO-8859-1", is.rdbuf() );
139+ * is.ios::rdbuf( &tbuf );
140+ * \endcode
141+ * Note that the %transcode::streambuf must exist for as long as it's being used
142+ * by the stream. If you are replacing the streabuf for a stream you did not
143+ * create, you should set it back to the original streambuf:
144+ * \code
145+ * void f( ostream &os ) {
146+ * transcode::streambuf tbuf( "ISO-8859-1", os.rdbuf() );
147+ * try {
148+ * os.ios::rdbuf( &tbuf );
149+ * // ...
150+ * }
151+ * catch ( ... ) {
152+ * os.ios::rdbuf( tbuf.orig_streambuf() );
153+ * throw;
154+ * }
155+ * }
156+ * \endcode
157+ *
158+ * While %transcode::streambuf does support seeking, the positions are relative
159+ * to the original byte stream.
160+ */
161+class ZORBA_DLL_PUBLIC streambuf : public std::streambuf {
162+public:
163+ /**
164+ * Constructs a %transcode::streambuf.
165+ *
166+ * @param charset The name of the character encoding to convert from/to.
167+ * @param orig The original streambuf to read/write from/to.
168+ * @throws std::invalid_argument if either \a charset is not supported or
169+ * \a orig is null.
170+ */
171+ streambuf( char const *charset, std::streambuf *orig );
172+
173+ /**
174+ * Destructs a %transcode::streambuf.
175+ */
176+ ~streambuf();
177+
178+ /**
179+ * Gets the original streambuf.
180+ *
181+ * @return said streambuf.
182+ */
183+ std::streambuf* orig_streambuf() const {
184+ return proxy_buf_->original();
185+ }
186+
187+protected:
188+ void imbue( std::locale const& );
189+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
190+ pos_type seekpos( pos_type, std::ios_base::openmode );
191+ std::streambuf* setbuf( char_type*, std::streamsize );
192+ std::streamsize showmanyc();
193+ int sync();
194+ int_type overflow( int_type );
195+ int_type pbackfail( int_type );
196+ int_type uflow();
197+ int_type underflow();
198+ std::streamsize xsgetn( char_type*, std::streamsize );
199+ std::streamsize xsputn( char_type const*, std::streamsize );
200+
201+private:
202+ std::unique_ptr<proxy_streambuf> proxy_buf_;
203+
204+ // forbid
205+ streambuf( streambuf const& );
206+ streambuf& operator=( streambuf const& );
207+};
208+
209+///////////////////////////////////////////////////////////////////////////////
210+
211+/**
212+ * A %transcode::stream is used to wrap a C++ standard I/O stream with a
213+ * transcode::streambuf so that transcoding and the management of the streambuf
214+ * happens automatically.
215+ *
216+ * @tparam StreamType The I/O stream class type to wrap. It must be a concrete
217+ * stream class.
218+ */
219+template<class StreamType>
220+class stream : public StreamType {
221+public:
222+ /**
223+ * Constructs a %transcode::stream.
224+ *
225+ * @param charset The name of the character encoding to convert from/to.
226+ * @throws std::invalid_argument if \a charset is not supported.
227+ */
228+ stream( char const *charset ) :
229+ tbuf_( charset, this->rdbuf() )
230+ {
231+ init();
232+ }
233+
234+ /**
235+ * Constructs a %transcode::stream.
236+ *
237+ * @tparam StreamArgType The type of the first argument of \a StreamType's
238+ * constructor.
239+ * @param charset The name of the character encoding to convert from/to.
240+ * @param stream_arg The argument to pass as the first argument to
241+ * \a StreamType's constructor.
242+ * @throws std::invalid_argument if \a charset is not supported.
243+ */
244+ template<typename StreamArgType>
245+ stream( char const *charset, StreamArgType stream_arg ) :
246+ StreamType( stream_arg ),
247+ tbuf_( charset, this->rdbuf() )
248+ {
249+ init();
250+ }
251+
252+ /**
253+ * Constructs a %transcode::stream.
254+ *
255+ * @tparam StreamArgType The type of the first argument of \a StreamType's
256+ * constructor.
257+ * @param charset The name of the character encoding to convert from/to.
258+ * @param stream_arg The argument to pass as the first argument to
259+ * \a StreamType's constructor.
260+ * @param mode The open-mode to pass to \a StreamType's constructor.
261+ * @throws std::invalid_argument if \a charset is not supported.
262+ */
263+ template<typename StreamArgType>
264+ stream( char const *charset, StreamArgType stream_arg,
265+ std::ios_base::openmode mode ) :
266+ StreamType( stream_arg, mode ),
267+ tbuf_( charset, this->rdbuf() )
268+ {
269+ init();
270+ }
271+
272+private:
273+ streambuf tbuf_;
274+
275+ void init() {
276+ this->std::ios::rdbuf( &tbuf_ );
277+ }
278+};
279+
280+///////////////////////////////////////////////////////////////////////////////
281+
282+/**
283+ * Checks whether it would be necessary to transcode from the given character
284+ * encoding to UTF-8.
285+ *
286+ * @param charset The name of the character encoding to check.
287+ * @return \c true only if it would be necessary to transcode from the given
288+ * character encoding to UTF-8.
289+ */
290+ZORBA_DLL_PUBLIC
291+bool is_necessary( char const *charset );
292+
293+/**
294+ * Checks whether the given character set is supported for transcoding.
295+ *
296+ * @param charset The name of the character encoding to check.
297+ * @return \c true only if the character encoding is supported.
298+ */
299+ZORBA_DLL_PUBLIC
300+bool is_supported( char const *charset );
301+
302+///////////////////////////////////////////////////////////////////////////////
303+
304+} // namespace transcode
305+} // namespace zorba
306+#endif /* ZORBA_TRANSCODE_STREAM_API_H */
307+/* vim:set et sw=2 ts=2: */
308
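Note on transcode_stream.h: the swap-and-restore pattern from the header's documentation can be sketched without Zorba as follows. This is only an illustration under stated assumptions: latin1_to_utf8_streambuf is a hypothetical stand-in that handles ISO-8859-1 input only (the branch itself appears to delegate to ICU for arbitrary encodings), and rdbuf_guard and read_latin1_as_utf8 are likewise invented names. The guard restores the original streambuf on normal return as well as on exceptions, so the stream never keeps a pointer to a destroyed buffer.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <streambuf>
#include <string>

// Hypothetical stand-in for transcode::streambuf: converts ISO-8859-1 bytes
// read from the original streambuf to UTF-8 on the fly.
class latin1_to_utf8_streambuf : public std::streambuf {
public:
  explicit latin1_to_utf8_streambuf( std::streambuf *orig ) : orig_( orig ) {
    if ( !orig )
      throw std::invalid_argument( "null streambuf" );
  }

  std::streambuf* orig_streambuf() const { return orig_; }

protected:
  int_type underflow() {
    if ( gptr() < egptr() )
      return traits_type::to_int_type( *gptr() );
    int_type const c = orig_->sbumpc();
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::eof();
    unsigned char const b =
      static_cast<unsigned char>( traits_type::to_char_type( c ) );
    int n;
    if ( b < 0x80 ) {                     // ASCII passes through unchanged
      buf_[0] = static_cast<char>( b );
      n = 1;
    } else {                              // U+0080..U+00FF: two UTF-8 bytes
      buf_[0] = static_cast<char>( 0xC0 | (b >> 6) );
      buf_[1] = static_cast<char>( 0x80 | (b & 0x3F) );
      n = 2;
    }
    setg( buf_, buf_, buf_ + n );
    return traits_type::to_int_type( buf_[0] );
  }

private:
  std::streambuf *orig_;
  char buf_[2];
};

// Hypothetical RAII guard: installs a replacement streambuf and puts the
// original back on every exit path.
class rdbuf_guard {
public:
  rdbuf_guard( std::ios &s, std::streambuf *replacement ) :
    stream_( s ), saved_( s.rdbuf( replacement ) ) { }
  ~rdbuf_guard() { stream_.rdbuf( saved_ ); }
private:
  std::ios &stream_;
  std::streambuf *saved_;
  rdbuf_guard( rdbuf_guard const& );            // forbid
  rdbuf_guard& operator=( rdbuf_guard const& );
};

// Reads the rest of an ISO-8859-1 stream and returns it as UTF-8.
std::string read_latin1_as_utf8( std::istream &is ) {
  latin1_to_utf8_streambuf tbuf( is.rdbuf() );
  rdbuf_guard guard( is, &tbuf );   // destroyed before tbuf
  return std::string( std::istreambuf_iterator<char>( is ),
                      std::istreambuf_iterator<char>() );
}
```

Declaring the guard after the buffer matters: locals are destroyed in reverse order, so the stream is pointed back at its original streambuf before the transcoding buffer goes away.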
309=== modified file 'modules/ExternalModules.conf'
310--- modules/ExternalModules.conf 2012-02-16 00:52:25 +0000
311+++ modules/ExternalModules.conf 2012-02-16 02:09:20 +0000
312@@ -32,7 +32,7 @@
313 email bzr lp:zorba/email-module zorba-2.1
314 excel bzr lp:zorba/excel-module zorba-2.1
315 geo bzr lp:zorba/geo-module zorba-2.1
316-http-client bzr lp:zorba/http-client-module 1.0
317+http-client bzr lp:zorba/http-client-module
318 image bzr lp:zorba/image-module zorba-2.1
319 languages bzr lp:zorba/languages-module zorba-2.1
320 oauth bzr lp:zorba/oauth-module zorba-2.1
321
322=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq'
323--- modules/com/zorba-xquery/www/modules/http-client.xq 2011-08-26 23:36:24 +0000
324+++ modules/com/zorba-xquery/www/modules/http-client.xq 2012-02-16 02:09:20 +0000
325@@ -354,7 +354,7 @@
326 :)
327 declare %ann:nondeterministic function http:get-node($href as xs:string) as item()+
328 {
329- http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/xml"/>}, (), ())
330+ http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/xml; charset=utf-8"/>}, (), ())
331 };
332
333 (:~
334@@ -374,7 +374,7 @@
335 :)
336 declare %ann:nondeterministic function http:get-text($href as xs:string) as item()+
337 {
338- http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/plain"/>}, (), ())
339+ http:http-nondeterministic-impl(validate {<http-schema:request method="GET" href="{$href}" follow-redirect="true" override-media-type="text/plain; charset=utf-8"/>}, (), ())
340 };
341
342 (:~
343
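Note on the override-media-type change: once the media type carries a charset parameter, the receiving side has to extract it before deciding whether to transcode. The helper below is a rough sketch of that extraction. charset_of is a hypothetical name, and the actual parsing in http_response_parser.cpp is not shown in this part of the diff and may well differ. The sketch does a simple substring search and does not attempt full RFC-style parameter parsing.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not from the branch): returns the lower-cased value of
// the charset parameter of a media type, or an empty string if there is none.
std::string charset_of( std::string const &media_type ) {
  // Parameter names and charset names are case-insensitive: normalize first.
  std::string lower;
  for ( std::string::size_type i = 0; i < media_type.size(); ++i ) {
    unsigned char const uc = static_cast<unsigned char>( media_type[i] );
    lower += static_cast<char>( std::tolower( uc ) );
  }

  std::string const key( "charset=" );
  std::string::size_type pos = lower.find( key );
  if ( pos == std::string::npos )
    return std::string();
  pos += key.size();

  // The value ends at the next parameter separator or whitespace.
  std::string::size_type const end = lower.find_first_of( "; \t", pos );
  std::string value = lower.substr(
    pos, end == std::string::npos ? std::string::npos : end - pos
  );

  // Strip optional surrounding double quotes.
  if ( value.size() >= 2 && value[0] == '"' &&
       value[ value.size() - 1 ] == '"' )
    value = value.substr( 1, value.size() - 2 );
  return value;
}
```

With the media types used by http:get-text and http:get-node in this diff, the helper would yield "utf-8", in which case a caller could skip transcoding entirely.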
344=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp'
345--- modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp 2011-07-29 08:12:36 +0000
346+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp 2012-02-16 02:09:20 +0000
347@@ -21,6 +21,7 @@
348 #include <iostream>
349 #include <cassert>
350 #ifndef WIN32
351+#include <cerrno>
352 #include <sys/time.h>
353 #endif /* WIN32 */
354
355@@ -32,349 +33,347 @@
356 using namespace std;
357
358 namespace zorba {
359- namespace curl {
360-
361- ///////////////////////////////////////////////////////////////////////////////
362-
363+namespace curl {
364+
365+///////////////////////////////////////////////////////////////////////////////
366+
367 #define ZORBA_CURL_ASSERT(expr) \
368-do { \
369-if ( CURLcode const code##__LINE__ = (expr) ) \
370-throw exception( #expr, "", code##__LINE__ ); \
371-} while (0)
372-
373+ do { \
374+ if ( CURLcode const code##__LINE__ = (expr) ) \
375+ throw exception( #expr, "", code##__LINE__ ); \
376+ } while (0)
377+
378 #define ZORBA_CURLM_ASSERT(expr) \
379-do { \
380-if ( CURLMcode const code##__LINE__ = (expr) ) \
381-if ( code##__LINE__ != CURLM_CALL_MULTI_PERFORM ) \
382-throw exception( #expr, "", code##__LINE__ ); \
383-} while (0)
384-
385- exception::exception( char const *function, char const *uri, char const *msg ) :
386- std::exception(), theMessage(msg)
387- {
388- }
389-
390- exception::exception( char const *function, char const *uri, CURLcode code ) :
391- std::exception(), theMessage(curl_easy_strerror(code))
392- {
393- }
394-
395- exception::exception( char const *function, char const *uri, CURLMcode code ) :
396- std::exception(), theMessage(curl_multi_strerror(code))
397- {
398- }
399-
400- const char* exception::what() const throw() {
401- return theMessage;
402- }
403-
404-
405- ///////////////////////////////////////////////////////////////////////////////
406-
407- CURL* create( char const *uri, write_fn_t fn, void *data ) {
408- //
409- // Having cURL initialization wrapped by a class and using a singleton static
410- // instance guarantees that cURL is initialized exactly once before use and
411- // and also is cleaned-up at program termination (when destructors for static
412- // objects are called).
413- //
414- struct curl_initializer {
415- curl_initializer() {
416- ZORBA_CURL_ASSERT( curl_global_init( CURL_GLOBAL_ALL ) );
417- }
418- ~curl_initializer() {
419- curl_global_cleanup();
420- }
421- };
422- static curl_initializer initializer;
423-
424- CURL *const curl = curl_easy_init();
425- if ( !curl )
426- throw exception( "curl_easy_init()", uri, "" );
427-
428- try {
429- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_URL, uri ) );
430- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, data ) );
431- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, fn ) );
432-
433- // Tells cURL to follow redirects. CURLOPT_MAXREDIRS is by default set to -1
434- // thus cURL will do an infinite number of redirects.
435- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_FOLLOWLOCATION, 1 ) );
436-
437+ do { \
438+ if ( CURLMcode const code##__LINE__ = (expr) ) \
439+ if ( code##__LINE__ != CURLM_CALL_MULTI_PERFORM ) \
440+ throw exception( #expr, "", code##__LINE__ ); \
441+ } while (0)
442+
443+exception::exception( char const *function, char const *uri, char const *msg ) :
444+ std::exception(), msg_( msg )
445+{
446+}
447+
448+exception::exception( char const *function, char const *uri, CURLcode code ) :
449+ std::exception(),
450+ msg_( curl_easy_strerror( code ) )
451+{
452+}
453+
454+exception::exception( char const *function, char const *uri, CURLMcode code ) :
455+ std::exception(),
456+ msg_( curl_multi_strerror( code ) )
457+{
458+}
459+
460+exception::~exception() throw() {
461+ // out-of-line since it's virtual
462+}
463+
464+const char* exception::what() const throw() {
465+ return msg_.c_str();
466+}
467+
468+///////////////////////////////////////////////////////////////////////////////
469+
470+CURL* create( char const *uri, write_fn_t fn, void *data ) {
471+ //
472+ // Having cURL initialization wrapped by a class and using a singleton static
473+ // instance guarantees that cURL is initialized exactly once before use and
474+ // also is cleaned up at program termination (when destructors for static
475+ // objects are called).
476+ //
477+ struct curl_initializer {
478+ curl_initializer() {
479+ ZORBA_CURL_ASSERT( curl_global_init( CURL_GLOBAL_ALL ) );
480+ }
481+ ~curl_initializer() {
482+ curl_global_cleanup();
483+ }
484+ };
485+ static curl_initializer initializer;
486+
487+ CURL *const curl = curl_easy_init();
488+ if ( !curl )
489+ throw exception( "curl_easy_init()", uri, "" );
490+
491+ try {
492+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_URL, uri ) );
493+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, data ) );
494+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, fn ) );
495+
496+ // Tells cURL to follow redirects. CURLOPT_MAXREDIRS is by default set to -1
497+ // thus cURL will do an infinite number of redirects.
498+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_FOLLOWLOCATION, 1 ) );
499+
500 #ifndef ZORBA_VERIFY_PEER_SSL_CERTIFICATE
501- ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_SSL_VERIFYPEER, 0 ) );
502- //
503- // CURLOPT_SSL_VERIFYHOST is left default, value 2, meaning verify that the
504- // Common Name or Subject Alternate Name field in the certificate matches
505- // the name of the server.
506- //
507- // Tested with https://www.npr.org/rss/rss.php?id=1001
508- // About using SSL certs in curl: http://curl.haxx.se/docs/sslcerts.html
509+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_SSL_VERIFYPEER, 0 ) );
510+ //
511+ // CURLOPT_SSL_VERIFYHOST is left default, value 2, meaning verify that the
512+ // Common Name or Subject Alternate Name field in the certificate matches
513+ // the name of the server.
514+ //
515+ // Tested with https://www.npr.org/rss/rss.php?id=1001
516+ // About using SSL certs in curl: http://curl.haxx.se/docs/sslcerts.html
517 #else
518 # ifdef WIN32
519- // set the root CA certificates file path
520- if ( GENV.g_curl_root_CA_certificates_path[0] )
521- ZORBA_CURL_ASSERT(
522- curl_easy_setopt(
523- curl, CURLOPT_CAINFO, GENV.g_curl_root_CA_certificates_path
524- )
525- );
526+ // set the root CA certificates file path
527+ if ( GENV.g_curl_root_CA_certificates_path[0] )
528+ ZORBA_CURL_ASSERT(
529+ curl_easy_setopt(
530+ curl, CURLOPT_CAINFO, GENV.g_curl_root_CA_certificates_path
531+ )
532+ );
533 # endif /* WIN32 */
534 #endif /* ZORBA_VERIFY_PEER_SSL_CERTIFICATE */
535-
536- //
537- // Some servers don't like requests that are made without a user-agent
538- // field, so we provide one.
539- //
540- ZORBA_CURL_ASSERT(
541- curl_easy_setopt( curl, CURLOPT_USERAGENT, "libcurl-agent/1.0" )
542- );
543-
544- return curl;
545- }
546- catch ( ... ) {
547- destroy( curl );
548- throw;
549- }
550- }
551-
552- void destroy( CURL *curl ) {
553- if ( curl ) {
554- curl_easy_reset( curl );
555- curl_easy_cleanup( curl );
556- }
557- }
558-
559- ///////////////////////////////////////////////////////////////////////////////
560-
561- streambuf::streambuf() : theInformer(0), theOwnInformer(false) {
562-#ifdef WIN32
563- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
564- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
565- std::cerr << "creating the socket failed" << std::endl;
566- }
567-#endif
568- init();
569- }
570-
571- streambuf::streambuf( char const *uri ) : theInformer(0), theOwnInformer(false) {
572-#ifdef WIN32
573- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
574- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
575- std::cerr << "creating the socket failed" << std::endl;
576- }
577-#endif
578- init();
579- open( uri );
580- }
581-
582- int streambuf::multi_perform() {
583- underflow();
584- CURLMsg* msg;
585- int msgInQueue;
586- int error = 0;
587- while ((msg = curl_multi_info_read(curlm_, &msgInQueue))) {
588- if (msg->msg == CURLMSG_DONE) {
589- error = msg->data.result;
590- }
591- }
592- return error;
593- }
594-
595- streambuf::streambuf( CURL* aCurl) : theInformer(0), theOwnInformer(false) {
596-#ifdef WIN32
597- theDummySocket = socket(AF_INET, SOCK_DGRAM, 0);
598- if (theDummySocket == CURL_SOCKET_BAD || theDummySocket == INVALID_SOCKET) {
599- std::cerr << "creating the socket failed" << std::endl;
600- }
601-#endif
602- init();
603- curl_ = aCurl;
604- ZORBA_CURL_ASSERT( curl_easy_setopt( aCurl, CURLOPT_WRITEDATA, this ) );
605- ZORBA_CURL_ASSERT( curl_easy_setopt( aCurl, CURLOPT_WRITEFUNCTION, curl_write_callback ) );
606-
607- init_curlm();
608- }
609-
610- streambuf::~streambuf() {
611- free( buf_ );
612- close();
613-#ifdef WIN32
614- closesocket(theDummySocket);
615-#endif
616- // If we have been assigned memory ownership of theInformer, delete it now.
617- if (theOwnInformer)
618- delete theInformer;
619- }
620-
621- void streambuf::close() {
622- if ( curl_ ) {
623- if ( curlm_ ) {
624- curl_multi_remove_handle( curlm_, curl_ );
625- curl_multi_cleanup( curlm_ );
626- curlm_ = 0;
627- }
628- destroy( curl_ );
629- curl_ = 0;
630- }
631- }
632-
633- void streambuf::curl_read() {
634- buf_len_ = 0;
635- while ( curl_running_ && !buf_len_ ) {
636- fd_set fd_read, fd_write, fd_except;
637- FD_ZERO( &fd_read );
638- FD_ZERO( &fd_write );
639- FD_ZERO( &fd_except );
640- int max_fd = -1;
641-#ifdef WIN32
642- // Windows does not like a call to select where all arguments are 0. So
643- // we just add a dummy socket to make the call to select happy.
644- FD_SET (theDummySocket, &fd_read);
645-#endif
646- ZORBA_CURLM_ASSERT(
647- curl_multi_fdset( curlm_, &fd_read, &fd_write, &fd_except, &max_fd )
648- );
649-
650- //
651- // Note that the fopen.c sample code is unnecessary at best or wrong at
652- // worst; see: http://curl.haxx.se/mail/lib-2011-05/0011.html
653- //
654- timeval timeout;
655- long curl_timeout_ms;
656- ZORBA_CURLM_ASSERT( curl_multi_timeout( curlm_, &curl_timeout_ms ) );
657- if ( curl_timeout_ms > 0 ) {
658- timeout.tv_sec = curl_timeout_ms / 1000;
659- timeout.tv_usec = curl_timeout_ms % 1000 * 1000;
660- } else {
661- //
662- // From curl_multi_timeout(3):
663- //
664- // Note: if libcurl returns a -1 timeout here, it just means that
665- // libcurl currently has no stored timeout value. You must not wait
666- // too long (more than a few seconds perhaps) before you call
667- // curl_multi_perform() again.
668- //
669- // So we just pick some not-too-long default.
670- //
671- timeout.tv_sec = 1;
672- timeout.tv_usec = 0;
673- }
674-
675- switch ( select( max_fd + 1, &fd_read, &fd_write, &fd_except, &timeout ) ) {
676- case -1: // select error
677-#ifdef WIN32
678- std::cout << "Error = " << WSAGetLastError() << std::endl;
679-#endif
680- throw exception( "select()", "" );
681- case 0: // timeout
682- // no break;
683- default:
684- CURLMcode code;
685- do {
686- code = curl_multi_perform( curlm_, &curl_running_ );
687- } while ( code == CURLM_CALL_MULTI_PERFORM );
688- ZORBA_CURLM_ASSERT( code );
689- }
690- }
691- if (theInformer) {
692- theInformer->afterRead();
693- }
694- }
695-
696- size_t streambuf::curl_write_callback( void *ptr, size_t size, size_t nmemb,
697- void *data ) {
698- size *= nmemb;
699- streambuf *const that = static_cast<streambuf*>( data );
700-
701- std::streamoff buf_free = that->buf_capacity_ - that->buf_len_;
702- if (that->theInformer) {
703- that->theInformer->beforeRead();
704- }
705- if ( size > buf_free ) {
706- std::streamoff new_capacity = that->buf_capacity_ + size - buf_free;
707- if ( void *const new_buf = realloc( that->buf_, static_cast<size_t>(new_capacity) ) ) {
708- that->buf_ = static_cast<char*>( new_buf );
709- that->buf_capacity_ = new_capacity;
710- } else
711- throw exception( "realloc()", "" );
712- }
713- ::memcpy( that->buf_ + that->buf_len_, ptr, size );
714- that->buf_len_ += size;
715- return size;
716- }
717-
718- void streambuf::init() {
719- buf_ = 0;
720- buf_capacity_ = 0;
721- buf_len_ = 0;
722- curl_ = 0;
723- curlm_ = 0;
724- curl_running_ = 0;
725- }
726-
727- void streambuf::init_curlm() {
728- //
729- // Lie about cURL running initially so the while-loop in curl_read() will run
730- // at least once.
731- //
732- curl_running_ = 1;
733-
734- //
735- // Set the "get" pointer to the end (gptr() == egptr()) so a call to
736- // underflow() and initial data read will be triggered.
737- //
738- buf_len_ = buf_capacity_;
739- setg( buf_, buf_ + buf_len_, buf_ + buf_capacity_ );
740-
741- //
742- // Clean-up has to be done here with try/catch (as opposed to relying on the
743- // destructor) because open() can be called from the constructor. If an
744- // exception is thrown, the constructor will not have completed, hence the
745- // object will not have been fully constructed; therefore the destructor will
746- // not be called.
747- //
748- try {
749- if ( !(curlm_ = curl_multi_init()) )
750- throw exception( "curl_multi_init()", "" );
751- try {
752- ZORBA_CURLM_ASSERT( curl_multi_add_handle( curlm_, curl_ ) );
753- }
754- catch ( ... ) {
755- curl_multi_cleanup( curlm_ );
756- curlm_ = 0;
757- throw;
758- }
759- }
760- catch ( ... ) {
761- destroy( curl_ );
762- curl_ = 0;
763- throw;
764- }
765- }
766-
767- void streambuf::open( char const *uri ) {
768- curl_ = create( uri, curl_write_callback, this );
769-
770- init_curlm();
771- }
772-
773- streamsize streambuf::showmanyc() {
774- return egptr() - gptr();
775- }
776-
777- streambuf::int_type streambuf::underflow() {
778- while ( true ) {
779- if ( gptr() < egptr() )
780- return traits_type::to_int_type( *gptr() );
781- curl_read();
782- if ( !buf_len_ )
783- return traits_type::eof();
784- setg( buf_, buf_, buf_ + buf_len_ );
785- }
786- }
787-
788- ///////////////////////////////////////////////////////////////////////////////
789-
790- } // namespace curl
791+
792+ //
793+ // Some servers don't like requests that are made without a user-agent
794+ // field, so we provide one.
795+ //
796+ ZORBA_CURL_ASSERT(
797+ curl_easy_setopt( curl, CURLOPT_USERAGENT, "libcurl-agent/1.0" )
798+ );
799+
800+ return curl;
801+ }
802+ catch ( ... ) {
803+ destroy( curl );
804+ throw;
805+ }
806+}
807+
808+void destroy( CURL *curl ) {
809+ if ( curl ) {
810+ curl_easy_reset( curl );
811+ curl_easy_cleanup( curl );
812+ }
813+}
814+
815+///////////////////////////////////////////////////////////////////////////////
816+
817+streambuf::streambuf() {
818+ init();
819+}
820+
821+streambuf::streambuf( char const *uri ) {
822+ init();
823+ open( uri );
824+}
825+
826+streambuf::streambuf( CURL *curl ) {
827+ init();
828+ curl_ = curl;
829+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEDATA, this ) );
830+ ZORBA_CURL_ASSERT( curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, curl_write_callback ) );
831+ init_curlm();
832+}
833+
834+streambuf::~streambuf() {
835+ free( buf_ );
836+ close();
837+#ifdef WIN32
838+ closesocket( dummy_socket_ );
839+#endif
840+ // If we have been assigned memory ownership of theInformer, delete it now.
841+ if ( theOwnInformer )
842+ delete theInformer;
843+}
844+
845+void streambuf::close() {
846+ if ( curl_ ) {
847+ if ( curlm_ ) {
848+ curl_multi_remove_handle( curlm_, curl_ );
849+ curl_multi_cleanup( curlm_ );
850+ curlm_ = 0;
851+ }
852+ destroy( curl_ );
853+ curl_ = 0;
854+ }
855+}
856+
857+void streambuf::curl_read() {
858+ buf_len_ = 0;
859+ while ( curl_running_ && !buf_len_ ) {
860+ fd_set fd_read, fd_write, fd_except;
861+ FD_ZERO( &fd_read );
862+ FD_ZERO( &fd_write );
863+ FD_ZERO( &fd_except );
864+ int max_fd = -1;
865+#ifdef WIN32
866+ //
867+ // Windows does not like a call to select where all arguments are 0, so we
868+ // just add a dummy socket to make the call to select happy.
869+ //
870+ FD_SET( dummy_socket_, &fd_read );
871+#endif /* WIN32 */
872+ ZORBA_CURLM_ASSERT(
873+ curl_multi_fdset( curlm_, &fd_read, &fd_write, &fd_except, &max_fd )
874+ );
875+
876+ //
877+ // Note that the fopen.c sample code is unnecessary at best or wrong at
878+ // worst; see: http://curl.haxx.se/mail/lib-2011-05/0011.html
879+ //
880+ timeval timeout;
881+ long curl_timeout_ms;
882+ ZORBA_CURLM_ASSERT( curl_multi_timeout( curlm_, &curl_timeout_ms ) );
883+ if ( curl_timeout_ms > 0 ) {
884+ timeout.tv_sec = curl_timeout_ms / 1000;
885+ timeout.tv_usec = curl_timeout_ms % 1000 * 1000;
886+ } else {
887+ //
888+ // From curl_multi_timeout(3):
889+ //
890+ // Note: if libcurl returns a -1 timeout here, it just means that
891+ // libcurl currently has no stored timeout value. You must not wait
892+ // too long (more than a few seconds perhaps) before you call
893+ // curl_multi_perform() again.
894+ //
895+ // So we just pick some not-too-long default.
896+ //
897+ timeout.tv_sec = 1;
898+ timeout.tv_usec = 0;
899+ }
900+
901+ switch ( select( max_fd + 1, &fd_read, &fd_write, &fd_except, &timeout ) ) {
902+ case -1: // select error
903+#ifdef WIN32
904+ char err_buf[8];
905+ sprintf( err_buf, "%d", WSAGetLastError() );
906+ throw exception( "select()", "", err_buf );
907+#else
908+ throw exception( "select()", "", strerror( errno ) );
909+#endif
910+ case 0: // timeout
911+ // no break;
912+ default:
913+ CURLMcode code;
914+ do {
915+ code = curl_multi_perform( curlm_, &curl_running_ );
916+ } while ( code == CURLM_CALL_MULTI_PERFORM );
917+ ZORBA_CURLM_ASSERT( code );
918+ }
919+ }
920+ if ( theInformer )
921+ theInformer->afterRead();
922+}
923+
924+size_t streambuf::curl_write_callback( void *ptr, size_t size, size_t nmemb,
925+ void *data ) {
926+ size *= nmemb;
927+ streambuf *const that = static_cast<streambuf*>( data );
928+
929+ if ( that->theInformer )
930+ that->theInformer->beforeRead();
931+
932+ size_t const buf_free = that->buf_capacity_ - that->buf_len_;
933+ if ( size > buf_free ) {
934+ streamoff new_capacity = that->buf_capacity_ + size - buf_free;
935+ if ( void *const new_buf =
936+ realloc( that->buf_, static_cast<size_t>( new_capacity ) ) ) {
937+ that->buf_ = static_cast<char*>( new_buf );
938+ that->buf_capacity_ = new_capacity;
939+ } else
940+ throw exception( "realloc()", "" );
941+ }
942+ ::memcpy( that->buf_ + that->buf_len_, ptr, size );
943+ that->buf_len_ += size;
944+ return size;
945+}
946+
947+void streambuf::init() {
948+ buf_ = 0;
949+ buf_capacity_ = 0;
950+ buf_len_ = 0;
951+ curl_ = 0;
952+ curlm_ = 0;
953+ curl_running_ = 0;
954+ theInformer = 0;
955+ theOwnInformer = false;
956+#ifdef WIN32
957+ dummy_socket_ = socket( AF_INET, SOCK_DGRAM, 0 );
958+ if ( dummy_socket_ == CURL_SOCKET_BAD || dummy_socket_ == INVALID_SOCKET )
959+ throw exception( "socket()", "" );
960+#endif /* WIN32 */
961+}
962+
963+void streambuf::init_curlm() {
964+ //
965+ // Lie about cURL running initially so the while-loop in curl_read() will run
966+ // at least once.
967+ //
968+ curl_running_ = 1;
969+
970+ //
971+ // Set the "get" pointer to the end (gptr() == egptr()) so a call to
972+ // underflow() and initial data read will be triggered.
973+ //
974+ buf_len_ = buf_capacity_;
975+ setg( buf_, buf_ + buf_len_, buf_ + buf_capacity_ );
976+
977+ //
978+ // Clean-up has to be done here with try/catch (as opposed to relying on the
979+ // destructor) because open() can be called from the constructor. If an
980+ // exception is thrown, the constructor will not have completed, hence the
981+ // object will not have been fully constructed; therefore the destructor will
982+ // not be called.
983+ //
984+ try {
985+ if ( !(curlm_ = curl_multi_init()) )
986+ throw exception( "curl_multi_init()", "" );
987+ try {
988+ ZORBA_CURLM_ASSERT( curl_multi_add_handle( curlm_, curl_ ) );
989+ }
990+ catch ( ... ) {
991+ curl_multi_cleanup( curlm_ );
992+ curlm_ = 0;
993+ throw;
994+ }
995+ }
996+ catch ( ... ) {
997+ destroy( curl_ );
998+ curl_ = 0;
999+ throw;
1000+ }
1001+}
1002+
1003+int streambuf::multi_perform() {
1004+ underflow();
1005+ CURLMsg *msg;
1006+ int msgInQueue;
1007+ int error = 0;
1008+ while ( (msg = curl_multi_info_read( curlm_, &msgInQueue )) ) {
1009+ if ( msg->msg == CURLMSG_DONE )
1010+ error = msg->data.result;
1011+ }
1012+ return error;
1013+}
1014+
1015+void streambuf::open( char const *uri ) {
1016+ curl_ = create( uri, curl_write_callback, this );
1017+
1018+ init_curlm();
1019+}
1020+
1021+streamsize streambuf::showmanyc() {
1022+ return egptr() - gptr();
1023+}
1024+
1025+streambuf::int_type streambuf::underflow() {
1026+ while ( true ) {
1027+ if ( gptr() < egptr() )
1028+ return traits_type::to_int_type( *gptr() );
1029+ curl_read();
1030+ if ( !buf_len_ )
1031+ return traits_type::eof();
1032+ setg( buf_, buf_, buf_ + buf_len_ );
1033+ }
1034+}
1035+
1036+///////////////////////////////////////////////////////////////////////////////
1037+
1038+} // namespace curl
1039 } // namespace zorba
1040+/* vim:set et sw=2 ts=2: */
1041
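The refill loop in streambuf::underflow() above can be looked at in isolation. What follows is a minimal sketch, assuming a hypothetical chunk_streambuf that serves pre-loaded strings in place of data delivered by cURL. The names chunk_streambuf, fill(), and read_all() are illustrative only and do not appear in the patch. As in the patch, the get area starts out empty so that the first read triggers underflow(), and an empty refill is reported as end-of-file.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for curl::streambuf: each chunk plays the role of the
// data that curl_write_callback() would have copied into buf_.
class chunk_streambuf : public std::streambuf {
public:
  explicit chunk_streambuf( std::vector<std::string> const &chunks ) :
    chunks_( chunks ), next_( 0 )
  {
    // Empty get area (gptr() == egptr()) so the first read calls underflow().
    setg( 0, 0, 0 );
  }

protected:
  int_type underflow() {
    while ( true ) {
      if ( gptr() < egptr() )
        return traits_type::to_int_type( *gptr() );
      if ( !fill() )
        return traits_type::eof();
      // Point the get area at the freshly filled buffer.
      setg( &buf_[0], &buf_[0], &buf_[0] + buf_.size() );
    }
  }

private:
  // Plays the role of curl_read(): replaces the buffer with the next
  // non-empty chunk. Returns false once no data is left.
  bool fill() {
    buf_.clear();
    while ( buf_.empty() && next_ < chunks_.size() )
      buf_ = chunks_[ next_++ ];
    return !buf_.empty();
  }

  std::vector<std::string> chunks_;
  std::size_t next_;
  std::string buf_;
};

// Drains a streambuf through a std::istream and returns everything read.
std::string read_all( std::streambuf &sb ) {
  std::istream in( &sb );
  return std::string( std::istreambuf_iterator<char>( in ),
                      std::istreambuf_iterator<char>() );
}
```

Reading three chunks, one of them empty, through read_all() yields their concatenation, because underflow() is only re-entered once the current chunk has been consumed.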
1042=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h'
1043--- modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h 2011-07-29 08:12:36 +0000
1044+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h 2012-02-16 02:09:20 +0000
1045@@ -19,154 +19,175 @@
1046
1047 #include <zorba/config.h>
1048
1049+#include <exception>
1050 #include <istream>
1051-#include <exception>
1052 #include <streambuf>
1053+#include <string>
1054 #include <curl/curl.h>
1055
1056 namespace zorba {
1057-
1058- namespace http_client {
1059- class InformDataRead;
1060- }
1061-
1062- namespace curl {
1063-
1064- class exception : public std::exception {
1065- public:
1066- exception( char const *function, char const *uri, char const *msg = 0 );
1067- exception( char const *function, char const *uri, CURLcode code );
1068- exception( char const *function, char const *uri, CURLMcode code );
1069- public:
1070- virtual const char* what() const throw();
1071- private:
1072- const char* theMessage;
1073- };
1074-
1075-
1076-
1077- ////////// create & destroy ///////////////////////////////////////////////////
1078-
1079- /**
1080- * The signature type of cURL's write function callback.
1081- */
1082- typedef size_t (*write_fn_t)( void*, size_t, size_t, void* );
1083-
1084- /**
1085- * Creates a new, initialized cURL instance.
1086- *
1087- * @throws exception upon failure.
1088- */
1089- CURL* create( char const *uri, write_fn_t fn, void *data );
1090-
1091- /**
1092- * Destroys a cURL instance.
1093- *
1094- * @param instance A cURL instance. If \c NULL, does nothing.
1095- */
1096- void destroy( CURL *instance );
1097-
1098- ////////// streambuf //////////////////////////////////////////////////////////
1099-
1100- /**
1101- * A curl::streambuf is-a std::streambuf for streaming the contents of URI
1102- * using cURL. However, do not use this class directly. Use uri::streambuf
1103- * instead.
1104- */
1105- class streambuf : public std::streambuf {
1106- public:
1107- /**
1108- * Constructs a %streambuf.
1109- */
1110- streambuf();
1111-
1112- /**
1113- * Constructs a %streambuf and opens a connection to the server hosting the
1114- * given URI for subsequent streaming.
1115- *
1116- * @param uri The URI to stream.
1117- */
1118- streambuf( char const *uri );
1119-
1120- /**
1121- * In case we already have a curl object, which was set up somewhere else, we
1122- * take it here as an arument. This takes ownership over the object.
1123- */
1124- streambuf( CURL* aCurl );
1125-
1126- /**
1127- * Destroys a %streambuf.
1128- */
1129- ~streambuf();
1130-
1131- /**
1132- * Opens a connection to the server hosting the given URI for subsequent
1133- * streaming.
1134- *
1135- * @param uri The URI to stream.
1136- * @throws exception upon failure.
1137- */
1138- void open( char const *uri );
1139-
1140- /**
1141- * Tests whether the buffer is open.
1142- *
1143- * @return Returns \c true only if the buffer is open.
1144- */
1145- bool is_open() const {
1146- return !!curl_;
1147- }
1148-
1149- /**
1150- * Closes this %streambuf.
1151- */
1152- void close();
1153-
1154- /**
1155- * Provide a InformDataRead that will get callbacks about read events.
1156- */
1157- void setInformer(::zorba::http_client::InformDataRead* aInformer) { theInformer = aInformer; }
1158-
1159- /**
1160- * Specify whether this streambuf has memory ownership over the
1161- * InformDataRead it has been passed. You can use this if, for example,
1162- * the lifetime of the streambuf will extend past the lifetime of the
1163- * object which created the InformDataRead.
1164- */
1165- void setOwnInformer(bool aOwnInformer) { theOwnInformer = aOwnInformer; }
1166-
1167- int multi_perform();
1168-
1169- protected:
1170- // inherited
1171- std::streamsize showmanyc();
1172- int_type underflow();
1173-
1174- private:
1175- void curl_read();
1176- static size_t curl_write_callback( void*, size_t, size_t, void* );
1177-
1178- void init();
1179- void init_curlm();
1180-
1181- char *buf_;
1182- std::streamsize buf_capacity_;
1183- std::streamoff buf_len_;
1184-
1185- CURL *curl_;
1186- CURLM *curlm_;
1187- int curl_running_;
1188- ::zorba::http_client::InformDataRead* theInformer;
1189- bool theOwnInformer;
1190-
1191- // forbid
1192- streambuf( streambuf const& );
1193- streambuf& operator=( streambuf const& );
1194+
1195+namespace http_client {
1196+ class InformDataRead;
1197+}
1198+
1199+namespace curl {
1200+
1201+///////////////////////////////////////////////////////////////////////////////
1202+
1203+class exception : public std::exception {
1204+public:
1205+ exception( char const *function, char const *uri, char const *msg = 0 );
1206+ exception( char const *function, char const *uri, CURLcode code );
1207+ exception( char const *function, char const *uri, CURLMcode code );
1208+ ~exception() throw();
1209+
1210+ virtual const char* what() const throw();
1211+
1212+private:
1213+ std::string msg_;
1214+};
1215+
1216+////////// create & destroy ///////////////////////////////////////////////////
1217+
1218+/**
1219+ * The signature type of cURL's write function callback.
1220+ */
1221+typedef size_t (*write_fn_t)( void*, size_t, size_t, void* );
1222+
1223+/**
1224+ * Creates a new, initialized cURL instance.
1225+ *
1226+ * @throws exception upon failure.
1227+ */
1228+CURL* create( char const *uri, write_fn_t fn, void *data );
1229+
1230+/**
1231+ * Destroys a cURL instance.
1232+ *
1233+ * @param instance A cURL instance. If \c NULL, does nothing.
1234+ */
1235+void destroy( CURL *instance );
1236+
1237+////////// streambuf //////////////////////////////////////////////////////////
1238+
1239+/**
1240+ * A curl::streambuf is-a std::streambuf for streaming the contents of a URI
1241+ * using cURL. However, do not use this class directly. Use uri::streambuf
1242+ * instead.
1243+ */
1244+class streambuf : public std::streambuf {
1245+public:
1246+ /**
1247+ * Constructs a %streambuf.
1248+ */
1249+ streambuf();
1250+
1251+ /**
1252+ * Constructs a %streambuf and opens a connection to the server hosting the
1253+ * given URI for subsequent streaming.
1254+ *
1255+ * @param uri The URI to stream.
1256+ */
1257+ streambuf( char const *uri );
1258+
1259+ /**
1260+ * Constructs a %streambuf using an existing CURL object.
1261+ *
1262+ * @param curl The CURL object to use. This %streambuf takes ownership of
1263+ * it.
1264+ */
1265+ streambuf( CURL *curl );
1266+
1267+ /**
1268+ * Destroys a %streambuf.
1269+ */
1270+ ~streambuf();
1271+
1272+ /**
1273+ * Opens a connection to the server hosting the given URI for subsequent
1274+ * streaming.
1275+ *
1276+ * @param uri The URI to stream.
1277+ * @throws exception upon failure.
1278+ */
1279+ void open( char const *uri );
1280+
1281+ /**
1282+ * Tests whether the buffer is open.
1283+ *
1284+ * @return Returns \c true only if the buffer is open.
1285+ */
1286+ bool is_open() const {
1287+ return !!curl_;
1288+ }
1289+
1290+ /**
1291+ * Closes this %streambuf.
1292+ */
1293+ void close();
1294+
1295+ /**
1296+ * Gets the CURL object in use.
1297+ *
1298+ * @return Returns said CURL object.
1299+ */
1300+ CURL* curl() const {
1301+ return curl_;
1302+ }
1303+
1304+ /**
1305+ * Provide an InformDataRead that will get callbacks about read events.
1306+ */
1307+ void setInformer( http_client::InformDataRead *aInformer ) {
1308+ theInformer = aInformer;
1309+ }
1310+
1311+ /**
1312+ * Specify whether this streambuf has memory ownership over the
1313+ * InformDataRead it has been passed. You can use this if, for example,
1314+ * the lifetime of the streambuf will extend past the lifetime of the
1315+ * object which created the InformDataRead.
1316+ */
1317+ void setOwnInformer( bool aOwnInformer ) {
1318+ theOwnInformer = aOwnInformer;
1319+ }
1320+
1321+ int multi_perform();
1322+
1323+protected:
1324+ // inherited
1325+ std::streamsize showmanyc();
1326+ int_type underflow();
1327+
1328+private:
1329+ void curl_read();
1330+ static size_t curl_write_callback( void*, size_t, size_t, void* );
1331+
1332+ void init();
1333+ void init_curlm();
1334+
1335+ char *buf_;
1336+ std::streamsize buf_capacity_;
1337+ std::streamoff buf_len_;
1338+
1339+ CURL *curl_;
1340+ CURLM *curlm_;
1341+ int curl_running_;
1342+ http_client::InformDataRead *theInformer;
1343+ bool theOwnInformer;
1344+
1345+ // forbid
1346+ streambuf( streambuf const& );
1347+ streambuf& operator=( streambuf const& );
1348 #ifdef WIN32
1349- SOCKET theDummySocket;
1350-#endif
1351- };
1352-
1353- } // namespace curl
1354+ SOCKET dummy_socket_;
1355+#endif /* WIN32 */
1356+};
1357+
1358+///////////////////////////////////////////////////////////////////////////////
1359+
1360+} // namespace curl
1361 } // namespace zorba
1362 #endif /* ZORBA_CURL_UTIL_H */
1363+/* vim:set et sw=2 ts=2: */
1364
1365=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp'
1366--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp 2011-07-29 08:12:36 +0000
1367+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp 2012-02-16 02:09:20 +0000
1368@@ -26,12 +26,44 @@
1369 #include <zorba/error.h>
1370 #include <zorba/xquery_exception.h>
1371 #include <zorba/xquery_functions.h>
1372+#include <zorba/transcode_stream.h>
1373
1374 #include "http_response_parser.h"
1375 #include "http_request_handler.h"
1376 #include "curl_stream_buffer.h"
1377
1378-namespace zorba { namespace http_client {
1379+namespace zorba {
1380+
1381+static void parse_content_type( std::string const &s, std::string *mime_type,
1382+ std::string *charset ) {
1383+ std::string::size_type pos = s.find( ';' );
1384+ *mime_type = s.substr( 0, pos );
1385+
1386+ if ( pos != std::string::npos ) {
1387+ //
1388+ // Parse: charset="?XXXXX"?[ (comment)]
1389+ //
1390+ if ( (pos = s.find( '=' )) != std::string::npos ) {
1391+ std::string t = s.substr( pos + 1 );
1392+ if ( !t.empty() ) {
1393+ if ( t[0] == '"' ) {
1394+ t.erase( 0, 1 );
1395+ if ( (pos = t.find( '"' )) != std::string::npos )
1396+ t.erase( pos );
1397+ } else {
1398+ if ( (pos = t.find( ' ' )) != std::string::npos )
1399+ t.erase( pos );
1400+ }
1401+ *charset = t;
1402+ }
1403+ }
1404+ } else {
1405+ // The HTTP/1.1 spec says that the default charset is ISO-8859-1.
1406+ *charset = "ISO-8859-1";
1407+ }
1408+}
1409+
1410+namespace http_client {
1411
1412 HttpResponseParser::HttpResponseParser(RequestHandler& aHandler, CURL* aCurl,
1413 ErrorThrower& aErrorThrower,
1414@@ -60,19 +92,30 @@
1415 if (lCode)
1416 return lCode;
1417 if (!theStatusOnly) {
1418- std::auto_ptr<std::istream> lStream(new std::istream(theStreamBuffer));
1419+
1420+ if (!theOverridenContentType.empty()) {
1421+ parse_content_type(
1422+ theOverridenContentType, &theCurrentContentType, &theCurrentCharset
1423+ );
1424+ }
1425+
1426+ std::auto_ptr<std::istream> lStream;
1427+ if ( transcode::is_necessary( theCurrentCharset.c_str() ) ) {
1428+ lStream.reset(
1429+ new transcode::stream<std::istream>(
1430+ theCurrentCharset.c_str(), theStreamBuffer
1431+ )
1432+ );
1433+ } else
1434+ lStream.reset(new std::istream(theStreamBuffer));
1435+
1436 Item lItem;
1437- if (theOverridenContentType != "") {
1438- theCurrentContentType = theOverridenContentType;
1439- }
1440 if (theCurrentContentType == "text/xml" ||
1441 theCurrentContentType == "application/xml" ||
1442 theCurrentContentType == "text/xml-external-parsed-entity" ||
1443 theCurrentContentType == "application/xml-external-parsed-entity" ||
1444 theCurrentContentType.find("+xml") == theCurrentContentType.size()-4) {
1445 lItem = createXmlItem(*lStream.get());
1446- } else if (theCurrentContentType.find("text/html") == 0) {
1447- lItem = createTextItem(lStream.release());
1448 } else if (theCurrentContentType.find("text/") == 0) {
1449 lItem = createTextItem(lStream.release());
1450 } else {
1451@@ -106,8 +149,8 @@
1452 }
1453 theInsideRead = true;
1454 theHandler.beginResponse(theStatus, theMessage);
1455- std::vector<std::pair<std::string, std::string> >::iterator lIter;
1456- for (lIter = theHeaders.begin(); lIter != theHeaders.end(); ++lIter) {
1457+ for ( headers_type::const_iterator
1458+ lIter = theHeaders.begin(); lIter != theHeaders.end(); ++lIter) {
1459 theHandler.header(lIter->first, lIter->second);
1460 }
1461 if (!theStatusOnly)
1462@@ -120,23 +163,20 @@
1463
1464 void HttpResponseParser::registerHandler()
1465 {
1466- curl_easy_setopt(theCurl, CURLOPT_HEADERFUNCTION,
1467- &HttpResponseParser::headerfunction);
1468+ curl_easy_setopt(theCurl, CURLOPT_HEADERFUNCTION, &curl_headerfunction);
1469 curl_easy_setopt(theCurl, CURLOPT_HEADERDATA, this);
1470 }
1471
1472- size_t HttpResponseParser::headerfunction(void *ptr,
1473- size_t size,
1474- size_t nmemb,
1475- void *stream)
1476+ size_t HttpResponseParser::curl_headerfunction( void *ptr, size_t size,
1477+ size_t nmemb, void *data )
1478 {
1479 size_t lSize = size*nmemb;
1480 size_t lResult = lSize;
1481- HttpResponseParser* lParser = static_cast<HttpResponseParser*>(stream);
1482+ HttpResponseParser* lParser = static_cast<HttpResponseParser*>(data);
1483 if (lParser->theInsideRead) {
1484 lParser->theHandler.endBody();
1485+ lParser->theInsideRead = false;
1486 }
1487- lParser->theInsideRead = false;
1488 const char* lDataChar = (const char*) ptr;
1489 while (lSize != 0 && (lDataChar[lSize - 1] == 10
1490 || lDataChar[lSize - 1] == 13)) {
1491@@ -173,7 +213,9 @@
1492 }
1493 String lNameS = fn::lower_case( lName );
1494 if (lNameS == "content-type") {
1495- lParser->theCurrentContentType = lValue.substr(0, lValue.find(';'));
1496+ parse_content_type(
1497+ lValue, &lParser->theCurrentContentType, &lParser->theCurrentCharset
1498+ );
1499 } else if (lNameS == "content-id") {
1500 lParser->theId = lValue;
1501 } else if (lNameS == "content-description") {
1502@@ -184,7 +226,7 @@
1503 return lResult;
1504 }
1505
1506- void HttpResponseParser::parseStatusAndMessage(std::string aHeader)
1507+ void HttpResponseParser::parseStatusAndMessage(std::string const &aHeader)
1508 {
1509 std::string::size_type lPos = aHeader.find(' ');
1510 assert(lPos != std::string::npos);
1511@@ -215,7 +257,12 @@
1512 static void streamReleaser(std::istream* aStream)
1513 {
1514 // This istream contains our curl stream buffer, so we have to delete it too
1515- delete aStream->rdbuf();
1516+ std::streambuf *const sbuf = aStream->rdbuf();
1517+ if ( transcode::streambuf *tbuf =
1518+ dynamic_cast<transcode::streambuf*>( sbuf ) )
1519+ delete tbuf->orig_streambuf();
1520+ else
1521+ delete sbuf;
1522 delete aStream;
1523 }
1524
1525@@ -265,4 +312,7 @@
1526 return Item();
1527 }
1528 }
1529-}}
1530+
1531+} // namespace http_client
1532+} // namespace zorba
1533+/* vim:set et sw=2 ts=2: */
1534
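The behavior of parse_content_type() in the diff above is easiest to judge against sample header values. The following is a minimal standalone sketch that mirrors the logic shown in the patch. The name parse_content_type_sketch is hypothetical, and the function returns void since no result value is needed by the callers shown.

```cpp
#include <string>

// Splits a Content-Type value into MIME type and charset, following the same
// steps as the patch: everything before ';' is the MIME type; the text after
// the first '=' is the charset, with optional quotes and any trailing
// " (comment)" removed. Without ';' the charset falls back to ISO-8859-1.
void parse_content_type_sketch( std::string const &s, std::string *mime_type,
                                std::string *charset ) {
  std::string::size_type pos = s.find( ';' );
  *mime_type = s.substr( 0, pos );

  if ( pos == std::string::npos ) {
    *charset = "ISO-8859-1";            // HTTP/1.1 default, as in the patch
    return;
  }
  if ( (pos = s.find( '=' )) == std::string::npos )
    return;                             // parameters present, but no value
  std::string t = s.substr( pos + 1 );
  if ( t.empty() )
    return;
  if ( t[0] == '"' ) {
    t.erase( 0, 1 );                    // drop the opening quote
    if ( (pos = t.find( '"' )) != std::string::npos )
      t.erase( pos );                   // drop the closing quote and the rest
  } else if ( (pos = t.find( ' ' )) != std::string::npos ) {
    t.erase( pos );                     // drop a trailing " (comment)"
  }
  *charset = t;
}
```

Note that only the first '=' is examined, so a parameter that precedes charset in the header value would be taken as the charset. That is a property of this sketch and, as far as the diff shows, of the patched function as well.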
1535=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h'
1536--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h 2011-07-29 08:12:36 +0000
1537+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h 2012-02-16 02:09:20 +0000
1538@@ -31,6 +31,7 @@
1539 namespace curl {
1540 class streambuf;
1541 }
1542+
1543 namespace http_client {
1544 class RequestHandler;
1545
1546@@ -40,7 +41,9 @@
1547 CURL* theCurl;
1548 ErrorThrower& theErrorThrower;
1549 std::string theCurrentContentType;
1550- std::vector<std::pair<std::string, std::string> > theHeaders;
1551+ std::string theCurrentCharset;
1552+ typedef std::vector<std::pair<std::string, std::string> > headers_type;
1553+ headers_type theHeaders;
1554 int theStatus;
1555 std::string theMessage;
1556 zorba::curl::streambuf* theStreamBuffer;
1557@@ -74,15 +77,16 @@
1558 virtual void afterRead();
1559 private:
1560 void registerHandler();
1561- void parseStatusAndMessage(std::string aHeader);
1562+ void parseStatusAndMessage(std::string const &aHeader);
1563 Item createXmlItem(std::istream& aStream);
1564 Item createHtmlItem(std::istream& aStream);
1565 Item createTextItem(std::istream* aStream);
1566 Item createBase64Item(std::istream& aStream);
1567- public: //Handler
1568- static size_t headerfunction( void *ptr, size_t size, size_t nmemb,
1569- void *stream);
1570+
1571+ static size_t curl_headerfunction( void*, size_t, size_t, void* );
1572 };
1573-}} // namespace zorba, http_client
1574+
1575+} // namespace http_client
1576+} // namespace zorba
1577
1578 #endif //HTTP_RESPONSE_PARSER_H
1579
1580=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
1581--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-01-26 01:35:11 +0000
1582+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-02-16 02:09:20 +0000
1583@@ -81,6 +81,10 @@
1584
1585 (:~
1586 :)
1587+declare variable $zerr:ZXQP0006 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP0006");
1588+
1589+(:~
1590+:)
1591 declare variable $zerr:ZXQP0007 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP0007");
1592
1593 (:~
1594@@ -664,6 +668,10 @@
1595
1596 (:~
1597 :)
1598+declare variable $zerr:ZOSE0006 as xs:QName := fn:QName($zerr:NS, "zerr:ZOSE0006");
1599+
1600+(:~
1601+:)
1602 declare variable $zerr:ZSTR0001 as xs:QName := fn:QName($zerr:NS, "zerr:ZSTR0001");
1603
1604 (:~
1605
1606=== modified file 'modules/org/expath/ns/file.xq.src/file.cpp'
1607--- modules/org/expath/ns/file.xq.src/file.cpp 2011-07-22 08:12:31 +0000
1608+++ modules/org/expath/ns/file.xq.src/file.cpp 2012-02-16 02:09:20 +0000
1609@@ -28,6 +28,7 @@
1610 #include <zorba/singleton_item_sequence.h>
1611 #include <zorba/util/path.h>
1612 #include <zorba/user_exception.h>
1613+#include <zorba/transcode_stream.h>
1614
1615 #include "file_module.h"
1616
1617@@ -188,6 +189,7 @@
1618 {
1619 String lFileStr = getFilePathString(aArgs, 0);
1620 File_t lFile = File::createFile(lFileStr.c_str());
1621+ String lEncoding("UTF-8");
1622
1623 // preconditions
1624 if (!lFile->exists()) {
1625@@ -198,18 +200,30 @@
1626 }
1627
1628 if (aArgs.size() == 2) {
1629- // since Zorba currently only supports UTF-8 we only call this function
1630- // to reject any other encoding requested bu the user
1631- getEncodingArg(aArgs, 1);
1632+ lEncoding = getEncodingArg(aArgs, 1);
1633 }
1634
1635- std::auto_ptr<StreamableItemSequence> lSeq(new StreamableItemSequence());
1636- lFile->openInputStream(*lSeq->theStream, false, true);
1637-
1638- lSeq->theItem = theModule->getItemFactory()->createStreamableString(
1639- *lSeq->theStream, &StreamableItemSequence::streamReleaser);
1640-
1641- return ItemSequence_t(lSeq.release());
1642+ zorba::Item lResult;
1643+ std::unique_ptr<std::ifstream> lInStream;
1644+ if ( transcode::is_necessary( lEncoding.c_str() ) )
1645+ {
1646+ try {
1647+ lInStream.reset( new transcode::stream<std::ifstream>(lEncoding.c_str()) );
1648+ } catch (std::invalid_argument const& e)
1649+ {
1650+ raiseFileError("FOFL0006", "Unsupported encoding", lEncoding.c_str());
1651+ }
1652+ }
1653+ else
1654+ {
1655+ lInStream.reset( new std::ifstream() );
1656+ }
1657+ lFile->openInputStream(*lInStream.get(), false, true);
1658+ lResult = theModule->getItemFactory()->createStreamableString(
1659+ *lInStream.release(), &FileModule::streamReleaser
1660+ );
1661+ return ItemSequence_t(new SingletonItemSequence(lResult));
1662+
1663 }
1664
1665 //*****************************************************************************
1666@@ -722,3 +736,4 @@
1667 extern "C" DLL_EXPORT zorba::ExternalModule* createModule() {
1668 return new zorba::filemodule::FileModule();
1669 }
1670+/* vim:set et sw=2 ts=2: */
1671
1672=== modified file 'modules/org/expath/ns/file.xq.src/file_function.cpp'
1673--- modules/org/expath/ns/file.xq.src/file_function.cpp 2011-07-13 01:56:45 +0000
1674+++ modules/org/expath/ns/file.xq.src/file_function.cpp 2012-02-16 02:09:20 +0000
1675@@ -141,11 +141,6 @@
1676 arg_iter->close();
1677 }
1678
1679- if (!(lEncoding == "UTF-8" || lEncoding == "UTF8")) {
1680- // the rest are not supported encodings
1681- raiseFileError("FOFL0006", "Unsupported encoding", lEncoding.c_str());
1682- }
1683-
1684 return lEncoding;
1685 }
1686
1687
1688=== modified file 'modules/org/expath/ns/file.xq.src/file_function.h'
1689--- modules/org/expath/ns/file.xq.src/file_function.h 2011-07-22 08:12:31 +0000
1690+++ modules/org/expath/ns/file.xq.src/file_function.h 2012-02-16 02:09:20 +0000
1691@@ -25,7 +25,9 @@
1692
1693 #include <fstream>
1694
1695-namespace zorba { namespace filemodule {
1696+namespace zorba {
1697+
1698+ namespace filemodule {
1699
1700 class FileModule;
1701
1702@@ -136,18 +138,12 @@
1703 next(Item& aResult);
1704 };
1705
1706- Item theItem;
1707- std::ifstream* theStream;
1708+ Item theItem;
1709+ std::ifstream* theStream;
1710
1711 StreamableItemSequence()
1712 : theStream(new std::ifstream()) {}
1713
1714- static void
1715- streamReleaser(std::istream* stream)
1716- {
1717- delete stream;
1718- }
1719-
1720 Iterator_t getIterator()
1721 {
1722 return new InternalIterator(this);
1723
1724=== modified file 'modules/org/expath/ns/file.xq.src/file_module.cpp'
1725--- modules/org/expath/ns/file.xq.src/file_module.cpp 2011-06-08 18:37:56 +0000
1726+++ modules/org/expath/ns/file.xq.src/file_module.cpp 2012-02-16 02:09:20 +0000
1727@@ -17,11 +17,10 @@
1728 #include "file.h"
1729 #include "file_module.h"
1730 #include "file_function.h"
1731+#include <cassert>
1732
1733 namespace zorba { namespace filemodule {
1734
1735- ItemFactory* FileModule::theFactory = 0;
1736-
1737 const char* FileModule::theNamespace = "http://expath.org/ns/file";
1738
1739
1740@@ -39,9 +38,7 @@
1741 {
1742 ExternalFunction*& lFunc = theFunctions[aLocalname];
1743 if (!lFunc) {
1744- if (1 == 0) {
1745-
1746- } else if (aLocalname == "create-directory") {
1747+ if (aLocalname == "create-directory") {
1748 lFunc = new CreateDirectoryFunction(this);
1749 } else if (aLocalname == "delete-file-impl") {
1750 lFunc = new DeleteFileImplFunction(this);
1751
1752=== modified file 'modules/org/expath/ns/file.xq.src/file_module.h'
1753--- modules/org/expath/ns/file.xq.src/file_module.h 2011-06-08 18:37:56 +0000
1754+++ modules/org/expath/ns/file.xq.src/file_module.h 2012-02-16 02:09:20 +0000
1755@@ -27,7 +27,7 @@
1756 class FileModule : public ExternalModule
1757 {
1758 private:
1759- static ItemFactory* theFactory;
1760+ mutable ItemFactory* theFactory;
1761
1762 public:
1763 static const char* theNamespace;
1764@@ -43,10 +43,17 @@
1765 };
1766
1767 typedef std::map<String, ExternalFunction*, ltstr> FuncMap_t;
1768-
1769 FuncMap_t theFunctions;
1770-
1771+
1772 public:
1773+ static void
1774+ streamReleaser(std::istream* stream)
1775+ {
1776+ delete stream;
1777+ }
1778+
1779+ FileModule() : theFactory(0) {}
1780+
1781 virtual ~FileModule();
1782
1783 virtual String
1784@@ -58,10 +65,10 @@
1785 virtual void
1786 destroy();
1787
1788- static ItemFactory*
1789- getItemFactory()
1790+ ItemFactory*
1791+ getItemFactory() const
1792 {
1793- if(!theFactory)
1794+ if (!theFactory)
1795 {
1796 theFactory = Zorba::getInstance(0)->getItemFactory();
1797 }
1798
1799=== modified file 'src/api/CMakeLists.txt'
1800--- src/api/CMakeLists.txt 2011-08-31 13:17:59 +0000
1801+++ src/api/CMakeLists.txt 2012-02-16 02:09:20 +0000
1802@@ -55,6 +55,7 @@
1803 zorba_functions.cpp
1804 annotationimpl.cpp
1805 auditimpl.cpp
1806+ transcode_streambuf.cpp
1807 )
1808
1809 IF (NOT ZORBA_NO_FULL_TEXT)
1810
1811=== added file 'src/api/transcode_streambuf.cpp'
1812--- src/api/transcode_streambuf.cpp 1970-01-01 00:00:00 +0000
1813+++ src/api/transcode_streambuf.cpp 2012-02-16 02:09:20 +0000
1814@@ -0,0 +1,102 @@
1815+/*
1816+ * Copyright 2006-2008 The FLWOR Foundation.
1817+ *
1818+ * Licensed under the Apache License, Version 2.0 (the "License");
1819+ * you may not use this file except in compliance with the License.
1820+ * You may obtain a copy of the License at
1821+ *
1822+ * http://www.apache.org/licenses/LICENSE-2.0
1823+ *
1824+ * Unless required by applicable law or agreed to in writing, software
1825+ * distributed under the License is distributed on an "AS IS" BASIS,
1826+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1827+ * See the License for the specific language governing permissions and
1828+ * limitations under the License.
1829+ */
1830+
1831+#include <zorba/transcode_stream.h>
1832+
1833+#include "util/transcode_streambuf.h"
1834+
1835+using namespace std;
1836+
1837+namespace zorba {
1838+namespace transcode {
1839+
1840+///////////////////////////////////////////////////////////////////////////////
1841+
1842+streambuf::streambuf( char const *charset, std::streambuf *orig ) :
1843+ proxy_buf_( new internal::transcode::streambuf( charset, orig ) )
1844+{
1845+}
1846+
1847+streambuf::~streambuf() {
1848+ // out-of-line since it's virtual
1849+}
1850+
1851+void streambuf::imbue( std::locale const &loc ) {
1852+ proxy_buf_->pubimbue( loc );
1853+}
1854+
1855+streambuf::pos_type streambuf::seekoff( off_type o, ios_base::seekdir d,
1856+ ios_base::openmode m ) {
1857+ return proxy_buf_->pubseekoff( o, d, m );
1858+}
1859+
1860+streambuf::pos_type streambuf::seekpos( pos_type p, ios_base::openmode m ) {
1861+ return proxy_buf_->pubseekpos( p, m );
1862+}
1863+
1864+std::streambuf* streambuf::setbuf( char_type *p, streamsize s ) {
1865+ proxy_buf_->pubsetbuf( p, s );
1866+ return this;
1867+}
1868+
1869+streamsize streambuf::showmanyc() {
1870+ return proxy_buf_->in_avail();
1871+}
1872+
1873+int streambuf::sync() {
1874+ return proxy_buf_->pubsync();
1875+}
1876+
1877+streambuf::int_type streambuf::overflow( int_type c ) {
1878+ return proxy_buf_->sputc( c );
1879+}
1880+
1881+streambuf::int_type streambuf::pbackfail( int_type c ) {
1882+ return proxy_buf_->sputbackc( traits_type::to_char_type( c ) );
1883+}
1884+
1885+streambuf::int_type streambuf::uflow() {
1886+ return proxy_buf_->sbumpc();
1887+}
1888+
1889+streambuf::int_type streambuf::underflow() {
1890+ return proxy_buf_->sgetc();
1891+}
1892+
1893+streamsize streambuf::xsgetn( char_type *to, streamsize size ) {
1894+ return proxy_buf_->sgetn( to, size );
1895+}
1896+
1897+streamsize streambuf::xsputn( char_type const *from,
1898+ streamsize size ) {
1899+ return proxy_buf_->sputn( from, size );
1900+}
1901+
1902+///////////////////////////////////////////////////////////////////////////////
1903+
1904+bool is_necessary( char const *charset ) {
1905+ return internal::transcode::streambuf::is_necessary( charset );
1906+}
1907+
1908+bool is_supported( char const *charset ) {
1909+ return internal::transcode::streambuf::is_supported( charset );
1910+}
1911+
1912+///////////////////////////////////////////////////////////////////////////////
1913+
1914+} // namespace transcode
1915+} // namespace zorba
1916+/* vim:set et sw=2 ts=2: */
1917
1918=== modified file 'src/diagnostics/diagnostic_en.xml'
1919--- src/diagnostics/diagnostic_en.xml 2012-02-16 00:52:25 +0000
1920+++ src/diagnostics/diagnostic_en.xml 2012-02-16 02:09:20 +0000
1921@@ -1581,6 +1581,10 @@
1922 <value>"$1": feature not enabled</value>
1923 </diagnostic>
1924
1925+ <diagnostic code="ZXQP0006" name="UNKNOWN_ENCODING">
1926+ <value>"$1": unknown character encoding</value>
1927+ </diagnostic>
1928+
1929 <diagnostic code="ZXQP0007" name="FUNCTION_SIGNATURE_NOT_EQUAL">
1930 <value>"$1": function signature does not match declaration</value>
1931 </diagnostic>
1932@@ -2193,6 +2197,10 @@
1933 <value>"$1": error loading dynamic library${: 2}</value>
1934 </diagnostic>
1935
1936+ <diagnostic code="ZOSE0006" name="TRANSCODING_ERROR">
1937+ <value>stream transcoding error ($1)</value>
1938+ </diagnostic>
1939+
1940 <!--////////// Zorba Store Errors //////////////////////////////////////-->
1941
1942 <diagnostic code="ZSTR0001" name="INDEX_ALREADY_EXISTS">
1943
1944=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
1945--- src/diagnostics/pregenerated/diagnostic_list.cpp 2012-01-26 01:35:11 +0000
1946+++ src/diagnostics/pregenerated/diagnostic_list.cpp 2012-02-16 02:09:20 +0000
1947@@ -568,6 +568,9 @@
1948 ZorbaErrorCode ZXQP0005_NOT_ENABLED( "ZXQP0005" );
1949
1950
1951+ZorbaErrorCode ZXQP0006_UNKNOWN_ENCODING( "ZXQP0006" );
1952+
1953+
1954 ZorbaErrorCode ZXQP0007_FUNCTION_SIGNATURE_NOT_EQUAL( "ZXQP0007" );
1955
1956
1957@@ -1004,6 +1007,9 @@
1958 ZorbaErrorCode ZOSE0005_DLL_LOAD_FAILED( "ZOSE0005" );
1959
1960
1961+ZorbaErrorCode ZOSE0006_TRANSCODING_ERROR( "ZOSE0006" );
1962+
1963+
1964 ZorbaErrorCode ZSTR0001_INDEX_ALREADY_EXISTS( "ZSTR0001" );
1965
1966
1967
1968=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
1969--- src/diagnostics/pregenerated/dict_en.cpp 2012-02-16 00:52:25 +0000
1970+++ src/diagnostics/pregenerated/dict_en.cpp 2012-02-16 02:09:20 +0000
1971@@ -354,6 +354,7 @@
1972 { "ZOSE0003", "stream read failure" },
1973 { "ZOSE0004", "${\"1\": }I/O error${: 2}" },
1974 { "ZOSE0005", "\"$1\": error loading dynamic library${: 2}" },
1975+ { "ZOSE0006", "stream transcoding error ($1)" },
1976 { "ZSTR0001", "\"$1\": index already exists" },
1977 { "ZSTR0002", "\"$1\": index does not exist" },
1978 { "ZSTR0003", "\"$1\": partial key insertion into index \"$2\"" },
1979@@ -392,6 +393,7 @@
1980 { "ZXQP0003", "internal error${: 1}" },
1981 { "ZXQP0004", "not yet implemented: $1" },
1982 { "ZXQP0005", "\"$1\": feature not enabled" },
1983+ { "ZXQP0006", "\"$1\": unknown character encoding" },
1984 { "ZXQP0007", "\"$1\": function signature does not match declaration" },
1985 { "ZXQP0008", "\"$1\": function implementation not found" },
1986 { "ZXQP0009", "\"$1\": function referred to by this local-name has the local-name \"$2\" instead" },
1987
1988=== modified file 'src/unit_tests/CMakeLists.txt'
1989--- src/unit_tests/CMakeLists.txt 2012-02-02 16:38:39 +0000
1990+++ src/unit_tests/CMakeLists.txt 2012-02-16 02:09:20 +0000
1991@@ -11,7 +11,6 @@
1992 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1993 # See the License for the specific language governing permissions and
1994 # limitations under the License.
1995-
1996
1997 SET(UNIT_TEST_SRCS
1998 string_instantiate.cpp
1999@@ -30,10 +29,9 @@
2000 tokenizer.cpp)
2001 ENDIF (NOT ZORBA_NO_FULL_TEXT)
2002
2003-IF(ZORBA_WITH_DEBUGGER)
2004- LIST(APPEND UNIT_TEST_SRCS
2005-# test_debugger_protocol.cpp
2006- )
2007-ENDIF(ZORBA_WITH_DEBUGGER)
2008+IF (NOT ZORBA_NO_UNICODE)
2009+ LIST (APPEND UNIT_TEST_SRCS
2010+ test_icu_streambuf.cpp)
2011+ENDIF (NOT ZORBA_NO_UNICODE)
2012
2013 # vim:set et sw=2 tw=2:
2014
2015=== added file 'src/unit_tests/test_icu_streambuf.cpp'
2016--- src/unit_tests/test_icu_streambuf.cpp 1970-01-01 00:00:00 +0000
2017+++ src/unit_tests/test_icu_streambuf.cpp 2012-02-16 02:09:20 +0000
2018@@ -0,0 +1,151 @@
2019+/*
2020+ * Copyright 2006-2008 The FLWOR Foundation.
2021+ *
2022+ * Licensed under the Apache License, Version 2.0 (the "License");
2023+ * you may not use this file except in compliance with the License.
2024+ * You may obtain a copy of the License at
2025+ *
2026+ * http://www.apache.org/licenses/LICENSE-2.0
2027+ *
2028+ * Unless required by applicable law or agreed to in writing, software
2029+ * distributed under the License is distributed on an "AS IS" BASIS,
2030+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2031+ * See the License for the specific language governing permissions and
2032+ * limitations under the License.
2033+ */
2034+
2035+#include <fstream>
2036+#include <iostream>
2037+#include <sstream>
2038+
2039+#include "util/transcode_streambuf.h"
2040+
2041+using namespace std;
2042+using namespace zorba;
2043+
2044+#define COPYRIGHT_ISO "\xA9"
2045+#define COPYRIGHT_UTF8 "\xC2\xA9"
2046+
2047+#define ONE_THIRD_UTF8 "\xE2\x85\x93"
2048+#define ONE_THIRD_UTF16BE "\x21\x53"
2049+
2050+struct test {
2051+ char const *ext_charset;
2052+ char const *ext_str;
2053+ int ext_len;
2054+ char const *utf8_str;
2055+};
2056+
2057+static test const tests[] = {
2058+ /* 0 */ { "ISO-8859-1", "Copyright " COPYRIGHT_ISO " 2011", 0, "Copyright " COPYRIGHT_UTF8 " 2011" },
2059+ /* 1 */ { "UTF-16BE", ONE_THIRD_UTF16BE "\0 \0c\0u\0p", 10, ONE_THIRD_UTF8 " cup" },
2060+ { 0, 0, 0, 0 }
2061+};
2062+
2063+static string make_ext_str( test const *t ) {
2064+ if ( t->ext_len )
2065+ return string( t->ext_str, t->ext_len );
2066+ return string( t->ext_str );
2067+}
2068+
2069+///////////////////////////////////////////////////////////////////////////////
2070+
2071+static int failures;
2072+
2073+static bool assert_true( int no, char const *expr, int line, bool result ) {
2074+ if ( !result ) {
2075+ cout << '#' << no << " FAILED, line " << line << ": " << expr << endl;
2076+ ++failures;
2077+ }
2078+ return result;
2079+}
2080+
2081+static void print_exception( int no, char const *expr, int line,
2082+ std::exception const &e ) {
2083+ assert_true( no, expr, line, false );
2084+ cout << "+ exception: " << e.what() << endl;
2085+}
2086+
2087+#define ASSERT_TRUE( NO, EXPR ) assert_true( NO, #EXPR, __LINE__, !!(EXPR) )
2088+
2089+#define ASSERT_TRUE_AND_NO_EXCEPTION( NO, EXPR ) \
2090+ try { ASSERT_TRUE( NO, EXPR ); } \
2091+ catch ( std::exception const &e ) { print_exception( NO, #EXPR, __LINE__, e ); }
2092+
2093+///////////////////////////////////////////////////////////////////////////////
2094+
2095+static bool test_getline( test const *t ) {
2096+ string const ext_str( make_ext_str( t ) );
2097+ istringstream iss( ext_str );
2098+ icu_streambuf xbuf( t->ext_charset, iss.rdbuf() );
2099+ iss.ios::rdbuf( &xbuf );
2100+
2101+ char utf8_buf[ 1024 ];
2102+ iss.getline( utf8_buf, sizeof utf8_buf );
2103+ if ( iss.gcount() ) {
2104+ string const utf8_str( utf8_buf );
2105+ return utf8_str == t->utf8_str;
2106+ }
2107+ return false;
2108+}
2109+
2110+static bool test_read( test const *t ) {
2111+ string const ext_str( make_ext_str( t ) );
2112+ istringstream iss( ext_str );
2113+ icu_streambuf xbuf( t->ext_charset, iss.rdbuf() );
2114+ iss.ios::rdbuf( &xbuf );
2115+
2116+ char utf8_buf[ 1024 ];
2117+ iss.read( utf8_buf, sizeof utf8_buf );
2118+ if ( iss.gcount() ) {
2119+ string const utf8_str( utf8_buf, iss.gcount() );
2120+ return utf8_str == t->utf8_str;
2121+ }
2122+ return false;
2123+}
2124+
2125+static bool test_insertion( test const *t ) {
2126+ ostringstream oss;
2127+ icu_streambuf xbuf( t->ext_charset, oss.rdbuf() );
2128+ oss.ios::rdbuf( &xbuf );
2129+
2130+ oss << t->utf8_str << flush;
2131+ string const ext_str( oss.str() );
2132+
2133+ string const expected_ext_str( make_ext_str( t ) );
2134+ return ext_str == expected_ext_str;
2135+}
2136+
2137+static bool test_put( test const *t ) {
2138+ ostringstream oss;
2139+ icu_streambuf xbuf( t->ext_charset, oss.rdbuf() );
2140+ oss.ios::rdbuf( &xbuf );
2141+
2142+ for ( char const *c = t->utf8_str; *c; ++c )
2143+ oss.put( *c );
2144+ string const ext_str( oss.str() );
2145+
2146+ string const expected_ext_str( make_ext_str( t ) );
2147+ return ext_str == expected_ext_str;
2148+}
2149+
2150+///////////////////////////////////////////////////////////////////////////////
2151+
2152+namespace zorba {
2153+namespace UnitTests {
2154+
2155+int test_icu_streambuf( int, char*[] ) {
2156+ int test_no = 0;
2157+ for ( test const *t = tests; t->utf8_str; ++t, ++test_no ) {
2158+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_getline( t ) );
2159+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_read( t ) );
2160+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_insertion( t ) );
2161+ ASSERT_TRUE_AND_NO_EXCEPTION( test_no, test_put( t ) );
2162+ }
2163+ cout << failures << " test(s) failed\n";
2164+ return failures ? 1 : 0;
2165+}
2166+
2167+} // namespace UnitTests
2168+} // namespace zorba
2169+/* vim:set et sw=2 ts=2: */
2170
2171=== modified file 'src/unit_tests/unit_test_list.h'
2172--- src/unit_tests/unit_test_list.h 2012-02-02 16:38:39 +0000
2173+++ src/unit_tests/unit_test_list.h 2012-02-16 02:09:20 +0000
2174@@ -17,6 +17,8 @@
2175 #ifndef ZORBA_UNIT_TEST_LIST_H
2176 #define ZORBA_UNIT_TEST_LIST_H
2177
2178+#include <iostream>
2179+
2180 #include <zorba/config.h>
2181
2182 namespace zorba {
2183@@ -34,6 +36,9 @@
2184 /**
2185 * ADD NEW UNIT TESTS HERE
2186 */
2187+#ifndef ZORBA_NO_UNICODE
2188+ int test_icu_streambuf( int, char*[] );
2189+#endif /* ZORBA_NO_UNICODE */
2190 int json_parser( int, char*[] );
2191
2192 void initializeTestList();
2193
2194=== modified file 'src/unit_tests/unit_tests.cpp'
2195--- src/unit_tests/unit_tests.cpp 2012-02-02 16:38:39 +0000
2196+++ src/unit_tests/unit_tests.cpp 2012-02-16 02:09:20 +0000
2197@@ -39,6 +39,9 @@
2198 void initializeTestList() {
2199 libunittests["string"] = test_string;
2200 libunittests["uri"] = runUriTest;
2201+#ifndef ZORBA_NO_UNICODE
2202+ libunittests["icu_streambuf"] = test_icu_streambuf;
2203+#endif /* ZORBA_NO_UNICODE */
2204 libunittests["json_parser"] = json_parser;
2205 libunittests["unique_ptr"] = test_unique_ptr;
2206 #ifndef ZORBA_NO_FULL_TEXT
2207
2208=== modified file 'src/util/CMakeLists.txt'
2209--- src/util/CMakeLists.txt 2011-12-20 18:29:15 +0000
2210+++ src/util/CMakeLists.txt 2012-02-16 02:09:20 +0000
2211@@ -41,7 +41,12 @@
2212 ENDIF(ZORBA_WITH_FILE_ACCESS)
2213
2214 IF(ZORBA_NO_UNICODE)
2215- LIST(APPEND UTIL_SRCS regex_ascii.cpp)
2216+ LIST(APPEND UTIL_SRCS
2217+ regex_ascii.cpp
2218+ passthru_streambuf.cpp)
2219+ELSE(ZORBA_NO_UNICODE)
2220+ LIST(APPEND UTIL_SRCS
2221+ icu_streambuf.cpp)
2222 ENDIF(ZORBA_NO_UNICODE)
2223
2224 HEADER_GROUP_SUBFOLDER(UTIL_SRCS fx)
2225
2226=== added file 'src/util/icu_streambuf.cpp'
2227--- src/util/icu_streambuf.cpp 1970-01-01 00:00:00 +0000
2228+++ src/util/icu_streambuf.cpp 2012-02-16 02:09:20 +0000
2229@@ -0,0 +1,300 @@
2230+/*
2231+ * Copyright 2006-2008 The FLWOR Foundation.
2232+ *
2233+ * Licensed under the Apache License, Version 2.0 (the "License");
2234+ * you may not use this file except in compliance with the License.
2235+ * You may obtain a copy of the License at
2236+ *
2237+ * http://www.apache.org/licenses/LICENSE-2.0
2238+ *
2239+ * Unless required by applicable law or agreed to in writing, software
2240+ * distributed under the License is distributed on an "AS IS" BASIS,
2241+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2242+ * See the License for the specific language governing permissions and
2243+ * limitations under the License.
2244+ */
2245+
2246+#define ZORBA_DEBUG_ICU_STREAMBUF 0
2247+
2248+#if ZORBA_DEBUG_ICU_STREAMBUF
2249+# include <stdio.h>
2250+#endif
2251+
2252+#include <algorithm>
2253+#include <cassert>
2254+
2255+#include <zorba/diagnostic_list.h>
2256+
2257+#include "diagnostics/assert.h"
2258+#include "diagnostics/diagnostic.h"
2259+#include "diagnostics/zorba_exception.h"
2260+#include "util/cxx_util.h"
2261+#include "util/string_util.h"
2262+#include "util/utf8_util.h"
2263+
2264+#include "icu_streambuf.h"
2265+
2266+using namespace std;
2267+
2268+namespace zorba {
2269+
2270+int const Small_External_Buf_Size = 6;
2271+int const Large_External_Buf_Size = 4096;
2272+
2273+///////////////////////////////////////////////////////////////////////////////
2274+
2275+inline void icu_streambuf::buf_type_base::reset() {
2276+ pivot_source_ = pivot_target_ = pivot_buf_;
2277+}
2278+
2279+inline void icu_streambuf::resetg() {
2280+ setg(
2281+ g_.utf8_char_, g_.utf8_char_ + sizeof g_.utf8_char_,
2282+ g_.utf8_char_ + sizeof g_.utf8_char_
2283+ );
2284+}
2285+
2286+icu_streambuf::icu_streambuf( char const *charset, streambuf *orig ) :
2287+ proxy_streambuf( orig ),
2288+ no_conv_( !is_necessary( charset ) ),
2289+ external_conv_( no_conv_ ? nullptr : create_conv( charset ) ),
2290+ utf8_conv_( no_conv_ ? nullptr : create_conv( "UTF-8" ) )
2291+{
2292+ if ( !orig )
2293+ throw invalid_argument( "null streambuf" );
2294+ resetg();
2295+}
2296+
2297+icu_streambuf::~icu_streambuf() {
2298+ if ( external_conv_ )
2299+ ucnv_close( external_conv_ );
2300+ if ( utf8_conv_ )
2301+ ucnv_close( utf8_conv_ );
2302+}
2303+
2304+void icu_streambuf::clear() {
2305+ if ( !no_conv_ ) {
2306+ ucnv_reset( external_conv_ );
2307+ ucnv_reset( utf8_conv_ );
2308+ g_.reset();
2309+ p_.reset();
2310+ resetg();
2311+ }
2312+}
2313+
2314+UConverter* icu_streambuf::create_conv( char const *charset ) {
2315+ UErrorCode err = U_ZERO_ERROR;
2316+ UConverter *const conv = ucnv_open( charset, &err );
2317+ ucnv_setFromUCallBack(
2318+ conv, UCNV_FROM_U_CALLBACK_STOP, nullptr, nullptr, nullptr, &err
2319+ );
2320+ ucnv_setToUCallBack(
2321+ conv, UCNV_TO_U_CALLBACK_STOP, nullptr, nullptr, nullptr, &err
2322+ );
2323+ if ( !conv || U_FAILURE( err ) ) {
2324+ if ( conv )
2325+ ucnv_close( conv );
2326+ throw invalid_argument( charset );
2327+ }
2328+ return conv;
2329+}
2330+
2331+bool icu_streambuf::is_necessary( char const *charset ) {
2332+ //
2333+ // Checking for "US-ASCII" explicitly isn't necessary since ICU knows about
2334+ // aliases.
2335+ //
2336+ return ucnv_compareNames( charset, "ASCII" )
2337+ && ucnv_compareNames( charset, "UTF-8" );
2338+}
2339+
2340+bool icu_streambuf::is_supported( char const *charset ) {
2341+ try {
2342+ ucnv_close( create_conv( charset ) );
2343+ return true;
2344+ }
2345+ catch ( invalid_argument const& ) {
2346+ return false;
2347+ }
2348+}
2349+
2350+icu_streambuf::pos_type icu_streambuf::seekoff( off_type o, ios_base::seekdir d,
2351+ ios_base::openmode m ) {
2352+ clear();
2353+ return original()->pubseekoff( o, d, m );
2354+}
2355+
2356+icu_streambuf::pos_type icu_streambuf::seekpos( pos_type p,
2357+ ios_base::openmode m ) {
2358+ clear();
2359+ return original()->pubseekpos( p, m );
2360+}
2361+
2362+streambuf* icu_streambuf::setbuf( char_type *p, streamsize s ) {
2363+ original()->pubsetbuf( p, s );
2364+ return this;
2365+}
2366+
2367+int icu_streambuf::sync() {
2368+ return original()->pubsync();
2369+}
2370+
2371+icu_streambuf::int_type icu_streambuf::overflow( int_type c ) {
2372+#if ZORBA_DEBUG_ICU_STREAMBUF
2373+ printf( "overflow()\n" );
2374+#endif
2375+ if ( no_conv_ )
2376+ return original()->sputc( c );
2377+
2378+ if ( traits_type::eq_int_type( c, traits_type::eof() ) )
2379+ return traits_type::eof();
2380+
2381+ char_type const utf8_byte = traits_type::to_char_type( c );
2382+ char_type const *from = &utf8_byte;
2383+ char ebuf[ Small_External_Buf_Size ], *to = ebuf;
2384+
2385+ bool const ok = to_external( &from, from + 1, &to, to + sizeof ebuf );
2386+ assert( ok );
2387+ if ( streamsize const n = to - ebuf ) {
2388+ original()->sputn( ebuf, n );
2389+ p_.reset();
2390+ }
2391+
2392+ return c;
2393+}
2394+
2395+bool icu_streambuf::to_external( char_type const **from,
2396+ char_type const *from_end, char **to,
2397+ char const *to_end, bool flush ) {
2398+ UErrorCode err = U_ZERO_ERROR;
2399+ ucnv_convertEx(
2400+ external_conv_, utf8_conv_, to, to_end, from, from_end,
2401+ p_.pivot_buf_, &p_.pivot_source_, &p_.pivot_target_,
2402+ p_.pivot_buf_ + sizeof p_.pivot_buf_,
2403+ /*reset*/ false, flush, &err
2404+ );
2405+ if ( err == U_TRUNCATED_CHAR_FOUND || err == U_BUFFER_OVERFLOW_ERROR )
2406+ return false;
2407+ if ( U_FAILURE( err ) )
2408+ throw ZORBA_EXCEPTION(
2409+ zerr::ZOSE0006_TRANSCODING_ERROR, ERROR_PARAMS( u_errorName( err ) )
2410+ );
2411+ return true;
2412+}
2413+
2414+bool icu_streambuf::to_utf8( char const **from, char const *from_end,
2415+ char_type **to, char_type const *to_end,
2416+ bool flush ) {
2417+ UErrorCode err = U_ZERO_ERROR;
2418+ ucnv_convertEx(
2419+ utf8_conv_, external_conv_, to, to_end, from, from_end,
2420+ g_.pivot_buf_, &g_.pivot_source_, &g_.pivot_target_,
2421+ g_.pivot_buf_ + sizeof g_.pivot_buf_,
2422+ /*reset*/ false, flush, &err
2423+ );
2424+ if ( err == U_TRUNCATED_CHAR_FOUND || err == U_BUFFER_OVERFLOW_ERROR )
2425+ return false;
2426+ if ( U_FAILURE( err ) )
2427+ throw ZORBA_EXCEPTION(
2428+ zerr::ZOSE0006_TRANSCODING_ERROR, ERROR_PARAMS( u_errorName( err ) )
2429+ );
2430+ return true;
2431+}
2432+
2433+icu_streambuf::int_type icu_streambuf::underflow() {
2434+#if ZORBA_DEBUG_ICU_STREAMBUF
2435+ printf( "underflow()\n" );
2436+#endif
2437+ if ( no_conv_ )
2438+ return original()->sgetc();
2439+
2440+ if ( gptr() >= egptr() ) {
2441+ utf8::storage_type *to = g_.utf8_char_;
2442+ utf8::storage_type const *const to_end = to + sizeof g_.utf8_char_;
2443+
2444+ while ( true ) {
2445+ int_type const c = original()->sbumpc();
2446+ if ( traits_type::eq_int_type( c, traits_type::eof() ) )
2447+ return traits_type::eof();
2448+
2449+ char const ebyte = traits_type::to_char_type( c );
2450+ char const *from = &ebyte;
2451+
2452+ to_utf8( &from, from + 1, &to, to_end );
2453+ if ( to > g_.utf8_char_ ) {
2454+ setg( g_.utf8_char_, g_.utf8_char_, to );
2455+ g_.reset();
2456+ break;
2457+ }
2458+ }
2459+ }
2460+ return traits_type::to_int_type( *gptr() );
2461+}
2462+
2463+streamsize icu_streambuf::xsgetn( char_type *to, streamsize size ) {
2464+#if ZORBA_DEBUG_ICU_STREAMBUF
2465+ printf( "xsgetn()\n" );
2466+#endif
2467+ if ( no_conv_ )
2468+ return original()->sgetn( to, size );
2469+
2470+ streamsize return_size = 0;
2471+ char_type *const to_end = to + size;
2472+
2473+ if ( streamsize const gsize = egptr() - gptr() ) {
2474+ // must first get any chars in g_.utf8_char_
2475+ streamsize const n = min( gsize, size );
2476+ traits_type::copy( to, gptr(), n );
2477+ gbump( n );
2478+ to += n;
2479+ size -= n, return_size += n;
2480+ }
2481+
2482+ while ( size > 0 ) {
2483+ char ebuf[ Large_External_Buf_Size ];
2484+ streamsize const get = min( (streamsize)(sizeof ebuf), size );
2485+ if ( streamsize const got = original()->sgetn( ebuf, get ) ) {
2486+ char const *from = ebuf;
2487+ char_type const *const to_orig = to;
2488+ int_type const peek = original()->sgetc();
2489+ bool const flush = traits_type::eq_int_type( peek, traits_type::eof() );
2490+ to_utf8( &from, from + got, &to, to_end, flush );
2491+ streamsize const n = to - to_orig;
2492+ size -= n, return_size += n;
2493+ if ( flush )
2494+ break;
2495+ } else
2496+ break;
2497+ }
2498+ return return_size;
2499+}
2500+
2501+streamsize icu_streambuf::xsputn( char_type const *from, streamsize size ) {
2502+#if ZORBA_DEBUG_ICU_STREAMBUF
2503+ printf( "xsputn()\n" );
2504+#endif
2505+ if ( no_conv_ )
2506+ return original()->sputn( from, size );
2507+
2508+ streamsize return_size = 0;
2509+ char_type const *const from_end = from + size;
2510+ char ebuf[ Large_External_Buf_Size ], *to = ebuf;
2511+ char const *const to_end = to + sizeof ebuf;
2512+
2513+ while ( size > 0 ) {
2514+ char_type const *const from_orig = from;
2515+ to_external( &from, from_end, &to, to_end );
2516+ streamsize n = to - ebuf;
2517+ if ( n && !original()->sputn( ebuf, n ) )
2518+ break;
2519+ to = ebuf;
2520+ n = from - from_orig;
2521+ size -= n, return_size += n;
2522+ }
2523+ return return_size;
2524+}
2525+
2526+///////////////////////////////////////////////////////////////////////////////
2527+
2528+} // namespace zorba
2529+/* vim:set et sw=2 ts=2: */
2530
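The single-character get area that `icu_streambuf::underflow()` maintains can be illustrated without ICU. The following is only a minimal sketch, assuming a read-only, hard-coded ISO-8859-1 to UTF-8 conversion; `latin1_to_utf8_buf` and `latin1_to_utf8` are hypothetical names that are not part of this branch, and the real class delegates the conversion to `ucnv_convertEx()` instead.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for icu_streambuf: read-only, ISO-8859-1 -> UTF-8 only.
class latin1_to_utf8_buf : public std::streambuf {
public:
  explicit latin1_to_utf8_buf( std::streambuf *orig ) : orig_( orig ) {
    assert( orig );
    // Start with an empty get area so the first read calls underflow().
    setg( utf8_char_, utf8_char_, utf8_char_ );
  }

protected:
  int_type underflow() {
    if ( gptr() < egptr() )               // bytes of the last char still pending
      return traits_type::to_int_type( *gptr() );

    int_type const c = orig_->sbumpc();   // consume one external byte
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::eof();

    unsigned char const b =
      static_cast<unsigned char>( traits_type::to_char_type( c ) );
    char *end = utf8_char_;
    if ( b < 0x80 ) {
      *end++ = static_cast<char>( b );    // ASCII maps to itself
    } else {
      // U+0080..U+00FF encode as two UTF-8 bytes.
      *end++ = static_cast<char>( 0xC0 | (b >> 6) );
      *end++ = static_cast<char>( 0x80 | (b & 0x3F) );
    }
    setg( utf8_char_, utf8_char_, end );  // expose the encoded character
    return traits_type::to_int_type( *gptr() );
  }

private:
  std::streambuf *orig_;
  char utf8_char_[ 2 ];                   // room for one encoded character
};

// Reads every byte of a Latin-1 string through the transcoding buffer.
std::string latin1_to_utf8( std::string const &latin1 ) {
  std::istringstream iss( latin1 );
  latin1_to_utf8_buf xbuf( iss.rdbuf() );
  std::istream is( &xbuf );
  std::string utf8;
  char c;
  while ( is.get( c ) )
    utf8 += c;
  return utf8;
}
```

Calling `latin1_to_utf8()` on the ISO-8859-1 string from test case 0 of `test_icu_streambuf.cpp` exercises the same input/output pair as that unit test.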
2531=== added file 'src/util/icu_streambuf.h'
2532--- src/util/icu_streambuf.h 1970-01-01 00:00:00 +0000
2533+++ src/util/icu_streambuf.h 2012-02-16 02:09:20 +0000
2534@@ -0,0 +1,140 @@
2535+/*
2536+ * Copyright 2006-2008 The FLWOR Foundation.
2537+ *
2538+ * Licensed under the Apache License, Version 2.0 (the "License");
2539+ * you may not use this file except in compliance with the License.
2540+ * You may obtain a copy of the License at
2541+ *
2542+ * http://www.apache.org/licenses/LICENSE-2.0
2543+ *
2544+ * Unless required by applicable law or agreed to in writing, software
2545+ * distributed under the License is distributed on an "AS IS" BASIS,
2546+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2547+ * See the License for the specific language governing permissions and
2548+ * limitations under the License.
2549+ */
2550+
2551+#ifndef ZORBA_ICU_STREAMBUF_H
2552+#define ZORBA_ICU_STREAMBUF_H
2553+
2554+#include <zorba/transcode_stream.h>
2555+
2556+#include "util/utf8_util.h"
2557+
2558+namespace zorba {
2559+
2560+///////////////////////////////////////////////////////////////////////////////
2561+
2562+/**
2563+ * An %icu_streambuf is-a std::streambuf for transcoding character encodings
2564+ * from/to UTF-8 on-the-fly.
2565+ *
2566+ * To use it, replace a stream's streambuf:
2567+ * \code
2568+ * istream is;
2569+ * // ...
2570+ * icu_streambuf xbuf( "ISO-8859-1", is.rdbuf() );
2571+ * is.ios::rdbuf( &xbuf );
2572+ * \endcode
2573+ * Note that the %icu_streambuf must exist for as long as it's being used by
2574+ * the stream. If you are replacing the streambuf for a stream you did not
2575+ * create, you should set it back to the original streambuf:
2576+ * \code
2577+ * void f( ostream &os ) {
2578+ * icu_streambuf xbuf( "ISO-8859-1", os.rdbuf() );
2579+ * try {
2580+ * os.ios::rdbuf( &xbuf );
2581+ * // ...
2582+ * }
2583+ * catch ( ... ) {
2584+ * os.ios::rdbuf( xbuf.original() );
2585+ * throw;
2586+ * }
2587+ * }
2588+ * \endcode
2589+ *
2590+ * While %icu_streambuf does support seeking, the positions are relative to the
2591+ * original byte stream.
2592+ */
2593+class icu_streambuf : public proxy_streambuf {
2594+public:
2595+ /**
2596+ * Constructs an %icu_streambuf.
2597+ *
2598+ * @param charset The name of the character encoding to convert from/to.
2599+ * @param orig The original streambuf to read/write from/to.
2600+ */
2601+ icu_streambuf( char const *charset, std::streambuf *orig );
2602+
2603+ /**
2604+ * Destructs an %icu_streambuf.
2605+ */
2606+ ~icu_streambuf();
2607+
2608+ /**
2609+ * Checks whether it would be necessary to transcode from the given character
2610+ * encoding to UTF-8.
2611+ *
2612+ * @param charset The name of the character encoding to check.
2613+ * @return \c true only if it would be necessary to transcode from the given
2614+ * character encoding to UTF-8.
2615+ */
2616+ static bool is_necessary( char const *charset );
2617+
2618+ /**
2619+ * Checks whether the given character set is supported for transcoding.
2620+ *
2621+ * @param charset The name of the character encoding to check.
2622+ * @return \c true only if the character encoding is supported.
2623+ */
2624+ static bool is_supported( char const *charset );
2625+
2626+protected:
2627+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
2628+ pos_type seekpos( pos_type, std::ios_base::openmode );
2629+ std::streambuf* setbuf( char_type*, std::streamsize );
2630+ int sync();
2631+ int_type overflow( int_type );
2632+ int_type underflow();
2633+ std::streamsize xsgetn( char_type*, std::streamsize );
2634+ std::streamsize xsputn( char_type const*, std::streamsize );
2635+
2636+private:
2637+ struct buf_type_base {
2638+ UChar pivot_buf_[ 4096 ], *pivot_source_, *pivot_target_;
2639+
2640+ buf_type_base() { reset(); }
2641+ void reset();
2642+ };
2643+
2644+ struct gbuf_type : buf_type_base {
2645+ utf8::encoded_char_type utf8_char_;
2646+ };
2647+ gbuf_type g_;
2648+
2649+ typedef buf_type_base pbuf_type;
2650+ pbuf_type p_;
2651+
2652+ bool const no_conv_; // true = no conversion needed
2653+ UConverter *const external_conv_, *const utf8_conv_;
2654+
2655+ void clear();
2656+ static UConverter* create_conv( char const *charset );
2657+ void resetg();
2658+
2659+ bool to_external( char_type const **from, char_type const *from_end,
2660+ char **to, char const *to_end, bool flush = false );
2661+
2662+ bool to_utf8( char const **from, char const *from_end, char_type **to,
2663+ char_type const *to_end, bool flush = false );
2664+
2665+ // forbid
2666+ icu_streambuf( icu_streambuf const& );
2667+ icu_streambuf& operator=( icu_streambuf const& );
2668+};
2669+
2670+///////////////////////////////////////////////////////////////////////////////
2671+
2672+} // namespace zorba
2673+#endif /* ZORBA_ICU_STREAMBUF_H */
2674+/* vim:set et sw=2 ts=2: */
2675
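The class comment in `icu_streambuf.h` restores the original streambuf only on the exception path. One possible way to cover the normal return path as well is a small RAII guard. This is a sketch under stated assumptions: `rdbuf_guard` and `write_via_guard` are hypothetical helpers that do not exist in this branch, and a `std::stringbuf` stands in for the transcoding buffer so the example needs only the standard library.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: installs a replacement streambuf and puts the
// previous one back when the guard goes out of scope.
class rdbuf_guard {
public:
  rdbuf_guard( std::ios &stream, std::streambuf *replacement ) :
    stream_( stream ), saved_( stream.rdbuf( replacement ) )
  {
    assert( replacement );
  }

  ~rdbuf_guard() {
    stream_.rdbuf( saved_ );              // runs on return and on exception
  }

private:
  std::ios &stream_;
  std::streambuf *const saved_;

  // forbid
  rdbuf_guard( rdbuf_guard const& );
  rdbuf_guard& operator=( rdbuf_guard const& );
};

// Writes text through a temporary buffer and returns what that buffer saw;
// the stream's own buffer is untouched once the function returns.
std::string write_via_guard( std::ostream &os, std::string const &text ) {
  std::stringbuf temp;                    // stand-in for a transcoding buffer
  {
    rdbuf_guard guard( os, &temp );
    os << text;
  }
  return temp.str();
}
```

With the real class, the replacement passed to the guard would be the address of an `icu_streambuf` constructed from `os.rdbuf()`, declared before the guard so that it outlives it.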
2676=== added file 'src/util/passthru_streambuf.cpp'
2677--- src/util/passthru_streambuf.cpp 1970-01-01 00:00:00 +0000
2678+++ src/util/passthru_streambuf.cpp 2012-02-16 02:09:20 +0000
2679@@ -0,0 +1,105 @@
2680+/*
2681+ * Copyright 2006-2008 The FLWOR Foundation.
2682+ *
2683+ * Licensed under the Apache License, Version 2.0 (the "License");
2684+ * you may not use this file except in compliance with the License.
2685+ * You may obtain a copy of the License at
2686+ *
2687+ * http://www.apache.org/licenses/LICENSE-2.0
2688+ *
2689+ * Unless required by applicable law or agreed to in writing, software
2690+ * distributed under the License is distributed on an "AS IS" BASIS,
2691+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2692+ * See the License for the specific language governing permissions and
2693+ * limitations under the License.
2694+ */
2695+
2696+#include "passthru_streambuf.h"
2697+
2698+using namespace std;
2699+
2700+namespace zorba {
2701+
2702+///////////////////////////////////////////////////////////////////////////////
2703+
2704+passthru_streambuf::passthru_streambuf( char const*, streambuf *orig ) :
2705+ proxy_streambuf( orig )
2706+{
2707+ if ( !orig )
2708+ throw invalid_argument( "null streambuf" );
2709+}
2710+
2711+passthru_streambuf::~passthru_streambuf() {
2712+ // out-of-line since it's virtual
2713+}
2714+
2715+void passthru_streambuf::imbue( std::locale const &loc ) {
2716+ original()->pubimbue( loc );
2717+}
2718+
2719+bool passthru_streambuf::is_necessary( char const *cc_charset ) {
2720+ zstring charset( cc_charset );
2721+ ascii::trim_whitespace( charset );
2722+ ascii::to_upper( charset );
2723+ return charset != "ASCII"
2724+ && charset != "US-ASCII"
2725+ && charset != "UTF-8";
2726+}
2727+
2728+bool passthru_streambuf::is_supported( char const *cc_charset ) {
2729+ return !is_necessary( cc_charset );
2730+}
2731+
2732+passthru_streambuf::pos_type
2733+passthru_streambuf::seekoff( off_type o, ios_base::seekdir d,
2734+ ios_base::openmode m ) {
2735+ return original()->pubseekoff( o, d, m );
2736+}
2737+
2738+passthru_streambuf::pos_type
2739+passthru_streambuf::seekpos( pos_type p, ios_base::openmode m ) {
2740+ return original()->pubseekpos( p, m );
2741+}
2742+
2743+streambuf* passthru_streambuf::setbuf( char_type *p, streamsize s ) {
2744+ original()->pubsetbuf( p, s );
2745+ return this;
2746+}
2747+
2748+streamsize passthru_streambuf::showmanyc() {
2749+ return original()->in_avail();
2750+}
2751+
2752+int passthru_streambuf::sync() {
2753+ return original()->pubsync();
2754+}
2755+
2756+passthru_streambuf::int_type passthru_streambuf::overflow( int_type c ) {
2757+ return original()->sputc( c );
2758+}
2759+
2760+passthru_streambuf::int_type passthru_streambuf::pbackfail( int_type c ) {
2761+ return original()->sputbackc( traits_type::to_char_type( c ) );
2762+}
2763+
2764+passthru_streambuf::int_type passthru_streambuf::uflow() {
2765+ return original()->sbumpc();
2766+}
2767+
2768+passthru_streambuf::int_type passthru_streambuf::underflow() {
2769+ return original()->sgetc();
2770+}
2771+
2772+streamsize passthru_streambuf::xsgetn( char_type *to, streamsize size ) {
2773+ return original()->sgetn( to, size );
2774+}
2775+
2776+streamsize passthru_streambuf::xsputn( char_type const *from,
2777+ streamsize size ) {
2778+ return original()->sputn( from, size );
2779+}
2780+
2781+///////////////////////////////////////////////////////////////////////////////
2782+
2783+} // namespace zorba
2784+/* vim:set et sw=2 ts=2: */
2785
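The name check in `passthru_streambuf::is_necessary()` relies on Zorba's `zstring` and `ascii::` helpers. A rough standard-library equivalent is sketched below for illustration; `normalize_charset` and `transcoding_is_necessary` are hypothetical names, and unlike ICU's `ucnv_compareNames()` this sketch knows only the three spellings listed in the source.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Trims surrounding whitespace and upper-cases an encoding name.
std::string normalize_charset( char const *name ) {
  assert( name );
  std::string s( name );
  std::string::size_type const first = s.find_first_not_of( " \t\r\n" );
  if ( first == std::string::npos )
    return std::string();                 // empty or all whitespace
  std::string::size_type const last = s.find_last_not_of( " \t\r\n" );
  s = s.substr( first, last - first + 1 );
  for ( std::string::size_type i = 0; i < s.size(); ++i )
    s[i] = static_cast<char>(
      std::toupper( static_cast<unsigned char>( s[i] ) ) );
  return s;
}

// True unless the name is one of the encodings that is already valid UTF-8.
bool transcoding_is_necessary( char const *charset ) {
  std::string const name( normalize_charset( charset ) );
  return name != "ASCII" && name != "US-ASCII" && name != "UTF-8";
}
```

In the pass-through build a charset is supported exactly when no transcoding is necessary, which is why `is_supported()` is the negation of this check.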
2786=== added file 'src/util/passthru_streambuf.h'
2787--- src/util/passthru_streambuf.h 1970-01-01 00:00:00 +0000
2788+++ src/util/passthru_streambuf.h 2012-02-16 02:09:20 +0000
2789@@ -0,0 +1,76 @@
2790+/*
2791+ * Copyright 2006-2008 The FLWOR Foundation.
2792+ *
2793+ * Licensed under the Apache License, Version 2.0 (the "License");
2794+ * you may not use this file except in compliance with the License.
2795+ * You may obtain a copy of the License at
2796+ *
2797+ * http://www.apache.org/licenses/LICENSE-2.0
2798+ *
2799+ * Unless required by applicable law or agreed to in writing, software
2800+ * distributed under the License is distributed on an "AS IS" BASIS,
2801+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2802+ * See the License for the specific language governing permissions and
2803+ * limitations under the License.
2804+ */
2805+
2806+#ifndef ZORBA_PASSTHRU_STREAMBUF_H
2807+#define ZORBA_PASSTHRU_STREAMBUF_H
2808+
2809+#include <zorba/transcode_stream.h>
2810+
2811+namespace zorba {
2812+
2813+///////////////////////////////////////////////////////////////////////////////
2814+
2815+/**
2816+ * A %passthru_streambuf is-a std::streambuf that does no transcoding.
2817+ */
2818+class passthru_streambuf : public proxy_streambuf {
2819+public:
2820+ /**
2821+ * Constructs a %passthru_streambuf.
2822+ *
2823+ * @param charset The name of the character encoding to convert from/to.
2824+ * @param orig The original streambuf to read/write from/to.
2825+ */
2826+ passthru_streambuf( char const *charset, std::streambuf *orig );
2827+
2828+ /**
2829+ * Destructs a %passthru_streambuf.
2830+ */
2831+ ~passthru_streambuf();
2832+
2833+ /**
2834+ * Checks whether the given character set is supported for transcoding.
2835+ *
2836+ * @param charset The name of the character encoding to check.
2837+ * @return \c true only if the character encoding is supported.
2838+ */
2839+ static bool is_supported( char const *charset );
2840+
2841+protected:
2842+ void imbue( std::locale const& );
2843+ pos_type seekoff( off_type, std::ios_base::seekdir, std::ios_base::openmode );
2844+ pos_type seekpos( pos_type, std::ios_base::openmode );
2845+ std::streambuf* setbuf( char_type*, std::streamsize );
2846+ std::streamsize showmanyc();
2847+ int sync();
2848+ int_type overflow( int_type );
2849+ int_type pbackfail( int_type );
2850+ int_type uflow();
2851+ int_type underflow();
2852+ std::streamsize xsgetn( char_type*, std::streamsize );
2853+ std::streamsize xsputn( char_type const*, std::streamsize );
2854+
2855+private:
2856+ // forbid
2857+ passthru_streambuf( passthru_streambuf const& );
2858+ passthru_streambuf& operator=( passthru_streambuf const& );
2859+};
2860+
2861+///////////////////////////////////////////////////////////////////////////////
2862+
2863+} // namespace zorba
2864+#endif /* ZORBA_PASSTHRU_STREAMBUF_H */
2865+/* vim:set et sw=2 ts=2: */
2866
2867=== added file 'src/util/transcode_streambuf.h'
2868--- src/util/transcode_streambuf.h 1970-01-01 00:00:00 +0000
2869+++ src/util/transcode_streambuf.h 2012-02-16 02:09:20 +0000
2870@@ -0,0 +1,47 @@
2871+/*
2872+ * Copyright 2006-2008 The FLWOR Foundation.
2873+ *
2874+ * Licensed under the Apache License, Version 2.0 (the "License");
2875+ * you may not use this file except in compliance with the License.
2876+ * You may obtain a copy of the License at
2877+ *
2878+ * http://www.apache.org/licenses/LICENSE-2.0
2879+ *
2880+ * Unless required by applicable law or agreed to in writing, software
2881+ * distributed under the License is distributed on an "AS IS" BASIS,
2882+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
2883+ * See the License for the specific language governing permissions and
2884+ * limitations under the License.
2885+ */
2886+
2887+#ifndef ZORBA_TRANSCODE_STREAMBUF_H
2888+#define ZORBA_TRANSCODE_STREAMBUF_H
2889+
2890+#include <zorba/config.h>
2891+
2892+///////////////////////////////////////////////////////////////////////////////
2893+
2894+#ifdef ZORBA_NO_UNICODE
2895+# include "passthru_streambuf.h"
2896+#else
2897+# include "icu_streambuf.h"
2898+#endif /* ZORBA_NO_UNICODE */
2899+
2900+namespace zorba {
2901+namespace internal {
2902+namespace transcode {
2903+
2904+#ifdef ZORBA_NO_UNICODE
2905+typedef passthru_streambuf streambuf;
2906+#else
2907+typedef icu_streambuf streambuf;
2908+#endif /* ZORBA_NO_UNICODE */
2909+
2910+} // namespace transcode
2911+} // namespace internal
2912+} // namespace zorba
2913+
2914+///////////////////////////////////////////////////////////////////////////////
2915+
2916+#endif /* ZORBA_TRANSCODE_STREAMBUF_H */
2917+/* vim:set et sw=2 ts=2: */
2918
2919=== added file 'test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res'
2920--- test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res 1970-01-01 00:00:00 +0000
2921+++ test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res 2012-02-16 02:09:20 +0000
2922@@ -0,0 +1,1 @@
2923+üäö
2924
2925=== added file 'test/rbkt/Queries/zorba/file/cp1252.txt'
2926--- test/rbkt/Queries/zorba/file/cp1252.txt 1970-01-01 00:00:00 +0000
2927+++ test/rbkt/Queries/zorba/file/cp1252.txt 2012-02-16 02:09:20 +0000
2928@@ -0,0 +1,1 @@
2929+üäö
2930
2931=== added file 'test/rbkt/Queries/zorba/file/cp1252.xq'
2932--- test/rbkt/Queries/zorba/file/cp1252.xq 1970-01-01 00:00:00 +0000
2933+++ test/rbkt/Queries/zorba/file/cp1252.xq 2012-02-16 02:09:20 +0000
2934@@ -0,0 +1,3 @@
2935+import module namespace f = "http://expath.org/ns/file";
2936+
2937+f:read-text(fn:resolve-uri("cp1252.txt"), "CP1252")
2938
2939=== added file 'test/rbkt/Queries/zorba/file/invalid_encoding.spec'
2940--- test/rbkt/Queries/zorba/file/invalid_encoding.spec 1970-01-01 00:00:00 +0000
2941+++ test/rbkt/Queries/zorba/file/invalid_encoding.spec 2012-02-16 02:09:20 +0000
2942@@ -0,0 +1,1 @@
2943+Error: http://expath.org/ns/file:FOFL0006
2944
2945=== added file 'test/rbkt/Queries/zorba/file/invalid_encoding.xq'
2946--- test/rbkt/Queries/zorba/file/invalid_encoding.xq 1970-01-01 00:00:00 +0000
2947+++ test/rbkt/Queries/zorba/file/invalid_encoding.xq 2012-02-16 02:09:20 +0000
2948@@ -0,0 +1,3 @@
2949+import module namespace f = "http://expath.org/ns/file";
2950+
2951+f:read-text(fn:resolve-uri("cp1252.txt"), "FOO")
2952
2953=== modified file 'test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq'
2954--- test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq 2011-08-23 07:11:31 +0000
2955+++ test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq 2012-02-16 02:09:20 +0000
2956@@ -7,9 +7,9 @@
2957 auth-method="Basic"
2958 send-authorization="true"
2959 username="zorba"
2960- password="blub"/>;
2961+ password="blub"
2962+ override-media-type="application/xml; charset=utf-8"/>;
2963
2964 variable $http-res := http:send-request($req, (), ());
2965
2966 $http-res[2]
2967-
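
Note on passthru_streambuf: the header in the diff above declares only the overrides, so here is a minimal sketch of what a pass-through streambuf could look like. It is not the code from passthru_streambuf.cpp. The class name forwarding_streambuf and the helper read_line_through are hypothetical, the sketch derives from std::streambuf directly instead of proxy_streambuf, and it omits the charset parameter and is_supported(). It assumes the intent is simply to forward each operation to the original streambuf unchanged.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical illustration only: forwards every read/write to the wrapped
// streambuf without modifying the bytes. No get/put area is set up, so each
// public call on this object lands in one of the virtual functions below.
class forwarding_streambuf : public std::streambuf {
public:
  explicit forwarding_streambuf( std::streambuf *orig ) : orig_( orig ) { }

protected:
  // Peek at the next character of the original buffer.
  int_type underflow() { return orig_->sgetc(); }

  // Consume and return the next character of the original buffer.
  int_type uflow() { return orig_->sbumpc(); }

  // Write a single character to the original buffer.
  int_type overflow( int_type c ) {
    if ( traits_type::eq_int_type( c, traits_type::eof() ) )
      return traits_type::not_eof( c );
    return orig_->sputc( traits_type::to_char_type( c ) );
  }

  // Bulk read and write, forwarded as-is.
  std::streamsize xsgetn( char_type *buf, std::streamsize n ) {
    return orig_->sgetn( buf, n );
  }
  std::streamsize xsputn( char_type const *buf, std::streamsize n ) {
    return orig_->sputn( buf, n );
  }

  // Flush the original buffer.
  int sync() { return orig_->pubsync(); }

private:
  std::streambuf *orig_;

  // forbid copying, as in the header above
  forwarding_streambuf( forwarding_streambuf const& );
  forwarding_streambuf& operator=( forwarding_streambuf const& );
};

// Hypothetical usage helper: reads one line from 'orig' through the wrapper.
inline std::string read_line_through( std::streambuf *orig ) {
  forwarding_streambuf buf( orig );
  std::istream in( &buf );
  std::string line;
  std::getline( in, line );
  return line;
}
```

With a wrapper of this shape, code written against transcode::streambuf could still compile in a ZORBA_NO_UNICODE build, with the bytes reaching the caller unconverted.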
