Merge lp:~danielturcanu/zorba/mytrunk into lp:zorba

Proposed by Daniel Turcanu
Status: Superseded
Proposed branch: lp:~danielturcanu/zorba/mytrunk
Merge into: lp:zorba
Diff against target: 176 lines (+38/-11)
13 files modified
ChangeLog (+2/-0)
src/diagnostics/dict_XX_cpp.xq (+1/-0)
src/diagnostics/pregenerated/dict_en.cpp (+1/-0)
src/runtime/strings/strings_impl.cpp (+1/-1)
src/util/utf8_util_base.h (+1/-2)
src/zorbatypes/URI.cpp (+19/-8)
src/zorbatypes/URI.h (+2/-0)
test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res (+1/-0)
test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res (+1/-0)
test/rbkt/Queries/zorba/resolving/path_to_uri.spec (+4/-0)
test/rbkt/Queries/zorba/resolving/path_to_uri.xq (+3/-0)
test/rbkt/Queries/zorba/resolving/test.xml (+1/-0)
test/rbkt/Queries/zorba/string/Regex/regex_a4.xq (+1/-0)
To merge this branch: bzr merge lp:~danielturcanu/zorba/mytrunk
Reviewer Review Type Date Requested Status
Chris Hillery Disapprove
Matthias Brantner Pending
Review via email: mp+78253@code.launchpad.net

This proposal supersedes a proposal from 2011-10-04.

To post a comment you must log in.
Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Posted in a previous version of this proposal

Please open two bugs for this and link the bug numbers from the ChangeLog. Also, please describe (or mention the bugs) in the commit message. We already went back and forth with URI fixes plenty of times and it's very confusing if it's unclear which scenario your commit is supposed to fix.

Revision history for this message
Matthias Brantner (matthias-brantner) wrote : Posted in a previous version of this proposal

See comment recently made.

review: Needs Fixing
Revision history for this message
Chris Hillery (ceejatec) wrote :

This isn't a good fix. The "path" of a URI isn't a filesystem path; it's a specific part of a URI (see RFC 3986). The path portion of a URI cannot be "absolute" or "relative". So it doesn't make sense to have this functionality in the URI class.

fn:resolve-uri() is not supposed to accept a filesystem path, so the test case path-to-uri.xq is not valid. If it DOES work on Unix as written, that's probably a different bug.

I am commenting specifically on the fix to bug 868329. I have nothing to say about the fix for bug 868325 except that it should probably be in a separate commit.

review: Disapprove
Revision history for this message
Daniel Turcanu (danielturcanu) wrote :

For Windows, the absolute path for URI is still in the form C:/, only that it looks like C3A/.
At least for file:// scheme. I guess I should add a check only for that scheme.

About absolute and relative paths in URI, think about what fn:resolve-uri works with. It can receive a relative path, or absolute path or entire URI. When receiving absolute paths, it doesn't work now in Windows for file://.

Revision history for this message
Chris Hillery (ceejatec) wrote :

The path component of a file: URI on Windows can contain C:/ (or C%3A/), but that's not directly relevant.

fn:resolve-uri() works with exactly URIs. It is not defined to accept filesystem paths. Thus, if the input is "C:/foo", it will interpret that strictly as a URI (and most likely fail since the "C:" scheme probably isn't known).

fn:doc() is an explicit exception in Zorba - it is defined (by us) to accept either a URI *or* a filesystem path. This is technically not W3C-spec compliant, but it was decided that it was necessary for a good customer experience. But fn:resolve-uri() does not have the same "magic". If you want, you could maybe make a case that it should have the same magic, but even if we agreed on that, it should use the same code from inside the implementation of fn:doc() to achieve it.

Revision history for this message
Daniel Turcanu (danielturcanu) wrote :

Ok, I guess you are right. I reread again the URI spec and it doesn't say anything about Windows paths.
But how to transform Windows filepaths into file URI in a OS independent way?
There is another function fn:encode-for-uri, which is supposed to encode a path to be used as uri, but that isn't prepared for Windows either.
fn:encode-for-uri("C:\path1\file.xml") gives C%3Apath1%2Ffile.xml.
I would expect /C%3A/path1/file.xml , meaning detect that is an absolute Windows path, and replace backslashes with slashes.
I think the problem is in this function, it should be made to work differently for Windows.

Revision history for this message
Chris Hillery (ceejatec) wrote :

The W3C doesn't care about filesystems. :) encode-for-uri() just escapes illegal characters in URIs.

Fortunately, EXPath cares: the File module has functions to go back and forth. file:path-to-uri() converts a filesystem path to a file: URI, and file:path-to-uri() (which is a bit oddly named) does the reverse.

Note that those functions don't work "in an OS independent way" as you say - they work differently depending on the platform running the script. So file:path-to-uri("C:\foo") won't work on a Unix system. I'm not sure that's a problem, but if you think you need to be able to do that kind of conversion anywhere, that would be a new feature request for EXpath to consider.

Revision history for this message
Daniel Turcanu (danielturcanu) wrote :

Maybe W3C doesn't care about file systems, but URIs do.
At least http, ftp and file all work with file system paths. We actually work with URL, not URI, we don't have general URIs so far.

I tested now file:path-to-uri and it doesn't do what I want: if I give it an absolute path, then it returns a complete URI, and if I give it a relative path, then it also returns a complete URI, with the path relative to my rbkt dir in the build directory. This seems like a wrong thing. If my rbkt dir is hardcoded in the file module, then it won't work for other users if I deploy the file module.

I still think that this is a job for fn:encode-for-uri. I need a function to encode a file system path into a path for URI. So if it is absolute path, make it absolute path for URI, not a full URI, and if it is relative path, make it relative path for URI.
And then this path I can apply to fn:resolve-uri.

Revision history for this message
Chris Hillery (ceejatec) wrote :

HTTP and FTP URIs certainly don't work with filesystem paths. File: URIs only do by convention, and there are (somewhat hand-wavy) rules for mapping filesystem paths to file: URIs. But a filesystem path is-not-a file: URI, and so a filesystem path is never an appropriate argument for fn:resolve-uri() or in fact for any other function defined by XQuery F&O.

Also, URLs are no different than URIs except that URLs are expected to directly point to a resource, where a generic URI may simply name the resource. But they follow the same rules. Neither of them have any relationship to filesystem paths. I'm not saying this is good or bad, it's simply true. And Zorba most certainly works with full URIs, not just URLs.

file:path-to-uri() is documented to resolve relative filesystem paths using the current working directory as the base. So you're seeing your RBKT dir because that's the current working directory for testdriver, I suspect. Again, I'm not saying this is good or bad, it's simply true. I would certainly support you if you wanted to say that there should be a two-argument form of this function that allowed you to supply a different base URI or path, but you'll need to run that through the EXPath process (whatever that is).

Finally, regarding fn:encode-for-uri():

  http://www.w3.org/TR/xpath-functions/#func-encode-for-uri

It only escapes *characters* in a string which are illegal in the path segment of a URI. It is not intended in any way to convert a filesystem path to a valid URI (relative or otherwise).

Revision history for this message
Daniel Turcanu (danielturcanu) wrote :

Please stop giving arguments from the spec. I also know the spec.
I am trying to come with a solution to the problem. This problem cannot be solved by sticking to the spec.

So I want like this: have a variable that contains an absolute or relative file system path, and resolve it to a base absolute URI. There should be no if(is_absolute_path) and no if(win32) in the xquery code. Right now this doesn't work, because our URI implementation considers the absolute Win32 path as a relative path, because it doesn't start with slash.

Maybe we shoud add another function in the file module, like file:path-to-uri-path, which should add a slash for absolute win32 paths, and leave the relative paths as they are.

Revision history for this message
Chris Hillery (ceejatec) wrote :

Sorry, but we implement a spec. You've suggested at least two changes which would break our compliance with that spec, so I don't have much choice but to quote the spec back at you.

It sounds like what you want would be best served by an extension to the Expath file module. As I said in my previous note, I'd certainly support you if you wanted to propose adding the two-argument form of file:path-to-uri(). It would be easy to implement, since we already have this functionality in the C++ API (zorba::filesystem_path::normalize_path()). Alternately, you could introduce a new internal module for your needs, or even add a "change working directory" function to the system module - I'd ask on zorba-coders for comments if you're considering either of those options.

Revision history for this message
Daniel Turcanu (danielturcanu) wrote :

So you propose a function like file:resolve-path-to-uri($baseUri, $filePath) ?
I guess it's fine too.

lp:~danielturcanu/zorba/mytrunk updated
10473. By Daniel Turcanu

Reverted changes in URI

10474. By Daniel Turcanu

Updated to trunk

10475. By Daniel Turcanu

Update to trunk

10476. By Daniel Turcanu

Update to trunk

10477. By Daniel Turcanu

Reverted change to http_client

10478. By Daniel Turcanu

Update to trunk

10479. By Daniel Turcanu

Deleted blank uri test

10480. By Daniel Turcanu

Update to trunk

10481. By Daniel Turcanu

Update to trunk

10482. By Daniel Turcanu

Update to trunk

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'ChangeLog'
--- ChangeLog 2011-10-04 18:22:59 +0000
+++ ChangeLog 2011-10-05 13:12:05 +0000
@@ -41,6 +41,8 @@
41 * Fixed bug #863730 (static delete-node* functions don't raise ZDDY0012)41 * Fixed bug #863730 (static delete-node* functions don't raise ZDDY0012)
42 * Implemented the probe-index-range-value for general indexes42 * Implemented the probe-index-range-value for general indexes
43 * Fixed bug #867662 ("nullptr" warning)43 * Fixed bug #867662 ("nullptr" warning)
44 * Fixed bug #868329 (URI resolver fails on Windows with absolute paths)
45 * Fixed bug #868325 (fn:analyze-string fails with some recursive subgroups)
4446
45version 2.0.147version 2.0.1
4648
4749
=== modified file 'src/diagnostics/dict_XX_cpp.xq'
--- src/diagnostics/dict_XX_cpp.xq 2011-08-05 02:21:55 +0000
+++ src/diagnostics/dict_XX_cpp.xq 2011-10-05 13:12:05 +0000
@@ -64,6 +64,7 @@
64return string-join(64return string-join(
65 ( util:copyright(), 65 ( util:copyright(),
66 '#include "stdafx.h"',66 '#include "stdafx.h"',
67 '#include "zorba/config.h"',
67 '#include "diagnostics/dict_impl.h"',68 '#include "diagnostics/dict_impl.h"',
68 '',69 '',
69 'namespace zorba {',70 'namespace zorba {',
7071
=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp 2011-10-04 05:28:07 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp 2011-10-05 13:12:05 +0000
@@ -20,6 +20,7 @@
20 */20 */
21 21
22#include "stdafx.h"22#include "stdafx.h"
23#include "zorba/config.h"
23#include "diagnostics/dict_impl.h"24#include "diagnostics/dict_impl.h"
2425
25namespace zorba {26namespace zorba {
2627
=== modified file 'src/runtime/strings/strings_impl.cpp'
--- src/runtime/strings/strings_impl.cpp 2011-08-10 18:58:11 +0000
+++ src/runtime/strings/strings_impl.cpp 2011-10-05 13:12:05 +0000
@@ -1688,7 +1688,7 @@
1688 GENV_ITEMFACTORY->createString(strid_item, zstrid);1688 GENV_ITEMFACTORY->createString(strid_item, zstrid);
1689 store::Item_t id_attrib_item;1689 store::Item_t id_attrib_item;
1690 GENV_ITEMFACTORY->createAttributeNode(id_attrib_item, group_elem.getp(), nr_attrib_name, untyped_type_name, strid_item);1690 GENV_ITEMFACTORY->createAttributeNode(id_attrib_item, group_elem.getp(), nr_attrib_name, untyped_type_name, strid_item);
1691 if(match_startg < 0)1691 if((match_startg < 0) || (match_startg < match_endgood))
1692 continue;1692 continue;
1693 match_endgood = match_endg;1693 match_endgood = match_endg;
1694 if((i+1)<nr_pattern_groups)1694 if((i+1)<nr_pattern_groups)
16951695
=== modified file 'src/util/utf8_util_base.h'
--- src/util/utf8_util_base.h 2011-07-17 20:05:49 +0000
+++ src/util/utf8_util_base.h 2011-10-05 13:12:05 +0000
@@ -148,8 +148,7 @@
148 * @return Returns the Unicode code-point of the next character.148 * @return Returns the Unicode code-point of the next character.
149 */149 */
150template<class OctetIterator>150template<class OctetIterator>
151ZORBA_DLL_PUBLIC151ZORBA_DLL_PUBLIC unicode::code_point next_char( OctetIterator &i );
152unicode::code_point next_char( OctetIterator &i );
153152
154/**153/**
155 * Decodes the previous Unicode character.154 * Decodes the previous Unicode character.
156155
=== modified file 'src/zorbatypes/URI.cpp'
--- src/zorbatypes/URI.cpp 2011-06-24 23:00:33 +0000
+++ src/zorbatypes/URI.cpp 2011-10-05 13:12:05 +0000
@@ -1191,7 +1191,16 @@
1191 return is_set(Scheme) && !theScheme.empty();1191 return is_set(Scheme) && !theScheme.empty();
1192}1192}
11931193
11941194bool URI::is_absolute_path(zstring &thePath)
1195{
1196#ifndef WIN32
1197 return thePath[0] == '/';
1198#else
1199 return (thePath.length() > 2) &&
1200 ((thePath[1] == ':') ||
1201 ((thePath[1] == '%') && (thePath[2] == '3') && (thePath[3] == 'A')));
1202#endif
1203}
11951204
1196/*******************************************************************************1205/*******************************************************************************
11971206
@@ -1331,7 +1340,7 @@
13311340
1332 // If the path component begins with a slash character ("/"), then1341 // If the path component begins with a slash character ("/"), then
1333 // the reference is an absolute-path and we skip to step 7.1342 // the reference is an absolute-path and we skip to step 7.
1334 if ( (is_set(Path)) && (thePath[0] =='/') ) 1343 if ( is_set(Path) && is_absolute_path(thePath) )
1335 {1344 {
1336 invalidate_text();1345 invalidate_text();
1337 return;1346 return;
@@ -1342,12 +1351,14 @@
13421351
1343 if ( base_uri->is_set(Path) ) 1352 if ( base_uri->is_set(Path) )
1344 {1353 {
1345 zstring::size_type last_slash = base_path.rfind("/");1354 if(!is_absolute_path(thePath))
1346 if ( last_slash != zstring::npos )1355 {
1347 path = base_path.substr(0, last_slash+1);1356 zstring::size_type last_slash = base_path.rfind("/");
1348// else1357 if ( last_slash != zstring::npos )
1349// path = "/";1358 path = base_path.substr(0, last_slash+1);
13501359 // else
1360 // path = "/";
1361 }
1351 }1362 }
13521363
1353 // 6b - append the relative URI path1364 // 6b - append the relative URI path
13541365
=== modified file 'src/zorbatypes/URI.h'
--- src/zorbatypes/URI.h 2011-06-14 17:26:33 +0000
+++ src/zorbatypes/URI.h 2011-10-05 13:12:05 +0000
@@ -190,6 +190,8 @@
190 void unset_state(uint32_t s) const { theState &= ~s; }190 void unset_state(uint32_t s) const { theState &= ~s; }
191191
192 void invalidate_text() const;192 void invalidate_text() const;
193
194 bool is_absolute_path(zstring &thePath);
193};195};
194196
195197
196198
=== added file 'test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res'
--- test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res 2011-10-05 13:12:05 +0000
@@ -0,0 +1,1 @@
1<a/>
0\ No newline at end of file2\ No newline at end of file
13
=== added directory 'test/rbkt/ExpQueryResults/zorba/string/Regex'
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res 2011-10-05 13:12:05 +0000
@@ -0,0 +1,1 @@
1<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>a<fn:group nr="1"><fn:group nr="2"/><fn:group nr="3">c</fn:group></fn:group></fn:match></fn:analyze-string-result>
0\ No newline at end of file2\ No newline at end of file
13
=== added file 'test/rbkt/Queries/zorba/resolving/path_to_uri.spec'
--- test/rbkt/Queries/zorba/resolving/path_to_uri.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/resolving/path_to_uri.spec 2011-10-05 13:12:05 +0000
@@ -0,0 +1,4 @@
1Args:
2-x
3input-context:=xs:string($RBKT_SRC_DIR/Queries/zorba/resolving/test.xml)
4
05
=== added file 'test/rbkt/Queries/zorba/resolving/path_to_uri.xq'
--- test/rbkt/Queries/zorba/resolving/path_to_uri.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/resolving/path_to_uri.xq 2011-10-05 13:12:05 +0000
@@ -0,0 +1,3 @@
1declare variable $input-context as xs:string external;
2
3fn:doc(fn:resolve-uri($input-context))
0\ No newline at end of file4\ No newline at end of file
15
=== added file 'test/rbkt/Queries/zorba/resolving/test.xml'
--- test/rbkt/Queries/zorba/resolving/test.xml 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/resolving/test.xml 2011-10-05 13:12:05 +0000
@@ -0,0 +1,1 @@
1<a/>
0\ No newline at end of file2\ No newline at end of file
13
=== added directory 'test/rbkt/Queries/zorba/string/Regex'
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a4.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a4.xq 2011-10-05 13:12:05 +0000
@@ -0,0 +1,1 @@
1fn:analyze-string("ac", "((a)|(c))+")
0\ No newline at end of file2\ No newline at end of file

Subscribers

People subscribed via source and target branches