Merge lp:~danielturcanu/zorba/mytrunk into lp:zorba

Proposed by Daniel Turcanu on 2011-10-05
Status: Superseded
Proposed branch: lp:~danielturcanu/zorba/mytrunk
Merge into: lp:zorba
Diff against target: 176 lines (+38/-11)
13 files modified
ChangeLog (+2/-0)
src/diagnostics/dict_XX_cpp.xq (+1/-0)
src/diagnostics/pregenerated/dict_en.cpp (+1/-0)
src/runtime/strings/strings_impl.cpp (+1/-1)
src/util/utf8_util_base.h (+1/-2)
src/zorbatypes/URI.cpp (+19/-8)
src/zorbatypes/URI.h (+2/-0)
test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res (+1/-0)
test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res (+1/-0)
test/rbkt/Queries/zorba/resolving/path_to_uri.spec (+4/-0)
test/rbkt/Queries/zorba/resolving/path_to_uri.xq (+3/-0)
test/rbkt/Queries/zorba/resolving/test.xml (+1/-0)
test/rbkt/Queries/zorba/string/Regex/regex_a4.xq (+1/-0)
To merge this branch: bzr merge lp:~danielturcanu/zorba/mytrunk
Reviewer Review Type Date Requested Status
Chris Hillery Disapprove on 2011-10-05
Matthias Brantner 2011-10-05 Pending
Review via email: mp+78253@code.launchpad.net

This proposal supersedes a proposal from 2011-10-04.

To post a comment you must log in.
Matthias Brantner (matthias-brantner) wrote : Posted in a previous version of this proposal

Please open two bugs for this and link the bug numbers from the ChangeLog. Also, please describe (or mention the bugs) in the commit message. We already went back and forth with URI fixes plenty of times and it's very confusing if it's unclear which scenario your commit is supposed to fix.

Matthias Brantner (matthias-brantner) wrote : Posted in a previous version of this proposal

See comment recently made.

review: Needs Fixing
Chris Hillery (ceejatec) wrote :

This isn't a good fix. The "path" of a URI isn't a filesystem path; it's a specific part of a URI (see RFC 3986). The path portion of a URI cannot be "absolute" or "relative". So it doesn't make sense to have this functionality in the URI class.

fn:resolve-uri() is not supposed to accept a filesystem path, so the test case path-to-uri.xq is not valid. If it DOES work on Unix as written, that's probably a different bug.

I am commenting specifically on the fix to bug 868329. I have nothing to say about the fix for bug 868325 except that it should probably be in a separate commit.

review: Disapprove
Daniel Turcanu (danielturcanu) wrote :

For Windows, the absolute path for URI is still in the form C:/, only that it looks like C3A/.
At least for file:// scheme. I guess I should add a check only for that scheme.

About absolute and relative paths in URI, think about what fn:resolve-uri works with. It can receive a relative path, or absolute path or entire URI. When receiving absolute paths, it doesn't work now in Windows for file://.

Chris Hillery (ceejatec) wrote :

The path component of a file: URI on Windows can contain C:/ (or C%3A/), but that's not directly relevant.

fn:resolve-uri() works with exactly URIs. It is not defined to accept filesystem paths. Thus, if the input is "C:/foo", it will interpret that strictly as a URI (and most likely fail since the "C:" scheme probably isn't known).

fn:doc() is an explicit exception in Zorba - it is defined (by us) to accept either a URI *or* a filesystem path. This is technically not W3C-spec compliant, but it was decided that it was necessary for a good customer experience. But fn:resolve-uri() does not have the same "magic". If you want, you could maybe make a case that it should have the same magic, but even if we agreed on that, it should use the same code from inside the implementation of fn:doc() to achieve it.

Daniel Turcanu (danielturcanu) wrote :

Ok, I guess you are right. I reread again the URI spec and it doesn't say anything about Windows paths.
But how to transform Windows filepaths into file URI in a OS independent way?
There is another function fn:encode-for-uri, which is supposed to encode a path to be used as uri, but that isn't prepared for Windows either.
fn:encode-for-uri("C:\path1\file.xml") gives C%3Apath1%2Ffile.xml.
I would expect /C%3A/path1/file.xml , meaning detect that is an absolute Windows path, and replace backslashes with slashes.
I think the problem is in this function, it should be made to work differently for Windows.

Chris Hillery (ceejatec) wrote :

The W3C doesn't care about filesystems. :) encode-for-uri() just escapes illegal characters in URIs.

Fortunately, EXPath cares: the File module has functions to go back and forth. file:path-to-uri() converts a filesystem path to a file: URI, and file:path-to-uri() (which is a bit oddly named) does the reverse.

Note that those functions don't work "in an OS independent way" as you say - they work differently depending on the platform running the script. So file:path-to-uri("C:\foo") won't work on a Unix system. I'm not sure that's a problem, but if you think you need to be able to do that kind of conversion anywhere, that would be a new feature request for EXpath to consider.

Daniel Turcanu (danielturcanu) wrote :

Maybe W3C doesn't care about file systems, but URIs do.
At least http, ftp and file all work with file system paths. We actually work with URL, not URI, we don't have general URIs so far.

I tested now file:path-to-uri and it doesn't do what I want: if I give it an absolute path, then it returns a complete URI, and if I give it a relative path, then it also returns a complete URI, with the path relative to my rbkt dir in the build directory. This seems like a wrong thing. If my rbkt dir is hardcoded in the file module, then it won't work for other users if I deploy the file module.

I still think that this is a job for fn:encode-for-uri. I need a function to encode a file system path into a path for URI. So if it is absolute path, make it absolute path for URI, not a full URI, and if it is relative path, make it relative path for URI.
And then this path I can apply to fn:resolve-uri.

Chris Hillery (ceejatec) wrote :

HTTP and FTP URIs certainly don't work with filesystem paths. File: URIs only do by convention, and there are (somewhat hand-wavy) rules for mapping filesystem paths to file: URIs. But a filesystem path is-not-a file: URI, and so a filesystem path is never an appropriate argument for fn:resolve-uri() or in fact for any other function defined by XQuery F&O.

Also, URLs are no different than URIs except that URLs are expected to directly point to a resource, where a generic URI may simply name the resource. But they follow the same rules. Neither of them have any relationship to filesystem paths. I'm not saying this is good or bad, it's simply true. And Zorba most certainly works with full URIs, not just URLs.

file:path-to-uri() is documented to resolve relative filesystem paths using the current working directory as the base. So you're seeing your RBKT dir because that's the current working directory for testdriver, I suspect. Again, I'm not saying this is good or bad, it's simply true. I would certainly support you if you wanted to say that there should be a two-argument form of this function that allowed you to supply a different base URI or path, but you'll need to run that through the EXPath process (whatever that is).

Finally, regarding fn:encode-for-uri():

  http://www.w3.org/TR/xpath-functions/#func-encode-for-uri

It only escapes *characters* in a string which are illegal in the path segment of a URI. It is not intended in any way to convert a filesystem path to a valid URI (relative or otherwise).

Daniel Turcanu (danielturcanu) wrote :

Please stop giving arguments from the spec. I also know the spec.
I am trying to come with a solution to the problem. This problem cannot be solved by sticking to the spec.

So I want like this: have a variable that contains an absolute or relative file system path, and resolve it to a base absolute URI. There should be no if(is_absolute_path) and no if(win32) in the xquery code. Right now this doesn't work, because our URI implementation considers the absolute Win32 path as a relative path, because it doesn't start with slash.

Maybe we shoud add another function in the file module, like file:path-to-uri-path, which should add a slash for absolute win32 paths, and leave the relative paths as they are.

Chris Hillery (ceejatec) wrote :

Sorry, but we implement a spec. You've suggested at least two changes which would break our compliance with that spec, so I don't have much choice but to quote the spec back at you.

It sounds like what you want would be best served by an extension to the Expath file module. As I said in my previous note, I'd certainly support you if you wanted to propose adding the two-argument form of file:path-to-uri(). It would be easy to implement, since we already have this functionality in the C++ API (zorba::filesystem_path::normalize_path()). Alternately, you could introduce a new internal module for your needs, or even add a "change working directory" function to the system module - I'd ask on zorba-coders for comments if you're considering either of those options.

Daniel Turcanu (danielturcanu) wrote :

So you propose a function like file:resolve-path-to-uri($baseUri, $filePath) ?
I guess it's fine too.

lp:~danielturcanu/zorba/mytrunk updated on 2011-11-23
10473. By Daniel Turcanu on 2011-10-07

Reverted changes in URI

10474. By Daniel Turcanu on 2011-10-07

Updated to trunk

10475. By Daniel Turcanu on 2011-10-11

Update to trunk

10476. By Daniel Turcanu on 2011-10-11

Update to trunk

10477. By Daniel Turcanu on 2011-10-11

Reverted change to http_client

10478. By Daniel Turcanu on 2011-10-12

Update to trunk

10479. By Daniel Turcanu on 2011-10-12

Deleted blank uri test

10480. By Daniel Turcanu on 2011-10-12

Update to trunk

10481. By Daniel Turcanu on 2011-10-24

Update to trunk

10482. By Daniel Turcanu on 2011-11-23

Update to trunk

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ChangeLog'
2--- ChangeLog 2011-10-04 18:22:59 +0000
3+++ ChangeLog 2011-10-05 13:12:05 +0000
4@@ -41,6 +41,8 @@
5 * Fixed bug #863730 (static delete-node* functions don't raise ZDDY0012)
6 * Implemented the probe-index-range-value for general indexes
7 * Fixed bug #867662 ("nullptr" warning)
8+ * Fixed bug #868329 (URI resolver fails on Windows with absolute paths)
9+ * Fixed bug #868325 (fn:analyze-string fails with some recursive subgroups)
10
11 version 2.0.1
12
13
14=== modified file 'src/diagnostics/dict_XX_cpp.xq'
15--- src/diagnostics/dict_XX_cpp.xq 2011-08-05 02:21:55 +0000
16+++ src/diagnostics/dict_XX_cpp.xq 2011-10-05 13:12:05 +0000
17@@ -64,6 +64,7 @@
18 return string-join(
19 ( util:copyright(),
20 '#include "stdafx.h"',
21+ '#include "zorba/config.h"',
22 '#include "diagnostics/dict_impl.h"',
23 '',
24 'namespace zorba {',
25
26=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
27--- src/diagnostics/pregenerated/dict_en.cpp 2011-10-04 05:28:07 +0000
28+++ src/diagnostics/pregenerated/dict_en.cpp 2011-10-05 13:12:05 +0000
29@@ -20,6 +20,7 @@
30 */
31
32 #include "stdafx.h"
33+#include "zorba/config.h"
34 #include "diagnostics/dict_impl.h"
35
36 namespace zorba {
37
38=== modified file 'src/runtime/strings/strings_impl.cpp'
39--- src/runtime/strings/strings_impl.cpp 2011-08-10 18:58:11 +0000
40+++ src/runtime/strings/strings_impl.cpp 2011-10-05 13:12:05 +0000
41@@ -1688,7 +1688,7 @@
42 GENV_ITEMFACTORY->createString(strid_item, zstrid);
43 store::Item_t id_attrib_item;
44 GENV_ITEMFACTORY->createAttributeNode(id_attrib_item, group_elem.getp(), nr_attrib_name, untyped_type_name, strid_item);
45- if(match_startg < 0)
46+ if((match_startg < 0) || (match_startg < match_endgood))
47 continue;
48 match_endgood = match_endg;
49 if((i+1)<nr_pattern_groups)
50
51=== modified file 'src/util/utf8_util_base.h'
52--- src/util/utf8_util_base.h 2011-07-17 20:05:49 +0000
53+++ src/util/utf8_util_base.h 2011-10-05 13:12:05 +0000
54@@ -148,8 +148,7 @@
55 * @return Returns the Unicode code-point of the next character.
56 */
57 template<class OctetIterator>
58-ZORBA_DLL_PUBLIC
59-unicode::code_point next_char( OctetIterator &i );
60+ZORBA_DLL_PUBLIC unicode::code_point next_char( OctetIterator &i );
61
62 /**
63 * Decodes the previous Unicode character.
64
65=== modified file 'src/zorbatypes/URI.cpp'
66--- src/zorbatypes/URI.cpp 2011-06-24 23:00:33 +0000
67+++ src/zorbatypes/URI.cpp 2011-10-05 13:12:05 +0000
68@@ -1191,7 +1191,16 @@
69 return is_set(Scheme) && !theScheme.empty();
70 }
71
72-
73+bool URI::is_absolute_path(zstring &thePath)
74+{
75+#ifndef WIN32
76+ return thePath[0] == '/';
77+#else
78+ return (thePath.length() > 2) &&
79+ ((thePath[1] == ':') ||
80+ ((thePath[1] == '%') && (thePath[2] == '3') && (thePath[3] == 'A')));
81+#endif
82+}
83
84 /*******************************************************************************
85
86@@ -1331,7 +1340,7 @@
87
88 // If the path component begins with a slash character ("/"), then
89 // the reference is an absolute-path and we skip to step 7.
90- if ( (is_set(Path)) && (thePath[0] =='/') )
91+ if ( is_set(Path) && is_absolute_path(thePath) )
92 {
93 invalidate_text();
94 return;
95@@ -1342,12 +1351,14 @@
96
97 if ( base_uri->is_set(Path) )
98 {
99- zstring::size_type last_slash = base_path.rfind("/");
100- if ( last_slash != zstring::npos )
101- path = base_path.substr(0, last_slash+1);
102-// else
103-// path = "/";
104-
105+ if(!is_absolute_path(thePath))
106+ {
107+ zstring::size_type last_slash = base_path.rfind("/");
108+ if ( last_slash != zstring::npos )
109+ path = base_path.substr(0, last_slash+1);
110+ // else
111+ // path = "/";
112+ }
113 }
114
115 // 6b - append the relative URI path
116
117=== modified file 'src/zorbatypes/URI.h'
118--- src/zorbatypes/URI.h 2011-06-14 17:26:33 +0000
119+++ src/zorbatypes/URI.h 2011-10-05 13:12:05 +0000
120@@ -190,6 +190,8 @@
121 void unset_state(uint32_t s) const { theState &= ~s; }
122
123 void invalidate_text() const;
124+
125+ bool is_absolute_path(zstring &thePath);
126 };
127
128
129
130=== added file 'test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res'
131--- test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res 1970-01-01 00:00:00 +0000
132+++ test/rbkt/ExpQueryResults/zorba/resolving/path_to_uri.xml.res 2011-10-05 13:12:05 +0000
133@@ -0,0 +1,1 @@
134+<a/>
135\ No newline at end of file
136
137=== added directory 'test/rbkt/ExpQueryResults/zorba/string/Regex'
138=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res'
139--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res 1970-01-01 00:00:00 +0000
140+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a4.xml.res 2011-10-05 13:12:05 +0000
141@@ -0,0 +1,1 @@
142+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>a<fn:group nr="1"><fn:group nr="2"/><fn:group nr="3">c</fn:group></fn:group></fn:match></fn:analyze-string-result>
143\ No newline at end of file
144
145=== added file 'test/rbkt/Queries/zorba/resolving/path_to_uri.spec'
146--- test/rbkt/Queries/zorba/resolving/path_to_uri.spec 1970-01-01 00:00:00 +0000
147+++ test/rbkt/Queries/zorba/resolving/path_to_uri.spec 2011-10-05 13:12:05 +0000
148@@ -0,0 +1,4 @@
149+Args:
150+-x
151+input-context:=xs:string($RBKT_SRC_DIR/Queries/zorba/resolving/test.xml)
152+
153
154=== added file 'test/rbkt/Queries/zorba/resolving/path_to_uri.xq'
155--- test/rbkt/Queries/zorba/resolving/path_to_uri.xq 1970-01-01 00:00:00 +0000
156+++ test/rbkt/Queries/zorba/resolving/path_to_uri.xq 2011-10-05 13:12:05 +0000
157@@ -0,0 +1,3 @@
158+declare variable $input-context as xs:string external;
159+
160+fn:doc(fn:resolve-uri($input-context))
161\ No newline at end of file
162
163=== added file 'test/rbkt/Queries/zorba/resolving/test.xml'
164--- test/rbkt/Queries/zorba/resolving/test.xml 1970-01-01 00:00:00 +0000
165+++ test/rbkt/Queries/zorba/resolving/test.xml 2011-10-05 13:12:05 +0000
166@@ -0,0 +1,1 @@
167+<a/>
168\ No newline at end of file
169
170=== added directory 'test/rbkt/Queries/zorba/string/Regex'
171=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a4.xq'
172--- test/rbkt/Queries/zorba/string/Regex/regex_a4.xq 1970-01-01 00:00:00 +0000
173+++ test/rbkt/Queries/zorba/string/Regex/regex_a4.xq 2011-10-05 13:12:05 +0000
174@@ -0,0 +1,1 @@
175+fn:analyze-string("ac", "((a)|(c))+")
176\ No newline at end of file

Subscribers

People subscribed via source and target branches