Merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module
Status: | Merged |
---|---|
Approved by: | Matthias Brantner |
Approved revision: | 38 |
Merged at revision: | 33 |
Proposed branch: | lp:~diogo-simoes89/zorba/data-cleaning |
Merge into: | lp:zorba/data-cleaning-module |
Diff against target: | 882 lines (+374/-144), 12 files modified: src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq (+103/-55); src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq (+260/-82); src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq (+1/-1); test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res (+1/-1); test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res (+1/-1); test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res (+1/-1); test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res (+1/-1); test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res (+1/-0); test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res (+1/-0); test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res (+1/-0); test/Queries/data-cleaning/conversion/geocode-from-address.xq (+3/-1); test/Queries/data-cleaning/conversion/unit-convert.spec (+0/-1) |
To merge this branch: | bzr merge lp:~diogo-simoes89/zorba/data-cleaning |
Related bugs: | |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Matthias Brantner | Approve | ||
Bruno Martins | Approve | ||
Review via email: mp+79530@code.launchpad.net |
Commit message
Changes to normalization functions (to-time, to-dateTime, to-date)
Changes to conversion expected results (address-
Description of the change
Changes to the normalization functions:
- to-dateTime: uncommented the function and fixed its bugs
- to-time: uncommented the function and fixed its bugs
- implemented check functions that verify whether a string corresponds to an xs:date, xs:time or xs:dateTime
Changes to the conversion tests (updated expected results):
- address-from-user
- phone-from-user
- user-from-phone
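The definitions of the new check functions do not appear in the portion of the diff shown here; only the calls `normalization:check-date($result)` and `normalization:check-time($result)` do. A minimal sketch, assuming they keep the behaviour of the regex tests they replace (return the string unchanged if valid, otherwise raise `err:notsupported`), could use XQuery's `castable as` instead of regular expressions. This is a swapped-in technique, not necessarily the author's implementation, and the `local:*` names are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical sketch: local:check-date, local:check-time and
   local:check-dateTime are illustrative stand-ins for the module's
   normalization:check-* functions, whose bodies are not shown here. :)

(: Raise the same error QName that the removed regex checks used. :)
declare function local:not-supported($s as xs:string)
{
  fn:error(
    fn:QName("http://www.zorba-xquery.com/modules/data-cleaning/normalization",
             "err:notsupported"),
    fn:concat("not a valid lexical value: ", $s))
};

(: "castable as" applies the full XML Schema rules, so "2002-02-30" is
   rejected even though a digits-only regex would accept it. :)
declare function local:check-date($s as xs:string) as xs:string
{
  if ($s castable as xs:date) then $s else local:not-supported($s)
};

declare function local:check-time($s as xs:string) as xs:string
{
  if ($s castable as xs:time) then $s else local:not-supported($s)
};

declare function local:check-dateTime($s as xs:string) as xs:string
{
  if ($s castable as xs:dateTime) then $s else local:not-supported($s)
};

(: Valid inputs are returned unchanged. :)
(
  local:check-date("2002-10-24"),
  local:check-time("09:10:00"),
  local:check-dateTime("2002-10-24T21:22:00")
)
```

With this sketch, `local:check-dateTime("2002-20-24T21:22:00")` (the output given in the removed to-dateTime documentation example) raises `err:notsupported`, because month 20 is out of range.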
Zorba Build Bot (zorba-buildbot) wrote:
The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.
CMake Error at /home/ceej/
Validation queue job data-cleaning-
The final status was:
3 tests did not succeed - changes not commited.
Error in read script: /home/ceej/
Zorba Build Bot (zorba-buildbot) wrote:
There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue job data-cleaning-
All tests succeeded!
Zorba Build Bot (zorba-buildbot) wrote:
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Pending.
Bruno Martins (bgmartins) wrote:
Some minor things that should be changed before approving the merge:
* The documentation for functions like conversion:
* The documentation for functions like normalization:
* The documentation for the private functions like normalization:
invokes these functions.
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue job data-cleaning-
All tests succeeded!
Zorba Build Bot (zorba-buildbot) wrote:
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Needs Fixing.
Bruno Martins (bgmartins) wrote:
Checked the latest revisions from Diogo and they seem ok.
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.
CMake Error at /home/ceej/
Validation queue job data-cleaning-
The final status was:
2 tests did not succeed - changes not commited.
Error in read script: /home/ceej/
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.
CMake Error at /home/ceej/
Validation queue job data-cleaning-
The final status was:
1 tests did not succeed - changes not commited.
Error in read script: /home/ceej/
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.
CMake Error at /home/ceej/
Validation queue job data-cleaning-
The final status was:
4 tests did not succeed - changes not commited.
Error in read script: /home/ceej/
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.
CMake Error at /home/ceej/
Validation queue job data-cleaning-
The final status was:
3 tests did not succeed - changes not commited.
Error in read script: /home/ceej/
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue job data-cleaning-
All tests succeeded!
Zorba Build Bot (zorba-buildbot) wrote:
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Approve.
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue job data-cleaning-
All tests succeeded!
Zorba Build Bot (zorba-buildbot) wrote:
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Approve.
Matthias Brantner (matthias-brantner) voted: Approve
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue starting for merge proposal.
Log at: http://
Zorba Build Bot (zorba-buildbot) wrote:
Validation queue job data-cleaning-
All tests succeeded!
Preview Diff
=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq'
--- src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2011-08-16 23:45:59 +0000
+++ src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2011-11-16 18:49:23 +0000
@@ -35,6 +35,8 @@

import module namespace http = "http://www.zorba-xquery.com/modules/http-client";

+import module namespace reflection = "http://www.zorba-xquery.com/modules/reflection";
+
declare namespace ver = "http://www.zorba-xquery.com/options/versioning";
declare option ver:module-version "2.0";

@@ -45,10 +47,6 @@
: Uses a White-pages Web service to discover information about a given name,
: returning a sequence of strings for the phone numbers associated to the name.
:
- : <br/>
- : Example usage : <pre> phone-from-user ('Maria Lurdes') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> (716) 686-4500 </pre>
:
: @param $name The name of person or organization.
: @return A sequence of strings for the phone numbers associated to the name.
@@ -66,11 +64,6 @@
: Uses a White-pages Web service to discover information about a given name,
: returning a sequence of strings for the addresses associated to the name.
:
- : <br/>
- : Example usage : <pre> address-from-user ('Maria Lurdes') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 222 E 53rd St, Los Angeles, CA, US </pre>
- : <pre> 3362 Walden Ave, Depew, NY, US </pre>
:
: @param $name The name of person or organization.
: @return A sequence of strings for the addresses associated to the name.
@@ -93,11 +86,6 @@
: Uses a White-pages Web service to discover information about a given phone number,
: returning a sequence of strings for the name associated to the phone number.
:
- : <br/>
- : Example usage : <pre> user-from-phone ('8654582358') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> Homer Simpson </pre>
- : <pre> Sue M Simpson </pre>
:
: @param $phone-number A string with 10 digits corresponding to the phone number.
: @return A sequence of strings for the person or organization's name associated to the phone number.
@@ -113,10 +101,6 @@
: Uses a White-pages Web service to discover information about a given phone number,
: returning a string for the address associated to the phone number.
:
- : <br/>
- : Example usage : <pre> address-from-phone ('8654582358') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 4610 Harrison Bend Rd, Loudon, TN, US </pre>
:
: @param $phone-number A string with 10 digits corresponding to the phone number.
: @return A string for the addresses associated to the phone number.
@@ -139,10 +123,6 @@
: Uses a White-pages Web service to discover information about a given address,
: returning a sequence of strings for the names associated to the address.
:
- : <br/>
- : Example usage : <pre> user-from-address('5655 E Gaskill Rd, Willcox, AZ, US') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> Stan Smith </pre>
:
: @param $address A string corresponding to the address (ex: 5655 E Gaskill Rd, Willcox, AZ, US).
: @return A sequence of strings for the person or organization's names associated to the address.
@@ -169,10 +149,6 @@
: Uses a White-pages Web service to discover information about a given address,
: returning a sequence of strings for the phone number associated to the address.
:
- : <br/>
- : Example usage : <pre> phone-from-address('5655 E Gaskill Rd, Willcox, AZ, US') </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> (520) 824-3160 </pre>
:
: @param $address A string corresponding to the address (ex: 5655 E Gaskill Rd, Willcox, AZ, US).
: @return A sequence of strings for the phone number or organization's names associated to the address.
@@ -206,41 +182,122 @@
(:~
: Conversion function for units of measurement, acting as a wrapper over the CuppaIT WebService.
: <br/>
- : WebService documentation at http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML
:
- : <br/>
- : Example usage : <pre> unit-convert ( 1 , "Distance", "mile", "kilometer" ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 1.609344 </pre>
:
: @param $v The amount we wish to convert.
: @param $t The type of metric (e.g., "Distance")
: @param $m1 The source measurement unit metric (e.g., "meter")
: @param $m2 The target measurement unit metric (e.g., "mile")
: @return The value resulting from the conversion
- : @error conversion:notsupported if the type of metric, the source unit or the target unit are not known to the service.
- : @see http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML
: @example test/Queries/data-cleaning/conversion/unit-convert.xq
:)
declare %ann:nondeterministic function conversion:unit-convert ( $v as xs:double, $t as xs:string, $m1 as xs:string, $m2 as xs:string ) {
- let $url := "http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML"
- let $ctype := concat("ctype=",$t)
- let $cfrom := concat("cfrom=",$m1)
- let $cto := concat("cto=",$m2)
- let $camount := concat("camount=",$v)
- let $par := string-join(($url,$ctype,$cfrom,$cto,$camount),"&")
- let $result := data(http:get-node($par)[2])
- return if (matches(data($result),"-?[0-9]+(\.[0-9]+)?")) then data($result)
- else (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/conversion', 'conversion:notsupported'), data($result)))
+ if ( $m1 = $m2 ) then $v else
+
+let $conversion-table :=
+ <unit-conversion-rules>
+ <unit type="Distance" from="mile" to="kilometer" value="1.609344" />
+ <unit type="Distance" from="mile" to="angstrom" value="16100000000000" />
+ <unit type="Distance" from="mile" to="picometer" value="1610000000000000" />
+ <unit type="Distance" from="mile" to="nanometer" value="1610000000000" />
+ <unit type="Distance" from="mile" to="microometer" value="1610000000" />
+ <unit type="Distance" from="mile" to="millimeter" value="1610000" />
+ <unit type="Distance" from="mile" to="centimeter" value="161000" />
+ <unit type="Distance" from="mile" to="meter" value="1610" />
+ <unit type="Distance" from="mile" to="inch" value="63400" />
+ <unit type="Distance" from="mile" to="feet" value="5280" />
+ <unit type="Distance" from="kilometer" to="meter" value="1000" />
+ <unit type="Distance" from="kilometer" to="picometer" value="1000000000000000" />
+ <unit type="Distance" from="kilometer" to="angstrom" value="10000000000000" />
+ <unit type="Distance" from="kilometer" to="nanometer" value="1000000000000" />
+ <unit type="Distance" from="kilometer" to="micrometer" value="1000000000" />
+ <unit type="Distance" from="kilometer" to="millimeter" value="1000000" />
+ <unit type="Distance" from="kilometer" to="centimeter" value="100000" />
+ <unit type="Distance" from="kilometer" to="inch" value="39400" />
+ <unit type="Distance" from="kilometer" to="feet" value="3280" />
+ <unit type="Distance" from="meter" to="centimeter" value="100" />
+ <unit type="Distance" from="meter" to="picometer" value="1000000000000" />
+ <unit type="Distance" from="meter" to="angstrom" value="10000000000" />
+ <unit type="Distance" from="meter" to="nanometer" value="1000000000" />
+ <unit type="Distance" from="meter" to="micrometer" value="1000000" />
+ <unit type="Distance" from="meter" to="millimeter" value="1000" />
+ <unit type="Distance" from="meter" to="inch" value="39.4" />
+ <unit type="Distance" from="meter" to="feet" value="3.28" />
+ <unit type="Distance" from="centimeter" to="millimeter" value="10" />
+ <unit type="Distance" from="millimeter" to="micrometer" value="1000" />
+ <unit type="Distance" from="micrometer" to="nanometer" value="1000" />
+ <unit type="Distance" from="nanometer" to="angstrom" value="10" />
+ <unit type="Distance" from="angstrom" to="picometer" value="100" />
+ <unit type="Distance" from="inch" to="feet" value="0.0833" />
+ <unit type="Mass" from="tons" to="kilograms" value="907.18474" />
+ <unit type="Mass" from="tons" to="pounds" value="2000" />
+ <unit type="Mass" from="tons" to="ounces" value="32000" />
+ <unit type="Mass" from="tons" to="grams" value="907184.74" />
+ <unit type="Mass" from="tons" to="milligrams" value="907184740" />
+ <unit type="Mass" from="kilograms" to="pounds" value="2.2046226" />
+ <unit type="Mass" from="kilograms" to="grams" value="1000" />
+ <unit type="Mass" from="kilograms" to="milligrams" value="1000000" />
+ <unit type="Mass" from="grams" to="milligrams" value="1000" />
+ <unit type="Mass" from="pounds" to="ounces" value="16" />
+ <unit type="Mass" from="pounds" to="grams" value="453.59237" />
+ <unit type="Mass" from="pounds" to="milligrams" value="453592.37" />
+ <unit type="Mass" from="ounces" to="kilograms" value="0.028349523" />
+ <unit type="Mass" from="ounces" to="grams" value="28.349523" />
+ <unit type="Mass" from="ounces" to="milligrams" value="28349.523" />
+ <unit type="Volume" from="liters" to="cubic centimeters" value="1000" />
+ <unit type="Energy" from="jouls" to="calories" value="0.239" />
+ <unit type="Pressure" from="pascals" to="kilopascals" value="0.001" />
+ <unit type="Pressure" from="pascals" to="bars" value="0.000001" />
+ <unit type="Pressure" from="pascals" to="mmHg" value="0.00750064" />
+ <unit type="Pressure" from="pascals" to="torrs" value="0.00750064" />
+ <unit type="Pressure" from="atmospheres" to="pascals" value="101325" />
+ <unit type="Pressure" from="atmospheres" to="kilopascals" value="101.325" />
+ <unit type="Pressure" from="atmospheres" to="bars" value="1.01325" />
+ <unit type="Pressure" from="atmospheres" to="mmHg" value="760" />
+ <unit type="Pressure" from="atmospheres" to="torrs" value="760" />
+ <unit type="Pressure" from="atmospheres" to="psi" value="14.7" />
+ <unit type="Pressure" from="psi" to="pascals" value="6890" />
+ <unit type="Pressure" from="psi" to="kilopascals" value="6.89" />
+ <unit type="Pressure" from="psi" to="bars" value="0.0689" />
+ <unit type="Pressure" from="psi" to="mmHg" value="51.7" />
+ <unit type="Pressure" from="psi" to="torrs" value="51.7" />
+ <unit type="Pressure" from="bars" to="kilopascals" value="100" />
+ <unit type="Pressure" from="bars" to="mmHg" value="750.064" />
+ <unit type="Pressure" from="bars" to="torrs" value="750.064" />
+ <unit type="Pressure" from="kilopascals" to="mmHg" value="7.50064" />
+ <unit type="Pressure" from="kilopascals" to="torrs" value="7.50064" />
+ <unit type="Pressure" from="mmHg" to="torrs" value="1" />
+ <unit type="Temperature" from="celsius" to="fahrenheit" value="* 9 div 5 + 32" />
+ <unit type="Temperature" from="celsius" to="kelvin" value="+ 273.15" />
+ <unit type="Temperature" from="kelvin" to="celsius" value="- 273.15" />
+ <unit type="Temperature" from="kelvin" to="fahrenheit" value="* 9 div 5 - 273.15 * 9 div 5 + 32" />
+ <unit type="Temperature" from="fahrenheit" to="celsius" value="* 5 div 9 - 32 * 5 div 9" />
+ <unit type="Temperature" from="fahrenheit" to="kelvin" value="* 5 div 9 - 32 * 5 div 9 + 273.15" />
+</unit-conversion-rules>
+
+let $from := $conversion-table/unit[@type=$t and @from=$m1] |
+ ( for $it in $conversion-table/unit[@type=$t and @to=$m1] return
+ if (compare($t, "Temperature") != 0) then
+ copy $aux := $it
+ modify (
+ replace value of node $aux/@value with 1.0 div $aux/@value,
+ replace value of node $aux/@from with $aux/@to,
+ replace value of node $aux/@to with $aux/@from
+ )
+ return $aux
+ else()
+ )
+
+return
+if (compare($t, "Temperature") = 0) then reflection:eval(concat($v , $conversion-table//unit[@from=$m1][@to=$m2]/@value))
+else
+ if ( $from[@to=$m2]) then ( $v * $from[@to=$m2]/@value )
+ else ( for $i in $from return conversion:unit-convert ( $v * $i/@value , $t , $i/@to , $m2 ) )[1]
};

(:~
: Placename to geospatial coordinates converter, acting as a wrapper over the Yahoo! geocoder service.
:
- : <br/>
- : Example usage : <pre> geocode-from-address ( ("Lisboa", "Portugal") ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> ( 38.725735 , -9.15021 ) </pre>
:
: @param $q A sequence of strings corresponding to the different components (e.g., street, city, country, etc.) of the place name.
: @return The pair of latitude and longitude coordinates associated with the input address.
@@ -258,10 +315,6 @@
(:~
: Geospatial coordinates to placename converter, acting as a wrapper over the Yahoo! reverse geocoder service.
:
- : <br/>
- : Example usage : <pre> address-from-geocode ( 38.725735 , -9.15021 ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> ( 'Portugal' , 'Lisbon' , 'praca Marques de Pombal' ) </pre>
:
: @param $lat Geospatial latitude.
: @param $lon Geospatial longitude.
@@ -288,10 +341,6 @@
:
: WebService documentation at http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
:
- : <br/>
- : Example usage : <pre> currency-convert ( 1, "USD", "EUR", "2011-01-18" ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 0.747887218607434 </pre>
:
: @param $v The amount we wish to convert.
: @param $m1 The source currency (e.g., "EUR").
@@ -356,4 +405,3 @@
declare function conversion:name-from-domain ( $domain as xs:string ) {
()
};
-

=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq'
--- src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2011-08-01 11:26:53 +0000
+++ src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2011-11-16 18:49:23 +0000
@@ -40,10 +40,6 @@
: Converts a given string representation of a date value into a date representation valid according
: to the corresponding XML Schema type.
:
- : <br/>
- : Example usage : <pre> to-date ( "24OCT2002" , "%d%b%Y" ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 2002-10-24 </pre>
:
: @param $sd The string representation for the date
: @param $format An optional parameter denoting the format used to represent the date in the string, according to a
@@ -51,16 +47,10 @@
: by a single letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion
: specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:
: <pre>
- : '%a' Abbreviated weekday name in the current locale.<br/>
- : '%A' Full weekday name in the current locale.<br/>
: '%b' Abbreviated month name in the current locale.<br/>
: '%B' Full month name in the current locale.<br/>
: '%d' Day of the month as decimal number (01-31).<br/>
- : '%j' Day of year as decimal number (001-366).<br/>
: '%m' Month as decimal number (01-12).<br/>
- : '%U' Week of the year as decimal number (00-53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). This is the US convention.<br/>
- : '%w' Weekday as decimal number (0-6, Sunday is 0).<br/>
- : '%W' Week of the year as decimal number (00-53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). This is the UK convention.<br/>
: '%x' Date, locale-specific.<br/>
: '%y' Year without century (00-99).<br/>
: '%Y' Year with century.<br/>
@@ -68,23 +58,20 @@
: '%D' Locale-specific date format such as '%m/%d/%y'.<br/>
: '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number.<br/>
: '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format).<br/>
- : '%g' The last two digits of the week-based year (see '%V').<br/>
- : '%G' The week-based year (see '%V') as a decimal number.<br/>
- : '%h' Equivalent to '%b'.<br/>
- : '%u' Weekday as a decimal number (1-7, Monday is 1).<br/>
- : '%V' Week of the year as decimal number (00-53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.
+ : '%h' Equivalent to '%b'.<br/>
:</pre>
:
: @return The date value resulting from the conversion.
- : <br/><br/><b> Attention : This function is still not implemented. </b> <br/>
+ : @example test/Queries/data-cleaning/normalization/to-date.xq
:)
declare function normalization:to-date ( $sd as xs:string, $format as xs:string? ) as xs:string{
-(:
+
+
let $dictionary := normalization:month-dictionary()
- let $format-tokens := tokenize($format, "%")[position()>1]
+ let $format-tokens := tokenize($format, "[ %\-/:]+")[position()>1]
let $sd-tokens :=
if (contains($sd, "-") or contains($sd, "/") or contains($sd, " "))
- then tokenize ($sd, "[ \-/]")
+ then tokenize ($sd, "[ \-/]+")
else let $ydtoken := tokenize(replace($sd, "[A-Za-z]", " "), " ")
let $ft := $ydtoken[position()=1]
let $lt := $ydtoken[last()]
@@ -154,25 +141,15 @@

let $result := concat($year, "-", $month, "-", $day)

- return
-
- if (matches(string($result),"[0-9]+-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))"))
- then $result
- else
- (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization',
- 'err:notsupported'),data(concat($result, " - ", concat("year: ", $year), concat(" month: ", $month), concat(" day:", $day)))))
+ return normalization:check-date($result)
else()
- :)""
+
};

(:~
: Converts a given string representation of a time value into a time representation valid according to
: the corresponding XML Schema type.
:
- : <br/>
- : Example usage : <pre> to-time ( "09 hours 10 minutes" , "%H hours %M minutes" ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 09:10:00 </pre>
:
: @param $sd The string representation for the time.
: @param $format An optional parameter denoting the format used to represent the time in the string, according to a sequence of
@@ -197,10 +174,12 @@
:</pre>
:
: @return The time value resulting from the conversion.
+ : @example test/Queries/data-cleaning/normalization/to-time.xq
:)
-declare function normalization:to-time ( $sd as xs:string, $format as xs:string? ) as xs:string{
+declare function normalization:to-time ( $sd as xs:string, $format as xs:string? ) as xs:string?{
let $timezoneDict := normalization:timeZone-dictionary()
- let $format-tokens := tokenize($format, "%")[position()>1]
+ let $format-string := replace(replace ($format, '%R', '%H:%M'), '%T', '%H:%M:%S')
+ let $format-tokens := tokenize($format-string, "( |%|:)+")[position()>1]
let $sd-tokens :=
if (contains($sd, ":") or contains($sd, ".") or contains($sd, " "))
then tokenize ($sd, "[ :\.]")
@@ -313,7 +292,7 @@

if (count(index-of($format-tokens, "e")) != 0)
then concat("0", string($sd-tokens[position() = index-of($format-tokens, "e")]))
- else "SND"
+ else "00"

let $result :=

@@ -439,7 +418,7 @@
else ()
else

- (:z:)
+
if (count(index-of($format-tokens, "z")) != 0)
then if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='+')
then let $complement :=
@@ -539,11 +518,7 @@

return

- if (matches(string($result),"(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]"))
- then $result
- else
- (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization',
- 'err:notsupported'),data(concat($result, " - ", concat("hours: ", $hours), concat(" minutes: ", $minutes), concat(" seconds:", $seconds)))))
+ normalization:check-time($result)
else()

};
@@ -552,10 +527,6 @@
: Converts a given string representation of a dateTime value into a dateTime representation
: valid according to the corresponding XML Schema type.
:
- : <br/>
- : Example usage : <pre> to-dateTime( "24OCT2002 21:22" , "%d%b%Y %H%M" ) </pre>
- : <br/>
- : The function invocation in the example above returns : <pre> 2002-20-24T21:22:00 </pre>
:
: @param $sd The string representation for the dateTime.
: @param $format An optional parameter denoting the format used to represent the dateTime in the string, according to a sequence
@@ -564,11 +535,10 @@
: is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:
:
: <pre>
- : '%a' Abbreviated weekday name in the current locale.<br/>
- : '%A' Full weekday name in the current locale.<br/>
: '%b' Abbreviated month name in the current locale.<br/>
: '%B' Full month name in the current locale.<br/>
: '%c' Date and time, locale-specific.<br/>
+ : '%C' Century (00-99): the integer part of the year divided by 100.<br/>
: '%d' Day of the month as decimal number (01-31).<br/>
: '%H' Hours as decimal number (00-23).<br/>
: '%I' Hours as decimal number (01-12).<br/>
@@ -577,16 +547,12 @@
: '%M' Minute as decimal number (00-59).<br/>
: '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.<br/>
: '%S' Second as decimal number (00-61), allowing for up to two leap-seconds.<br/>
- : '%U' Week of the year as decimal number (00-53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). This is the US convention.<br/>
- : '%w' Weekday as decimal number (0-6, Sunday is 0).<br/>
- : '%W' Week of the year as decimal number (00-53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). This is the UK convention.<br/>
: '%x' Date, locale-specific.<br/>
: '%X' Time, locale-specific.<br/>
: '%y' Year without century (00-99).<br/>
: '%Y' Year with century.<br/>
: '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.<br/>
: '%Z' Time zone as a character string.<br/>
- : '%C' Century (00-99): the integer part of the year divided by 100.<br/>
: '%D' Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format.<br/>
: '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number.<br/>
: '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format).<br/>
@@ -598,19 +564,16 @@
: '%r' The 12-hour clock time (using the locale's AM or PM).<br/>
: '%R' Equivalent to '%H:%M'.<br/>
: '%T' Equivalent to '%H:%M:%S'.<br/>
- : '%u' Weekday as a decimal number (1-7, Monday is 1).<br/>
- : '%V' Week of the year as decimal number (00-53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.
:</pre>
:
: @return The dateTime value resulting from the conversion.
- : <br/><br/><b> Attention : This function is still not implemented. </b> <br/>
- :
+ : @example test/Queries/data-cleaning/normalization/to-dateTime.xq
:)
declare function normalization:to-dateTime ( $sd as xs:string, $format as xs:string? ) as xs:string {
-(:
let $timezoneDict := normalization:timeZone-dictionary()
- let $monthDict := normalization:month-dictionary()
- let $format-tokens := tokenize($format, "[ \-%]+")[position()>1]
+ let $monthDict := normalization:month-dictionary()
+ let $format-string := replace(replace(replace ($format, '%R', '%H:%M'), '%T', '%H:%M:%S'), '%F', '%Y-%m-%d')
+ let $format-tokens := tokenize($format-string, "[ %\-/:\.]+")[position()>1]
let $sdt :=
if (contains($sd, ":") or contains($sd, ".") or contains($sd, " ") or contains($sd, "-")
or contains($sd, "/"))
@@ -801,7 +764,7 @@

if (count(index-of($format-tokens, "e")) != 0)
then concat("0", string($sd-tokens[position() = index-of($format-tokens, "e")]))
- else "SND"
+ else "00"

let $result :=

@@ -814,6 +777,50 @@
then 1
else 0

+ let $dayscomplement :=
+ if (number($complement) + number($hours) + number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =index-of($format-tokens, "Z")]]),2,2)) >= 24)
+ then 1
+ else 0
469 | + |
470 | + let $monthscomplement := |
471 | + if(($dayscomplement + number($day) > 28) and (compare($month, '02') = 0) and (number($year) mod 4 != 0)) |
472 | + then 1 |
473 | + else |
474 | + if(($dayscomplement + number($day) > 30) and ((compare($month, '04') = 0) or (compare($month, '06') = 0) or (compare($month, '09') = 0) or (compare($month, '11') = 0))) |
475 | + then 1 |
476 | + else |
477 | + if(($dayscomplement + number($day) > 31) and ((compare($month, '04') = 0) or (compare($month, '01') = 0) or (compare($month, '03') = 0) or (compare($month, '05') = 0) or (compare($month, '07') = 0) or (compare($month, '08') = 0) or (compare($month, '10') = 0) or (compare($month, '12') = 0))) |
478 | + then 1 |
479 | + else |
480 | + if(($dayscomplement + number($day) > 29) and (compare($month, '02') = 0) and (number($year) mod 4 = 0)) |
481 | + then 1 |
482 | + else 0 |
483 | + |
484 | + let $ryear := |
485 | + if ($monthscomplement + number($month) > 12) |
486 | + then string(number($year) + 1) |
487 | + else $year |
488 | + |
489 | + let $daywcompl := |
490 | + if ($monthscomplement = 1) |
491 | + then 1 |
492 | + else number($day) + $dayscomplement |
493 | + |
494 | + let $monthwcompl := |
495 | + if($monthscomplement + number($month) <= 12) |
496 | + then number($month) + $monthscomplement |
497 | + else 1 |
498 | + |
499 | + let $rday := |
500 | + if (string-length(string($daywcompl)) = 1) |
501 | + then concat ('0', string($daywcompl)) |
502 | + else string($daywcompl) |
503 | + |
504 | + let $rmonth := |
505 | + if (string-length(string($monthwcompl)) = 1) |
506 | + then concat ('0', string($monthwcompl)) |
507 | + else string($monthwcompl) |
508 | + |
509 | let $rhours := |
510 | if (string-length(string( |
511 | (number($complement) + number($hours) + |
512 | @@ -845,7 +852,7 @@ |
513 | index-of($format-tokens, "Z")]]),4,2))) mod 60)) |
514 | |
515 | |
516 | - return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds) |
517 | + return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds) |
518 | else |
519 | |
520 | if (substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() = |
521 | @@ -853,10 +860,61 @@ |
522 | then |
523 | let $complement := |
524 | if (number($minutes)-number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() = |
525 | - index-of($format-tokens, "Z")]]),4,2)) < 0) |
526 | + index-of($format-tokens, "Z")]]),2,2)) < 0) |
527 | then -1 |
528 | else 0 |
529 | |
530 | + let $dayscomplement := |
531 | + if (number($complement) - number($hours) - number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position()= |
532 | + index-of($format-tokens, "Z")]]),2,2)) < 0) |
533 | + then -1 |
534 | + else 0 |
535 | + |
536 | + let $monthcomplement := |
537 | + if(number($day) + $dayscomplement < 1) |
538 | + then -1 |
539 | + else 0 |
540 | + |
541 | + let $yearcomplement := |
542 | + if(number($month) + $monthcomplement< 1) |
543 | + then -1 |
544 | + else 0 |
545 | + |
546 | + let $daywcompl := |
547 | + if ($monthcomplement = 0) |
548 | + then number($day) + $dayscomplement |
549 | + else |
550 | + if ( (number($month) = 5) or (number($month) = 7) or (number($month) = 10) or (number($month) = 12)) |
551 | + then 30 |
552 | + else |
553 | + if((number($month) = 4) or (number($month) = 6) or (number($month) = 9) or (number($month) = 11) or (number($month) = 2) or (number($month) = 1) or (number($month) = 8)) |
554 | + then 31 |
555 | + else |
556 | + if((number($month) = 3) and (number($year) mod 4 != 0)) |
557 | + then 28 |
558 | + else |
559 | + if((number($month) = 3) and (number($year) mod 4 = 0)) |
560 | + then 29 |
561 | + else number($day) + $dayscomplement |
562 | + |
563 | + let $monthwcompl:= |
564 | + if($yearcomplement = 0) |
565 | + then number($month) + $monthcomplement |
566 | + else 12 |
567 | + |
568 | + let $ryear := |
569 | + number($year) + $yearcomplement |
570 | + |
571 | + let $rday := |
572 | + if (string-length(string($daywcompl)) = 1) |
573 | + then concat ('0', string($daywcompl)) |
574 | + else string($daywcompl) |
575 | + |
576 | + let $rmonth := |
577 | + if (string-length(string($monthwcompl)) = 1) |
578 | + then concat ('0', string($monthwcompl)) |
579 | + else string($monthwcompl) |
580 | + |
581 | let $rhours := |
582 | if( ((number($complement) + number($hours) - |
583 | number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() = |
584 | @@ -923,7 +981,7 @@ |
585 | number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() = |
586 | index-of($format-tokens, "Z")]]),2,2)))) mod 60)) |
587 | |
588 | - return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds) |
589 | + return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds) |
590 | else () |
591 | else |
592 | |
593 | @@ -931,8 +989,52 @@ |
594 | if (count(index-of($format-tokens, "z")) != 0) |
595 | then if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='+') |
596 | then let $complement := |
597 | - if (number($minutes)+number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) > 59) then 1 |
598 | - else 0 |
599 | + if (number($minutes)+number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) > 59) then 1 |
600 | + else 0 |
601 | + |
602 | + let $dayscomplement := |
603 | + if (number($complement) + number($hours) + number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)) >= 24) |
604 | + then 1 |
605 | + else 0 |
606 | + |
607 | + let $monthscomplement := |
608 | + if(($dayscomplement + number($day) > 28) and (compare($month, '02') = 0) and (number($year) mod 4 != 0)) |
609 | + then 1 |
610 | + else |
611 | + if(($dayscomplement + number($day) > 30) and ((compare($month, '04') = 0) or (compare($month, '06') = 0) or (compare($month, '09') = 0) or (compare($month, '11') = 0))) |
612 | + then 1 |
613 | + else |
614 | + if(($dayscomplement + number($day) > 31) and ((compare($month, '04') = 0) or (compare($month, '01') = 0) or (compare($month, '03') = 0) or (compare($month, '05') = 0) or (compare($month, '07') = 0) or (compare($month, '08') = 0) or (compare($month, '10') = 0) or (compare($month, '12') = 0))) |
615 | + then 1 |
616 | + else |
617 | + if(($dayscomplement + number($day) > 29) and (compare($month, '02') = 0) and (number($year) mod 4 = 0)) |
618 | + then 1 |
619 | + else 0 |
620 | + |
621 | + let $ryear := |
622 | + if ($monthscomplement + number($month) > 12) |
623 | + then string(number($year) + 1) |
624 | + else $year |
625 | + |
626 | + let $daywcompl := |
627 | + if ($monthscomplement = 1) |
628 | + then 1 |
629 | + else number($day) + $dayscomplement |
630 | + |
631 | + let $monthwcompl := |
632 | + if($monthscomplement + number($month) <= 12) |
633 | + then number($month) + $monthscomplement |
634 | + else 1 |
635 | + |
636 | + let $rday := |
637 | + if (string-length(string($daywcompl)) = 1) |
638 | + then concat ('0', string($daywcompl)) |
639 | + else string($daywcompl) |
640 | + |
641 | + let $rmonth := |
642 | + if (string-length(string($monthwcompl)) = 1) |
643 | + then concat ('0', string($monthwcompl)) |
644 | + else string($monthwcompl) |
645 | |
646 | let $rhours := |
647 | if (string-length(string( |
648 | @@ -959,15 +1061,65 @@ |
649 | number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2))) mod 60)) |
650 | |
651 | |
652 | - return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds) |
653 | + return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds) |
654 | else |
655 | |
656 | if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='-') |
657 | then |
658 | let $complement := |
659 | - if (number($minutes)-number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) < 0) then -1 |
660 | - else 0 |
661 | - |
662 | + if (number($minutes)-number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) < 0) then -1 |
663 | + else 0 |
664 | + |
665 | + let $dayscomplement := |
666 | + if (number($complement) - number($hours) - number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)) < 0) |
667 | + then -1 |
668 | + else 0 |
669 | + |
670 | + let $monthcomplement := |
671 | + if(number($day) + $dayscomplement< 1) |
672 | + then -1 |
673 | + else 0 |
674 | + |
675 | + let $yearcomplement := |
676 | + if(number($month) + $monthcomplement< 1) |
677 | + then -1 |
678 | + else 0 |
679 | + |
680 | + let $daywcompl := |
681 | + if ($monthcomplement = 0) |
682 | + then number($day) + $dayscomplement |
683 | + else |
684 | + if ( (number($month) = 5) or (number($month) = 7) or (number($month) = 10) or (number($month) = 12)) |
685 | + then 30 |
686 | + else |
687 | + if((number($month) = 4) or (number($month) = 6) or (number($month) = 9) or (number($month) = 11) or (number($month) = 2) or (number($month) = 1) or (number($month) = 8)) |
688 | + then 31 |
689 | + else |
690 | + if((number($month) = 3) and (number($year) mod 4 != 0)) |
691 | + then 28 |
692 | + else |
693 | + if((number($month) = 3) and (number($year) mod 4 = 0)) |
694 | + then 29 |
695 | + else number($day) + $dayscomplement |
696 | + |
697 | + let $monthwcompl:= |
698 | + if($yearcomplement = 0) |
699 | + then number($month) + $monthcomplement |
700 | + else 12 |
701 | + |
702 | + let $ryear := |
703 | + number($year) + $yearcomplement |
704 | + |
705 | + let $rday := |
706 | + if (string-length(string($daywcompl)) = 1) |
707 | + then concat ('0', string($daywcompl)) |
708 | + else string($daywcompl) |
709 | + |
710 | + let $rmonth := |
711 | + if (string-length(string($monthwcompl)) = 1) |
712 | + then concat ('0', string($monthwcompl)) |
713 | + else string($monthwcompl) |
714 | + |
715 | let $rhours := |
716 | if( ((number($complement) + number($hours) - |
717 | number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2))) mod 24) >= 0 ) |
718 | @@ -1020,31 +1172,20 @@ |
719 | (60 - -(number($minutes) - |
720 | number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)))) mod 60)) |
721 | |
722 | - return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds) |
723 | + return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds) |
724 | else () |
725 | else |
726 | concat($year, "-", $month, "-", $day, "T", $hours, ":", $minutes, ":", $seconds) |
727 | |
728 | - return |
729 | - |
730 | - if (matches(string($result),"[0-9]+-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))T(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]")) |
731 | - then $result |
732 | - else |
733 | - (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization', |
734 | - 'err:notsupported'),data(concat($result, " - ", concat("hours: ", $hours), concat(" minutes: ", $minutes), concat(" seconds:", $seconds))))) |
735 | - |
736 | + return |
737 | + normalization:check-dateTime($result) |
738 | else() |
739 | -:)"" |
740 | }; |
741 | |
742 | (:~ |
743 | : Uses an address normalization Web service to convert a postal address given as input into a |
744 | : cannonical representation format. |
745 | : |
746 | - : <br/> |
747 | - : Example usage : <pre> normalize-address ( ( 'Marques de Pombal' , 'Lisboa' ) ) </pre> |
748 | - : <br/> |
749 | - : The function invocation in the example above returns : <pre> ( 'Portugal' , 'Lisbon' , 'praca Marques de Pombal' ) </pre> |
750 | : |
751 | : @param $addr A sequence of strings encoding an address, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address. |
752 | : @return A sequence of strings with the address encoded in a cannonical format, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address. |
753 | @@ -1315,7 +1456,7 @@ |
754 | : Internal auxiliary function that returns an XML representation for a dictionary that contains a |
755 | : numeric value associated to different month name abbreviations. |
756 | :) |
757 | -declare %private function normalization:month-dictionary() as node(){ |
758 | +declare %private function normalization:month-dictionary() as element(){ |
759 | let $dictionary := |
760 | <dictionary> |
761 | <month name="January" value="01"> |
762 | @@ -1380,3 +1521,40 @@ |
763 | </dictionary> |
764 | return $dictionary |
765 | }; |
766 | + |
767 | +(:~ |
768 | + : Internal auxiliary function that checks if a string is in xs:dateTime format |
769 | + : |
770 | + : |
771 | + : @param $dateTime The string representation for the dateTime. |
772 | + : @return The dateTime string if it represents the xs:dateTime format. |
773 | + :) |
774 | +declare %private function normalization:check-dateTime($dateTime as xs:string) as xs:string{ |
775 | + concat(string(year-from-dateTime(xs:dateTime($dateTime))), substring($dateTime,5)) |
776 | +}; |
777 | + |
778 | +(:~ |
779 | + : Internal auxiliary function that checks if a string is in xs:date format |
780 | + : |
781 | + : |
782 | + : @param $dateTime The string representation for the date. |
783 | + : @return The date string if it represents the xs:date format. |
784 | + :) |
785 | +declare %private function normalization:check-date($date as xs:string) as xs:string{ |
786 | + concat(string(year-from-date(xs:date($date))), substring($date,5)) |
787 | +}; |
788 | + |
789 | +(:~ |
790 | + : Internal auxiliary function that checks if a string is in xs:time format |
791 | + : |
792 | + : |
793 | + : @param $dateTime The string representation for the time. |
794 | + : @return The time string if it represents the xs:time format. |
795 | + :) |
796 | +declare %private function normalization:check-time($Time as xs:string) as xs:string{ |
797 | + if(string(hours-from-time(xs:time($Time)))) |
798 | + then $Time |
799 | + else() |
800 | +}; |
801 | + |
802 | + |
803 | |
804 | === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq' |
805 | --- src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2011-08-01 11:26:53 +0000 |
806 | +++ src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2011-11-16 18:49:23 +0000 |
807 | @@ -91,7 +91,7 @@ |
808 | let $aux3 := replace(replace($aux2,"MB","M"),"B$","") |
809 | let $aux4 := replace(replace(replace(replace(replace($aux3,"CIA","XIA"),"SCH","SKH"),"CH","XH"),"C([IEY])","S$1"),"C","K") |
810 | let $aux5 := replace(replace($aux4,"DG([EYI])","JG$1"),"D","T") |
811 | - let $aux6 := replace(replace($aux5,"GH([^AEIOU])","H$1"),"G(N(ED)?)^","$1") |
812 | + let $aux6 := replace(replace($aux5,"GH([^AEIOU])","H$1"),"G(N(ED)?)$","$1") |
813 | let $aux7 := replace(replace(replace($aux6,"([^G]?)G([IEY])","$1J$2"),"([^G]?)G","$1K"),"GG","G") |
814 | let $aux8 := replace(replace(replace(replace($aux7,"([AEIOU])H([^AEIOU])","$1$2"),"CK","K"),"PH","F"),"Q","K") |
815 | let $aux9 := replace(replace(replace(replace(replace($aux8,"S(H|(IO)|(IA))","X$1"),"T((IO)|(IA))","X$1"),"TH","0"),"TCH","CH"),"V","F") |
816 | |
817 | === modified file 'test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res' |
818 | --- test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res 2011-07-19 19:12:03 +0000 |
819 | +++ test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res 2011-11-16 18:49:23 +0000 |
820 | @@ -1,1 +1,1 @@ |
821 | -3362 Walden Ave, Depew, NY, US 222 E 53rd St, Los Angeles, CA, US |
822 | \ No newline at end of file |
823 | +222 E 53rd St, Los Angeles, CA, US |
824 | |
825 | === modified file 'test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res' |
826 | --- test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res 2011-07-19 19:12:03 +0000 |
827 | +++ test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res 2011-11-16 18:49:23 +0000 |
828 | @@ -1,1 +1,1 @@ |
829 | -38.725735 -9.15021 |
830 | \ No newline at end of file |
831 | +38 -10 |
832 | |
833 | === modified file 'test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res' |
834 | --- test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res 2011-07-19 19:12:03 +0000 |
835 | +++ test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res 2011-11-16 18:49:23 +0000 |
836 | @@ -1,1 +1,1 @@ |
837 | -(716) 686-4500 |
838 | \ No newline at end of file |
839 | +(661) 397-4236 (310) 513-0752 (510) 259-0456 (831) 385-3605 (213) 627-0188 (323) 846-1235 (661) 224-1072 (909) 820-3137 (916) 627-1090 (707) 938-9861 (805) 648-6417 |
840 | |
841 | === modified file 'test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res' |
842 | --- test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res 2011-09-02 09:14:39 +0000 |
843 | +++ test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res 2011-11-16 18:49:23 +0000 |
844 | @@ -1,1 +1,1 @@ |
845 | -Homer V Simpson Homer Simpson Sue M Simpson |
846 | \ No newline at end of file |
847 | +Gene Simpson Homer V Simpson Homer Simpson Sue M Simpson |
848 | |
849 | === modified file 'test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res' |
850 | --- test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res 2011-07-28 23:25:13 +0000 |
851 | +++ test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res 2011-11-16 18:49:23 +0000 |
852 | @@ -0,0 +1,1 @@ |
853 | +2002-10-24 |
854 | |
855 | === modified file 'test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res' |
856 | --- test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res 2011-07-19 19:12:03 +0000 |
857 | +++ test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res 2011-11-16 18:49:23 +0000 |
858 | @@ -0,0 +1,1 @@ |
859 | +2002-10-24T21:22:00 |
860 | |
861 | === modified file 'test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res' |
862 | --- test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res 2011-07-19 19:12:03 +0000 |
863 | +++ test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res 2011-11-16 18:49:23 +0000 |
864 | @@ -0,0 +1,1 @@ |
865 | +09:10:00 |
866 | |
867 | === modified file 'test/Queries/data-cleaning/conversion/geocode-from-address.xq' |
868 | --- test/Queries/data-cleaning/conversion/geocode-from-address.xq 2011-07-19 19:12:03 +0000 |
869 | +++ test/Queries/data-cleaning/conversion/geocode-from-address.xq 2011-11-16 18:49:23 +0000 |
870 | @@ -1,3 +1,5 @@ |
871 | import module namespace conversion = "http://www.zorba-xquery.com/modules/data-cleaning/conversion"; |
872 | |
873 | -conversion:geocode-from-address ( ("Lisboa", "Portugal") ) |
874 | +let $geocode := conversion:geocode-from-address ( ("Lisboa", "Portugal") ) |
875 | +for $result in $geocode |
876 | +return floor($result) |
877 | |
878 | === removed file 'test/Queries/data-cleaning/conversion/unit-convert.spec' |
879 | --- test/Queries/data-cleaning/conversion/unit-convert.spec 2011-07-28 23:25:13 +0000 |
880 | +++ test/Queries/data-cleaning/conversion/unit-convert.spec 1970-01-01 00:00:00 +0000 |
881 | @@ -1,1 +0,0 @@ |
882 | -Error: http://expath.org/ns/error:HC002 |
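The `to-dateTime` branches in this diff carry a timezone offset into the day, month and year by hand (`$dayscomplement`, `$monthscomplement`, `$yearcomplement` and related variables). XQuery's built-in `xs:dateTime` arithmetic performs the same carries, including month lengths and leap years. The sketch below is illustrative only: `local:shift-by-offset` is a hypothetical helper, not part of the normalization module, and it assumes a timezone-less input string and an offset in the `+HHMM`/`-HHMM` form described for `%z`. Like the `%z` branches in the diff, it adds a `+` offset and subtracts a `-` offset.

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the normalization module.
   $local  : a dateTime string without timezone, e.g. "2011-12-31T23:30:00"
   $offset : a "+HHMM" or "-HHMM" string, as described for the %z specifier :)
declare function local:shift-by-offset($local as xs:string, $offset as xs:string)
  as xs:string
{
  let $sign    := substring($offset, 1, 1)
  let $hours   := xs:integer(substring($offset, 2, 2))
  let $minutes := xs:integer(substring($offset, 4, 2))
  (: e.g. "+0130" becomes the duration PT1H30M :)
  let $delta   := xs:dayTimeDuration(concat("PT", $hours, "H", $minutes, "M"))
  let $dt      := xs:dateTime($local)
  (: xs:dateTime +/- xs:dayTimeDuration rolls the day, month and year over,
     honouring month lengths and leap years :)
  return string(if ($sign = "-") then $dt - $delta else $dt + $delta)
};

local:shift-by-offset("2011-12-31T23:30:00", "+0100"), (: "2012-01-01T00:30:00" :)
local:shift-by-offset("2012-03-01T00:10:00", "-0030")  (: "2012-02-29T23:40:00" :)
```

Because the carry is delegated to the `+` and `-` operators on `xs:dateTime`, no per-month table is needed, and an invalid input string fails at the `xs:dateTime` cast, much as `check-dateTime` does.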
Validation queue starting for merge proposal.
Log at: http://zorbatest.lambda.nu:8080/remotequeue/data-cleaning-2011-10-17T10-22-35.152Z/log.html
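The month-end checks in `to-dateTime` decide February's length with `number($year) mod 4`, which also treats century years such as 1900 and 2100 as leap years. The Gregorian rule adds an exception: a year divisible by 100 is not a leap year unless it is also divisible by 400. A minimal sketch of that rule follows; `local:is-leap-year` and `local:days-in-month` are hypothetical helper names, not part of the module.

```xquery
xquery version "1.0";

(: Hypothetical helpers, not part of the normalization module. :)

(: Gregorian rule: every 4th year, except centuries not divisible by 400 :)
declare function local:is-leap-year($year as xs:integer) as xs:boolean
{
  ($year mod 4 = 0 and $year mod 100 != 0) or $year mod 400 = 0
};

(: Number of days in $month (1-12) of $year :)
declare function local:days-in-month($year as xs:integer, $month as xs:integer)
  as xs:integer
{
  if ($month = 2)
  then (if (local:is-leap-year($year)) then 29 else 28)
  else if ($month = (4, 6, 9, 11))
  then 30
  else 31
};

(: February lengths: 28 for 1900 and 2011, 29 for 2000 and 2012 :)
for $year in (1900, 2000, 2011, 2012)
return local:days-in-month($year, 2)
```

The string-valued `$year` and `$month` in `to-dateTime` could be passed to such a helper via `xs:integer($year)` and `xs:integer($month)`, replacing the chained `compare($month, ...)` tests with a single lookup.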