Merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module

Proposed by Diogo Simões
Status: Merged
Approved by: Matthias Brantner
Approved revision: 38
Merged at revision: 33
Proposed branch: lp:~diogo-simoes89/zorba/data-cleaning
Merge into: lp:zorba/data-cleaning-module
Diff against target: 882 lines (+374/-144)
12 files modified
src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq (+103/-55)
src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq (+260/-82)
src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq (+1/-1)
test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res (+1/-1)
test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res (+1/-1)
test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res (+1/-1)
test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res (+1/-1)
test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res (+1/-0)
test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res (+1/-0)
test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res (+1/-0)
test/Queries/data-cleaning/conversion/geocode-from-address.xq (+3/-1)
test/Queries/data-cleaning/conversion/unit-convert.spec (+0/-1)
To merge this branch: bzr merge lp:~diogo-simoes89/zorba/data-cleaning
Reviewer            Review Type    Date Requested    Status
Matthias Brantner                                    Approve
Bruno Martins                                        Approve
Review via email: mp+79530@code.launchpad.net

Commit message

Changes on normalization functions (to-time, to-dateTime, to-date)
Changes on conversion expected results (address-from-user, phone-from-user, user-from-phone)

Description of the change

Changes on normalization functions:
- to-dateTime: uncomment the function, resolve the bugs
- to-time: uncomment the function, resolve the bugs
- implementation of check functions that verify whether a string corresponds to an xs:date, xs:time or xs:dateTime
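
As an illustration of what such a check function can look like, here is a minimal sketch in plain XQuery 1.0. It is not the code from this branch: the name local:check-date and the choice to return the empty sequence on failure are assumptions made for the example. The inline validation that this diff replaces raised a notsupported error instead, so the real normalization:check-date may well do the same.

```xquery
xquery version "1.0";

(: Hypothetical helper, not the branch's normalization:check-date.
   Returns the string unchanged when it is a valid xs:date lexical
   value, and the empty sequence otherwise. :)
declare function local:check-date($s as xs:string) as xs:string? {
  if ($s castable as xs:date) then $s else ()
};

(
  local:check-date("2002-10-24"),  (: valid date, returned as is :)
  local:check-date("2002-13-24")   (: month 13 is invalid, yields the empty sequence :)
)
```

The same pattern would apply to xs:time and xs:dateTime by changing the type named in the castable test. Relying on castable rather than a hand-written regular expression leaves range checks such as days per month to the XML Schema type itself.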

Changes on conversion tests (changing the test result):
- address-from-user
- phone-from-user
- user-from-phone

Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job data-cleaning-2011-10-17T10-22-35.152Z is finished.
  The final status was:

  3 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job data-cleaning-2011-10-17T20-13-34.611Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Pending.

Bruno Martins (bgmartins) wrote :

Some minor things that should be changed before approving the merge:

* The documentation for functions like conversion:address-from-phone, conversion:user-from-phone and conversion:address-from-user should be revised, in order to keep just one example (i.e., the one from @example).

* The documentation for functions like normalization:to-date or normalization:to-time should present an example invocation.

* The documentation for the private functions like normalization:check-time should also provide an example invocation. From looking at the source code, it seems to me that these functions are not really necessary, and the corresponding instructions could instead be used directly in the code that invokes these functions.

review: Needs Fixing
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job data-cleaning-2011-10-26T16-18-38.76Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Needs Fixing.

Bruno Martins (bgmartins) wrote :

Checked the latest revisions from Diogo and they seem ok.

review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job data-cleaning-2011-11-11T03-01-39.94Z is finished.
  The final status was:

  2 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job data-cleaning-2011-11-16T17-37-12.546Z is finished.
  The final status was:

  1 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job data-cleaning-2011-11-16T18-48-39.387Z is finished.
  The final status was:

  4 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

The attempt to merge lp:~diogo-simoes89/zorba/data-cleaning into lp:zorba/data-cleaning-module failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:272 (message):
  Validation queue job data-cleaning-2011-11-21T23-35-38.044Z is finished.
  The final status was:

  3 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job data-cleaning-2011-11-23T17-21-36.139Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Approve.

Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job data-cleaning-2011-11-24T21-35-36.072Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1. Got: 1 Approve.

Matthias Brantner (matthias-brantner) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job data-cleaning-2011-11-25T20-19-36.742Z is finished. The final status was:

All tests succeeded!

Preview Diff

1=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq'
2--- src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2011-08-16 23:45:59 +0000
3+++ src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2011-11-16 18:49:23 +0000
4@@ -35,6 +35,8 @@
5
6 import module namespace http = "http://www.zorba-xquery.com/modules/http-client";
7
8+import module namespace reflection = "http://www.zorba-xquery.com/modules/reflection";
9+
10 declare namespace ver = "http://www.zorba-xquery.com/options/versioning";
11 declare option ver:module-version "2.0";
12
13@@ -45,10 +47,6 @@
14 : Uses a White-pages Web service to discover information about a given name,
15 : returning a sequence of strings for the phone numbers associated to the name.
16 :
17- : <br/>
18- : Example usage : <pre> phone-from-user ('Maria Lurdes') </pre>
19- : <br/>
20- : The function invocation in the example above returns : <pre> (716) 686-4500 </pre>
21 :
22 : @param $name The name of person or organization.
23 : @return A sequence of strings for the phone numbers associated to the name.
24@@ -66,11 +64,6 @@
25 : Uses a White-pages Web service to discover information about a given name,
26 : returning a sequence of strings for the addresses associated to the name.
27 :
28- : <br/>
29- : Example usage : <pre> address-from-user ('Maria Lurdes') </pre>
30- : <br/>
31- : The function invocation in the example above returns : <pre> 222 E 53rd St, Los Angeles, CA, US </pre>
32- : <pre> 3362 Walden Ave, Depew, NY, US </pre>
33 :
34 : @param $name The name of person or organization.
35 : @return A sequence of strings for the addresses associated to the name.
36@@ -93,11 +86,6 @@
37 : Uses a White-pages Web service to discover information about a given phone number,
38 : returning a sequence of strings for the name associated to the phone number.
39 :
40- : <br/>
41- : Example usage : <pre> user-from-phone ('8654582358') </pre>
42- : <br/>
43- : The function invocation in the example above returns : <pre> Homer Simpson </pre>
44- : <pre> Sue M Simpson </pre>
45 :
46 : @param $phone-number A string with 10 digits corresponding to the phone number.
47 : @return A sequence of strings for the person or organization's name associated to the phone number.
48@@ -113,10 +101,6 @@
49 : Uses a White-pages Web service to discover information about a given phone number,
50 : returning a string for the address associated to the phone number.
51 :
52- : <br/>
53- : Example usage : <pre> address-from-phone ('8654582358') </pre>
54- : <br/>
55- : The function invocation in the example above returns : <pre> 4610 Harrison Bend Rd, Loudon, TN, US </pre>
56 :
57 : @param $phone-number A string with 10 digits corresponding to the phone number.
58 : @return A string for the addresses associated to the phone number.
59@@ -139,10 +123,6 @@
60 : Uses a White-pages Web service to discover information about a given address,
61 : returning a sequence of strings for the names associated to the address.
62 :
63- : <br/>
64- : Example usage : <pre> user-from-address('5655 E Gaskill Rd, Willcox, AZ, US') </pre>
65- : <br/>
66- : The function invocation in the example above returns : <pre> Stan Smith </pre>
67 :
68 : @param $address A string corresponding to the address (ex: 5655 E Gaskill Rd, Willcox, AZ, US).
69 : @return A sequence of strings for the person or organization's names associated to the address.
70@@ -169,10 +149,6 @@
71 : Uses a White-pages Web service to discover information about a given address,
72 : returning a sequence of strings for the phone number associated to the address.
73 :
74- : <br/>
75- : Example usage : <pre> phone-from-address('5655 E Gaskill Rd, Willcox, AZ, US') </pre>
76- : <br/>
77- : The function invocation in the example above returns : <pre> (520) 824-3160 </pre>
78 :
79 : @param $address A string corresponding to the address (ex: 5655 E Gaskill Rd, Willcox, AZ, US).
80 : @return A sequence of strings for the phone number or organization's names associated to the address.
81@@ -206,41 +182,122 @@
82 (:~
83 : Conversion function for units of measurement, acting as a wrapper over the CuppaIT WebService.
84 : <br/>
85- : WebService documentation at http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML
86 :
87- : <br/>
88- : Example usage : <pre> unit-convert ( 1 , "Distance", "mile", "kilometer" ) </pre>
89- : <br/>
90- : The function invocation in the example above returns : <pre> 1.609344 </pre>
91 :
92 : @param $v The amount we wish to convert.
93 : @param $t The type of metric (e.g., "Distance")
94 : @param $m1 The source measurement unit metric (e.g., "meter")
95 : @param $m2 The target measurement unit metric (e.g., "mile")
96 : @return The value resulting from the conversion
97- : @error conversion:notsupported if the type of metric, the source unit or the target unit are not known to the service.
98- : @see http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML
99 : @example test/Queries/data-cleaning/conversion/unit-convert.xq
100 :)
101 declare %ann:nondeterministic function conversion:unit-convert ( $v as xs:double, $t as xs:string, $m1 as xs:string, $m2 as xs:string ) {
102- let $url := "http://www.cuppait.com/UnitConversionGateway-war/UnitConversion?format=XML"
103- let $ctype := concat("ctype=",$t)
104- let $cfrom := concat("cfrom=",$m1)
105- let $cto := concat("cto=",$m2)
106- let $camount := concat("camount=",$v)
107- let $par := string-join(($url,$ctype,$cfrom,$cto,$camount),"&amp;")
108- let $result := data(http:get-node($par)[2])
109- return if (matches(data($result),"-?[0-9]+(\.[0-9]+)?")) then data($result)
110- else (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/conversion', 'conversion:notsupported'), data($result)))
111+ if ( $m1 = $m2 ) then $v else
112+
113+let $conversion-table :=
114+ <unit-conversion-rules>
115+ <unit type="Distance" from="mile" to="kilometer" value="1.609344" />
116+ <unit type="Distance" from="mile" to="angstrom" value="16100000000000" />
117+ <unit type="Distance" from="mile" to="picometer" value="1610000000000000" />
118+ <unit type="Distance" from="mile" to="nanometer" value="1610000000000" />
119+ <unit type="Distance" from="mile" to="microometer" value="1610000000" />
120+ <unit type="Distance" from="mile" to="millimeter" value="1610000" />
121+ <unit type="Distance" from="mile" to="centimeter" value="161000" />
122+ <unit type="Distance" from="mile" to="meter" value="1610" />
123+ <unit type="Distance" from="mile" to="inch" value="63400" />
124+ <unit type="Distance" from="mile" to="feet" value="5280" />
125+ <unit type="Distance" from="kilometer" to="meter" value="1000" />
126+ <unit type="Distance" from="kilometer" to="picometer" value="1000000000000000" />
127+ <unit type="Distance" from="kilometer" to="angstrom" value="10000000000000" />
128+ <unit type="Distance" from="kilometer" to="nanometer" value="1000000000000" />
129+ <unit type="Distance" from="kilometer" to="micrometer" value="1000000000" />
130+ <unit type="Distance" from="kilometer" to="millimeter" value="1000000" />
131+ <unit type="Distance" from="kilometer" to="centimeter" value="100000" />
132+ <unit type="Distance" from="kilometer" to="inch" value="39400" />
133+ <unit type="Distance" from="kilometer" to="feet" value="3280" />
134+ <unit type="Distance" from="meter" to="centimeter" value="100" />
135+ <unit type="Distance" from="meter" to="picometer" value="1000000000000" />
136+ <unit type="Distance" from="meter" to="angstrom" value="10000000000" />
137+ <unit type="Distance" from="meter" to="nanometer" value="1000000000" />
138+ <unit type="Distance" from="meter" to="micrometer" value="1000000" />
139+ <unit type="Distance" from="meter" to="millimeter" value="1000" />
140+ <unit type="Distance" from="meter" to="inch" value="39.4" />
141+ <unit type="Distance" from="meter" to="feet" value="3.28" />
142+ <unit type="Distance" from="centimeter" to="millimeter" value="10" />
143+ <unit type="Distance" from="millimeter" to="micrometer" value="1000" />
144+ <unit type="Distance" from="micrometer" to="nanometer" value="1000" />
145+ <unit type="Distance" from="nanometer" to="angstrom" value="10" />
146+ <unit type="Distance" from="angstrom" to="picometer" value="100" />
147+ <unit type="Distance" from="inch" to="feet" value="0.0833" />
148+ <unit type="Mass" from="tons" to="kilograms" value="907.18474" />
149+ <unit type="Mass" from="tons" to="pounds" value="2000" />
150+ <unit type="Mass" from="tons" to="ounces" value="32000" />
151+ <unit type="Mass" from="tons" to="grams" value="907184.74" />
152+ <unit type="Mass" from="tons" to="milligrams" value="907184740" />
153+ <unit type="Mass" from="kilograms" to="pounds" value="2.2046226" />
154+ <unit type="Mass" from="kilograms" to="grams" value="1000" />
155+ <unit type="Mass" from="kilograms" to="milligrams" value="1000000" />
156+ <unit type="Mass" from="grams" to="milligrams" value="1000" />
157+ <unit type="Mass" from="pounds" to="ounces" value="16" />
158+ <unit type="Mass" from="pounds" to="grams" value="453.59237" />
159+ <unit type="Mass" from="pounds" to="milligrams" value="453592.37" />
160+ <unit type="Mass" from="ounces" to="kilograms" value="0.028349523" />
161+ <unit type="Mass" from="ounces" to="grams" value="28.349523" />
162+ <unit type="Mass" from="ounces" to="milligrams" value="28349.523" />
163+ <unit type="Volume" from="liters" to="cubic centimeters" value="1000" />
164+ <unit type="Energy" from="jouls" to="calories" value="0.239" />
165+ <unit type="Pressure" from="pascals" to="kilopascals" value="0.001" />
166+ <unit type="Pressure" from="pascals" to="bars" value="0.000001" />
167+ <unit type="Pressure" from="pascals" to="mmHg" value="0.00750064" />
168+ <unit type="Pressure" from="pascals" to="torrs" value="0.00750064" />
169+ <unit type="Pressure" from="atmospheres" to="pascals" value="101325" />
170+ <unit type="Pressure" from="atmospheres" to="kilopascals" value="101.325" />
171+ <unit type="Pressure" from="atmospheres" to="bars" value="1.01325" />
172+ <unit type="Pressure" from="atmospheres" to="mmHg" value="760" />
173+ <unit type="Pressure" from="atmospheres" to="torrs" value="760" />
174+ <unit type="Pressure" from="atmospheres" to="psi" value="14.7" />
175+ <unit type="Pressure" from="psi" to="pascals" value="6890" />
176+ <unit type="Pressure" from="psi" to="kilopascals" value="6.89" />
177+ <unit type="Pressure" from="psi" to="bars" value="0.0689" />
178+ <unit type="Pressure" from="psi" to="mmHg" value="51.7" />
179+ <unit type="Pressure" from="psi" to="torrs" value="51.7" />
180+ <unit type="Pressure" from="bars" to="kilopascals" value="100" />
181+ <unit type="Pressure" from="bars" to="mmHg" value="750.064" />
182+ <unit type="Pressure" from="bars" to="torrs" value="750.064" />
183+ <unit type="Pressure" from="kilopascals" to="mmHg" value="7.50064" />
184+ <unit type="Pressure" from="kilopascals" to="torrs" value="7.50064" />
185+ <unit type="Pressure" from="mmHg" to="torrs" value="1" />
186+ <unit type="Temperature" from="celsius" to="fahrenheit" value="* 9 div 5 + 32" />
187+ <unit type="Temperature" from="celsius" to="kelvin" value="+ 273.15" />
188+ <unit type="Temperature" from="kelvin" to="celsius" value="- 273.15" />
189+ <unit type="Temperature" from="kelvin" to="fahrenheit" value="* 9 div 5 - 273.15 * 9 div 5 + 32" />
190+ <unit type="Temperature" from="fahrenheit" to="celsius" value="* 5 div 9 - 32 * 5 div 9" />
191+ <unit type="Temperature" from="fahrenheit" to="kelvin" value="* 5 div 9 - 32 * 5 div 9 + 273.15" />
192+</unit-conversion-rules>
193+
194+let $from := $conversion-table/unit[@type=$t and @from=$m1] |
195+ ( for $it in $conversion-table/unit[@type=$t and @to=$m1] return
196+ if (compare($t, "Temperature") != 0) then
197+ copy $aux := $it
198+ modify (
199+ replace value of node $aux/@value with 1.0 div $aux/@value,
200+ replace value of node $aux/@from with $aux/@to,
201+ replace value of node $aux/@to with $aux/@from
202+ )
203+ return $aux
204+ else()
205+ )
206+
207+return
208+if (compare($t, "Temperature") = 0) then reflection:eval(concat($v , $conversion-table//unit[@from=$m1][@to=$m2]/@value))
209+else
210+ if ( $from[@to=$m2]) then ( $v * $from[@to=$m2]/@value )
211+ else ( for $i in $from return conversion:unit-convert ( $v * $i/@value , $t , $i/@to , $m2 ) )[1]
212 };
213
214 (:~
215 : Placename to geospatial coordinates converter, acting as a wrapper over the Yahoo! geocoder service.
216 :
217- : <br/>
218- : Example usage : <pre> geocode-from-address ( ("Lisboa", "Portugal") ) </pre>
219- : <br/>
220- : The function invocation in the example above returns : <pre> ( 38.725735 , -9.15021 ) </pre>
221 :
222 : @param $q A sequence of strings corresponding to the different components (e.g., street, city, country, etc.) of the place name.
223 : @return The pair of latitude and longitude coordinates associated with the input address.
224@@ -258,10 +315,6 @@
225 (:~
226 : Geospatial coordinates to placename converter, acting as a wrapper over the Yahoo! reverse geocoder service.
227 :
228- : <br/>
229- : Example usage : <pre> address-from-geocode ( 38.725735 , -9.15021 ) </pre>
230- : <br/>
231- : The function invocation in the example above returns : <pre> ( 'Portugal' , 'Lisbon' , 'praca Marques de Pombal' ) </pre>
232 :
233 : @param $lat Geospatial latitude.
234 : @param $lon Geospatial longitude.
235@@ -288,10 +341,6 @@
236 :
237 : WebService documentation at http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
238 :
239- : <br/>
240- : Example usage : <pre> currency-convert ( 1, "USD", "EUR", "2011-01-18" ) </pre>
241- : <br/>
242- : The function invocation in the example above returns : <pre> 0.747887218607434 </pre>
243 :
244 : @param $v The amount we wish to convert.
245 : @param $m1 The source currency (e.g., "EUR").
246@@ -356,4 +405,3 @@
247 declare function conversion:name-from-domain ( $domain as xs:string ) {
248 ()
249 };
250-
251
252=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq'
253--- src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2011-08-01 11:26:53 +0000
254+++ src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2011-11-16 18:49:23 +0000
255@@ -40,10 +40,6 @@
256 : Converts a given string representation of a date value into a date representation valid according
257 : to the corresponding XML Schema type.
258 :
259- : <br/>
260- : Example usage : <pre> to-date ( "24OCT2002" , "%d%b%Y" ) </pre>
261- : <br/>
262- : The function invocation in the example above returns : <pre> 2002-10-24 </pre>
263 :
264 : @param $sd The string representation for the date
265 : @param $format An optional parameter denoting the format used to represent the date in the string, according to a
266@@ -51,16 +47,10 @@
267 : by a single letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion
268 : specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:
269 : <pre>
270- : '%a' Abbreviated weekday name in the current locale.<br/>
271- : '%A' Full weekday name in the current locale.<br/>
272 : '%b' Abbreviated month name in the current locale.<br/>
273 : '%B' Full month name in the current locale.<br/>
274 : '%d' Day of the month as decimal number (01-31).<br/>
275- : '%j' Day of year as decimal number (001-366).<br/>
276 : '%m' Month as decimal number (01-12).<br/>
277- : '%U' Week of the year as decimal number (00-53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). This is the US convention.<br/>
278- : '%w' Weekday as decimal number (0-6, Sunday is 0).<br/>
279- : '%W' Week of the year as decimal number (00-53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). This is the UK convention.<br/>
280 : '%x' Date, locale-specific.<br/>
281 : '%y' Year without century (00-99).<br/>
282 : '%Y' Year with century.<br/>
283@@ -68,23 +58,20 @@
284 : '%D' Locale-specific date format such as '%m/%d/%y'.<br/>
285 : '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number.<br/>
286 : '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format).<br/>
287- : '%g' The last two digits of the week-based year (see '%V').<br/>
288- : '%G' The week-based year (see '%V') as a decimal number.<br/>
289- : '%h' Equivalent to '%b'.<br/>
290- : '%u' Weekday as a decimal number (1-7, Monday is 1).<br/>
291- : '%V' Week of the year as decimal number (00-53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.
292+ : '%h' Equivalent to '%b'.<br/>
293 :</pre>
294 :
295 : @return The date value resulting from the conversion.
296- : <br/><br/><b> Attention : This function is still not implemented. </b> <br/>
297+ : @example test/Queries/data-cleaning/normalization/to-date.xq
298 :)
299 declare function normalization:to-date ( $sd as xs:string, $format as xs:string? ) as xs:string{
300-(:
301+
302+
303 let $dictionary := normalization:month-dictionary()
304- let $format-tokens := tokenize($format, "%")[position()>1]
305+ let $format-tokens := tokenize($format, "[ %\-/:]+")[position()>1]
306 let $sd-tokens :=
307 if (contains($sd, "-") or contains($sd, "/") or contains($sd, " "))
308- then tokenize ($sd, "[ \-/]")
309+ then tokenize ($sd, "[ \-/]+")
310 else let $ydtoken := tokenize(replace($sd, "[A-Za-z]", " "), " ")
311 let $ft := $ydtoken[position()=1]
312 let $lt := $ydtoken[last()]
313@@ -154,25 +141,15 @@
314
315 let $result := concat($year, "-", $month, "-", $day)
316
317- return
318-
319- if (matches(string($result),"[0-9]+-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))"))
320- then $result
321- else
322- (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization',
323- 'err:notsupported'),data(concat($result, " - ", concat("year: ", $year), concat(" month: ", $month), concat(" day:", $day)))))
324+ return normalization:check-date($result)
325 else()
326- :)""
327+
328 };
329
330 (:~
331 : Converts a given string representation of a time value into a time representation valid according to
332 : the corresponding XML Schema type.
333 :
334- : <br/>
335- : Example usage : <pre> to-time ( "09 hours 10 minutes" , "%H hours %M minutes" ) </pre>
336- : <br/>
337- : The function invocation in the example above returns : <pre> 09:10:00 </pre>
338 :
339 : @param $sd The string representation for the time.
340 : @param $format An optional parameter denoting the format used to represent the time in the string, according to a sequence of
341@@ -197,10 +174,12 @@
342 :</pre>
343 :
344 : @return The time value resulting from the conversion.
345+ : @example test/Queries/data-cleaning/normalization/to-time.xq
346 :)
347-declare function normalization:to-time ( $sd as xs:string, $format as xs:string? ) as xs:string{
348+declare function normalization:to-time ( $sd as xs:string, $format as xs:string? ) as xs:string?{
349 let $timezoneDict := normalization:timeZone-dictionary()
350- let $format-tokens := tokenize($format, "%")[position()>1]
351+ let $format-string := replace(replace ($format, '%R', '%H:%M'), '%T', '%H:%M:%S')
352+ let $format-tokens := tokenize($format-string, "( |%|:)+")[position()>1]
353 let $sd-tokens :=
354 if (contains($sd, ":") or contains($sd, ".") or contains($sd, " "))
355 then tokenize ($sd, "[ :\.]")
356@@ -313,7 +292,7 @@
357
358 if (count(index-of($format-tokens, "e")) != 0)
359 then concat("0", string($sd-tokens[position() = index-of($format-tokens, "e")]))
360- else "SND"
361+ else "00"
362
363 let $result :=
364
365@@ -439,7 +418,7 @@
366 else ()
367 else
368
369- (:z:)
370+
371 if (count(index-of($format-tokens, "z")) != 0)
372 then if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='+')
373 then let $complement :=
374@@ -539,11 +518,7 @@
375
376 return
377
378- if (matches(string($result),"(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]"))
379- then $result
380- else
381- (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization',
382- 'err:notsupported'),data(concat($result, " - ", concat("hours: ", $hours), concat(" minutes: ", $minutes), concat(" seconds:", $seconds)))))
383+ normalization:check-time($result)
384 else()
385
386 };
387@@ -552,10 +527,6 @@
388 : Converts a given string representation of a dateTime value into a dateTime representation
389 : valid according to the corresponding XML Schema type.
390 :
391- : <br/>
392- : Example usage : <pre> to-dateTime( "24OCT2002 21:22" , "%d%b%Y %H%M" ) </pre>
393- : <br/>
394- : The function invocation in the example above returns : <pre> 2002-20-24T21:22:00 </pre>
395 :
396 : @param $sd The string representation for the dateTime.
397 : @param $format An optional parameter denoting the format used to represent the dateTime in the string, according to a sequence
398@@ -564,11 +535,10 @@
399 : is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:
400 :
401 : <pre>
402- : '%a' Abbreviated weekday name in the current locale.<br/>
403- : '%A' Full weekday name in the current locale.<br/>
404 : '%b' Abbreviated month name in the current locale.<br/>
405 : '%B' Full month name in the current locale.<br/>
406 : '%c' Date and time, locale-specific.<br/>
407+ : '%C' Century (00-99): the integer part of the year divided by 100.<br/>
408 : '%d' Day of the month as decimal number (01-31).<br/>
409 : '%H' Hours as decimal number (00-23).<br/>
410 : '%I' Hours as decimal number (01-12).<br/>
411@@ -577,16 +547,12 @@
412 : '%M' Minute as decimal number (00-59).<br/>
413 : '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.<br/>
414 : '%S' Second as decimal number (00-61), allowing for up to two leap-seconds.<br/>
415- : '%U' Week of the year as decimal number (00-53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). This is the US convention.<br/>
416- : '%w' Weekday as decimal number (0-6, Sunday is 0).<br/>
417- : '%W' Week of the year as decimal number (00-53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). This is the UK convention.<br/>
418 : '%x' Date, locale-specific.<br/>
419 : '%X' Time, locale-specific.<br/>
420 : '%y' Year without century (00-99).<br/>
421 : '%Y' Year with century.<br/>
422 : '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.<br/>
423 : '%Z' Time zone as a character string.<br/>
424- : '%C' Century (00-99): the integer part of the year divided by 100.<br/>
425 : '%D' Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format.<br/>
426 : '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number.<br/>
427 : '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format).<br/>
428@@ -598,19 +564,16 @@
429 : '%r' The 12-hour clock time (using the locale's AM or PM).<br/>
430 : '%R' Equivalent to '%H:%M'.<br/>
431 : '%T' Equivalent to '%H:%M:%S'.<br/>
432- : '%u' Weekday as a decimal number (1-7, Monday is 1).<br/>
433- : '%V' Week of the year as decimal number (00-53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.
434 :</pre>
435 :
436 : @return The dateTime value resulting from the conversion.
437- : <br/><br/><b> Attention : This function is still not implemented. </b> <br/>
438- :
439+ : @example test/Queries/data-cleaning/normalization/to-dateTime.xq
440 :)
441 declare function normalization:to-dateTime ( $sd as xs:string, $format as xs:string? ) as xs:string {
442-(:
443 let $timezoneDict := normalization:timeZone-dictionary()
444- let $monthDict := normalization:month-dictionary()
445- let $format-tokens := tokenize($format, "[ \-%]+")[position()>1]
446+ let $monthDict := normalization:month-dictionary()
447+ let $format-string := replace(replace(replace ($format, '%R', '%H:%M'), '%T', '%H:%M:%S'), '%F', '%Y-%m-%d')
448+ let $format-tokens := tokenize($format-string, "[ %\-/:\.]+")[position()>1]
449 let $sdt :=
450 if (contains($sd, ":") or contains($sd, ".") or contains($sd, " ") or contains($sd, "-")
451 or contains($sd, "/"))
452@@ -801,7 +764,7 @@
453
454 if (count(index-of($format-tokens, "e")) != 0)
455 then concat("0", string($sd-tokens[position() = index-of($format-tokens, "e")]))
456- else "SND"
457+ else "00"
458
459 let $result :=
460
461@@ -814,6 +777,50 @@
462 then 1
463 else 0
464
465+ let $dayscomplement :=
466+ if (number($complement) + number($hours) + number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =index-of($format-tokens, "Z")]]),2,2)) >= 24)
467+ then 1
468+ else 0
469+
470+ let $monthscomplement :=
471+ if(($dayscomplement + number($day) > 28) and (compare($month, '02') = 0) and (number($year) mod 4 != 0))
472+ then 1
473+ else
474+ if(($dayscomplement + number($day) > 30) and ((compare($month, '04') = 0) or (compare($month, '06') = 0) or (compare($month, '09') = 0) or (compare($month, '11') = 0)))
475+ then 1
476+ else
477+ if(($dayscomplement + number($day) > 31) and ((compare($month, '04') = 0) or (compare($month, '01') = 0) or (compare($month, '03') = 0) or (compare($month, '05') = 0) or (compare($month, '07') = 0) or (compare($month, '08') = 0) or (compare($month, '10') = 0) or (compare($month, '12') = 0)))
478+ then 1
479+ else
480+ if(($dayscomplement + number($day) > 29) and (compare($month, '02') = 0) and (number($year) mod 4 = 0))
481+ then 1
482+ else 0
483+
484+ let $ryear :=
485+ if ($monthscomplement + number($month) > 12)
486+ then string(number($year) + 1)
487+ else $year
488+
489+ let $daywcompl :=
490+ if ($monthscomplement = 1)
491+ then 1
492+ else number($day) + $dayscomplement
493+
494+ let $monthwcompl :=
495+ if($monthscomplement + number($month) <= 12)
496+ then number($month) + $monthscomplement
497+ else 1
498+
499+ let $rday :=
500+ if (string-length(string($daywcompl)) = 1)
501+ then concat ('0', string($daywcompl))
502+ else string($daywcompl)
503+
504+ let $rmonth :=
505+ if (string-length(string($monthwcompl)) = 1)
506+ then concat ('0', string($monthwcompl))
507+ else string($monthwcompl)
508+
509 let $rhours :=
510 if (string-length(string(
511 (number($complement) + number($hours) +
512@@ -845,7 +852,7 @@
513 index-of($format-tokens, "Z")]]),4,2))) mod 60))
514
515
516- return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds)
517+ return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds)
518 else
519
520 if (substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =
521@@ -853,10 +860,61 @@
522 then
523 let $complement :=
524 if (number($minutes)-number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =
525- index-of($format-tokens, "Z")]]),4,2)) < 0)
526+ index-of($format-tokens, "Z")]]),2,2)) < 0)
527 then -1
528 else 0
529
530+ let $dayscomplement :=
531+ if (number($complement) - number($hours) - number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position()=
532+ index-of($format-tokens, "Z")]]),2,2)) < 0)
533+ then -1
534+ else 0
535+
536+ let $monthcomplement :=
537+ if(number($day) + $dayscomplement < 1)
538+ then -1
539+ else 0
540+
541+ let $yearcomplement :=
542+ if(number($month) + $monthcomplement< 1)
543+ then -1
544+ else 0
545+
546+ let $daywcompl :=
547+ if ($monthcomplement = 0)
548+ then number($day) + $dayscomplement
549+ else
550+ if ( (number($month) = 5) or (number($month) = 7) or (number($month) = 10) or (number($month) = 12))
551+ then 30
552+ else
553+ if((number($month) = 4) or (number($month) = 6) or (number($month) = 9) or (number($month) = 11) or (number($month) = 2) or (number($month) = 1) or (number($month) = 8))
554+ then 31
555+ else
556+ if((number($month) = 3) and (number($year) mod 4 != 0))
557+ then 28
558+ else
559+ if((number($month) = 3) and (number($year) mod 4 = 0))
560+ then 29
561+ else number($day) + $dayscomplement
562+
563+ let $monthwcompl:=
564+ if($yearcomplement = 0)
565+ then number($month) + $monthcomplement
566+ else 12
567+
568+ let $ryear :=
569+ number($year) + $yearcomplement
570+
571+ let $rday :=
572+ if (string-length(string($daywcompl)) = 1)
573+ then concat ('0', string($daywcompl))
574+ else string($daywcompl)
575+
576+ let $rmonth :=
577+ if (string-length(string($monthwcompl)) = 1)
578+ then concat ('0', string($monthwcompl))
579+ else string($monthwcompl)
580+
581 let $rhours :=
582 if( ((number($complement) + number($hours) -
583 number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =
584@@ -923,7 +981,7 @@
585 number(substring(string($timezoneDict//timeZone/@value[../@name=$sd-tokens[position() =
586 index-of($format-tokens, "Z")]]),2,2)))) mod 60))
587
588- return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds)
589+ return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds)
590 else ()
591 else
592
593@@ -931,8 +989,52 @@
594 if (count(index-of($format-tokens, "z")) != 0)
595 then if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='+')
596 then let $complement :=
597- if (number($minutes)+number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) > 59) then 1
598- else 0
599+ if (number($minutes)+number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) > 59) then 1
600+ else 0
601+
602+ let $dayscomplement :=
603+ if (number($complement) + number($hours) + number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)) >= 24)
604+ then 1
605+ else 0
606+
607+ let $monthscomplement :=
608+ if(($dayscomplement + number($day) > 28) and (compare($month, '02') = 0) and (number($year) mod 4 != 0))
609+ then 1
610+ else
611+ if(($dayscomplement + number($day) > 30) and ((compare($month, '04') = 0) or (compare($month, '06') = 0) or (compare($month, '09') = 0) or (compare($month, '11') = 0)))
612+ then 1
613+ else
614+ if(($dayscomplement + number($day) > 31) and ((compare($month, '04') = 0) or (compare($month, '01') = 0) or (compare($month, '03') = 0) or (compare($month, '05') = 0) or (compare($month, '07') = 0) or (compare($month, '08') = 0) or (compare($month, '10') = 0) or (compare($month, '12') = 0)))
615+ then 1
616+ else
617+ if(($dayscomplement + number($day) > 29) and (compare($month, '02') = 0) and (number($year) mod 4 = 0))
618+ then 1
619+ else 0
620+
621+ let $ryear :=
622+ if ($monthscomplement + number($month) > 12)
623+ then string(number($year) + 1)
624+ else $year
625+
626+ let $daywcompl :=
627+ if ($monthscomplement = 1)
628+ then 1
629+ else number($day) + $dayscomplement
630+
631+ let $monthwcompl :=
632+ if($monthscomplement + number($month) <= 12)
633+ then number($month) + $monthscomplement
634+ else 1
635+
636+ let $rday :=
637+ if (string-length(string($daywcompl)) = 1)
638+ then concat ('0', string($daywcompl))
639+ else string($daywcompl)
640+
641+ let $rmonth :=
642+ if (string-length(string($monthwcompl)) = 1)
643+ then concat ('0', string($monthwcompl))
644+ else string($monthwcompl)
645
646 let $rhours :=
647 if (string-length(string(
648@@ -959,15 +1061,65 @@
649 number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2))) mod 60))
650
651
652- return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds)
653+ return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds)
654 else
655
656 if (substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),1,1)='-')
657 then
658 let $complement :=
659- if (number($minutes)-number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) < 0) then -1
660- else 0
661-
662+ if (number($minutes)-number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),4,2)) < 0) then -1
663+ else 0
664+
665+ let $dayscomplement :=
666+ if (number($complement) - number($hours) - number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)) < 0)
667+ then -1
668+ else 0
669+
670+ let $monthcomplement :=
671+ if(number($day) + $dayscomplement< 1)
672+ then -1
673+ else 0
674+
675+ let $yearcomplement :=
676+ if(number($month) + $monthcomplement< 1)
677+ then -1
678+ else 0
679+
680+ let $daywcompl :=
681+ if ($monthcomplement = 0)
682+ then number($day) + $dayscomplement
683+ else
684+ if ( (number($month) = 5) or (number($month) = 7) or (number($month) = 10) or (number($month) = 12))
685+ then 30
686+ else
687+ if((number($month) = 4) or (number($month) = 6) or (number($month) = 9) or (number($month) = 11) or (number($month) = 2) or (number($month) = 1) or (number($month) = 8))
688+ then 31
689+ else
690+ if((number($month) = 3) and (number($year) mod 4 != 0))
691+ then 28
692+ else
693+ if((number($month) = 3) and (number($year) mod 4 = 0))
694+ then 29
695+ else number($day) + $dayscomplement
696+
697+ let $monthwcompl:=
698+ if($yearcomplement = 0)
699+ then number($month) + $monthcomplement
700+ else 12
701+
702+ let $ryear :=
703+ number($year) + $yearcomplement
704+
705+ let $rday :=
706+ if (string-length(string($daywcompl)) = 1)
707+ then concat ('0', string($daywcompl))
708+ else string($daywcompl)
709+
710+ let $rmonth :=
711+ if (string-length(string($monthwcompl)) = 1)
712+ then concat ('0', string($monthwcompl))
713+ else string($monthwcompl)
714+
715 let $rhours :=
716 if( ((number($complement) + number($hours) -
717 number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2))) mod 24) >= 0 )
718@@ -1020,31 +1172,20 @@
719 (60 - -(number($minutes) -
720 number(substring(string($sd-tokens[position() = index-of($format-tokens, "z")]),2,2)))) mod 60))
721
722- return concat($year, "-", $month, "-", $day, "T", $rhours, ":", $rminutes, ":", $seconds)
723+ return concat($ryear, "-", $rmonth, "-", $rday, "T", $rhours, ":", $rminutes, ":", $seconds)
724 else ()
725 else
726 concat($year, "-", $month, "-", $day, "T", $hours, ":", $minutes, ":", $seconds)
727
728- return
729-
730- if (matches(string($result),"[0-9]+-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))T(([01][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9]"))
731- then $result
732- else
733- (error(QName('http://www.zorba-xquery.com/modules/data-cleaning/normalization',
734- 'err:notsupported'),data(concat($result, " - ", concat("hours: ", $hours), concat(" minutes: ", $minutes), concat(" seconds:", $seconds)))))
735-
736+ return
737+ normalization:check-dateTime($result)
738 else()
739-:)""
740 };
741
742 (:~
743 : Uses an address normalization Web service to convert a postal address given as input into a
744 : cannonical representation format.
745 :
746- : <br/>
747- : Example usage : <pre> normalize-address ( ( 'Marques de Pombal' , 'Lisboa' ) ) </pre>
748- : <br/>
749- : The function invocation in the example above returns : <pre> ( 'Portugal' , 'Lisbon' , 'praca Marques de Pombal' ) </pre>
750 :
751 : @param $addr A sequence of strings encoding an address, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address.
752 : @return A sequence of strings with the address encoded in a cannonical format, where each string in the sequence corresponds to a different component (e.g., street, city, country, etc.) of the address.
753@@ -1315,7 +1456,7 @@
754 : Internal auxiliary function that returns an XML representation for a dictionary that contains a
755 : numeric value associated to different month name abbreviations.
756 :)
757-declare %private function normalization:month-dictionary() as node(){
758+declare %private function normalization:month-dictionary() as element(){
759 let $dictionary :=
760 <dictionary>
761 <month name="January" value="01">
762@@ -1380,3 +1521,40 @@
763 </dictionary>
764 return $dictionary
765 };
766+
767+(:~
768+ : Internal auxiliary function that checks if a string is in xs:dateTime format
769+ :
770+ :
771+ : @param $dateTime The string representation for the dateTime.
772+ : @return The dateTime string if it represents the xs:dateTime format.
773+ :)
774+declare %private function normalization:check-dateTime($dateTime as xs:string) as xs:string{
775+ concat(string(year-from-dateTime(xs:dateTime($dateTime))), substring($dateTime,5))
776+};
777+
778+(:~
779+ : Internal auxiliary function that checks if a string is in xs:date format
780+ :
781+ :
782+ : @param $dateTime The string representation for the date.
783+ : @return The date string if it represents the xs:date format.
784+ :)
785+declare %private function normalization:check-date($date as xs:string) as xs:string{
786+ concat(string(year-from-date(xs:date($date))), substring($date,5))
787+};
788+
789+(:~
790+ : Internal auxiliary function that checks if a string is in xs:time format
791+ :
792+ :
793+ : @param $dateTime The string representation for the time.
794+ : @return The time string if it represents the xs:time format.
795+ :)
796+declare %private function normalization:check-time($Time as xs:string) as xs:string{
797+ if(string(hours-from-time(xs:time($Time))))
798+ then $Time
799+ else()
800+};
801+
802+
803
804=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq'
805--- src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2011-08-01 11:26:53 +0000
806+++ src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2011-11-16 18:49:23 +0000
807@@ -91,7 +91,7 @@
808 let $aux3 := replace(replace($aux2,"MB","M"),"B$","")
809 let $aux4 := replace(replace(replace(replace(replace($aux3,"CIA","XIA"),"SCH","SKH"),"CH","XH"),"C([IEY])","S$1"),"C","K")
810 let $aux5 := replace(replace($aux4,"DG([EYI])","JG$1"),"D","T")
811- let $aux6 := replace(replace($aux5,"GH([^AEIOU])","H$1"),"G(N(ED)?)^","$1")
812+ let $aux6 := replace(replace($aux5,"GH([^AEIOU])","H$1"),"G(N(ED)?)$","$1")
813 let $aux7 := replace(replace(replace($aux6,"([^G]?)G([IEY])","$1J$2"),"([^G]?)G","$1K"),"GG","G")
814 let $aux8 := replace(replace(replace(replace($aux7,"([AEIOU])H([^AEIOU])","$1$2"),"CK","K"),"PH","F"),"Q","K")
815 let $aux9 := replace(replace(replace(replace(replace($aux8,"S(H|(IO)|(IA))","X$1"),"T((IO)|(IA))","X$1"),"TH","0"),"TCH","CH"),"V","F")
816
817=== modified file 'test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res'
818--- test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res 2011-07-19 19:12:03 +0000
819+++ test/ExpQueryResults/data-cleaning/conversion/address-from-user.xml.res 2011-11-16 18:49:23 +0000
820@@ -1,1 +1,1 @@
821-3362 Walden Ave, Depew, NY, US 222 E 53rd St, Los Angeles, CA, US
822\ No newline at end of file
823+222 E 53rd St, Los Angeles, CA, US
824
825=== modified file 'test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res'
826--- test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res 2011-07-19 19:12:03 +0000
827+++ test/ExpQueryResults/data-cleaning/conversion/geocode-from-address.xml.res 2011-11-16 18:49:23 +0000
828@@ -1,1 +1,1 @@
829-38.725735 -9.15021
830\ No newline at end of file
831+38 -10
832
833=== modified file 'test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res'
834--- test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res 2011-07-19 19:12:03 +0000
835+++ test/ExpQueryResults/data-cleaning/conversion/phone-from-user.xml.res 2011-11-16 18:49:23 +0000
836@@ -1,1 +1,1 @@
837-(716) 686-4500
838\ No newline at end of file
839+(661) 397-4236 (310) 513-0752 (510) 259-0456 (831) 385-3605 (213) 627-0188 (323) 846-1235 (661) 224-1072 (909) 820-3137 (916) 627-1090 (707) 938-9861 (805) 648-6417
840
841=== modified file 'test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res'
842--- test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res 2011-09-02 09:14:39 +0000
843+++ test/ExpQueryResults/data-cleaning/conversion/user-from-phone.xml.res 2011-11-16 18:49:23 +0000
844@@ -1,1 +1,1 @@
845-Homer V Simpson Homer Simpson Sue M Simpson
846\ No newline at end of file
847+Gene Simpson Homer V Simpson Homer Simpson Sue M Simpson
848
849=== modified file 'test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res'
850--- test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res 2011-07-28 23:25:13 +0000
851+++ test/ExpQueryResults/data-cleaning/normalization/to-date.xml.res 2011-11-16 18:49:23 +0000
852@@ -0,0 +1,1 @@
853+2002-10-24
854
855=== modified file 'test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res'
856--- test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res 2011-07-19 19:12:03 +0000
857+++ test/ExpQueryResults/data-cleaning/normalization/to-dateTime.xml.res 2011-11-16 18:49:23 +0000
858@@ -0,0 +1,1 @@
859+2002-10-24T21:22:00
860
861=== modified file 'test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res'
862--- test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res 2011-07-19 19:12:03 +0000
863+++ test/ExpQueryResults/data-cleaning/normalization/to-time.xml.res 2011-11-16 18:49:23 +0000
864@@ -0,0 +1,1 @@
865+09:10:00
866
867=== modified file 'test/Queries/data-cleaning/conversion/geocode-from-address.xq'
868--- test/Queries/data-cleaning/conversion/geocode-from-address.xq 2011-07-19 19:12:03 +0000
869+++ test/Queries/data-cleaning/conversion/geocode-from-address.xq 2011-11-16 18:49:23 +0000
870@@ -1,3 +1,5 @@
871 import module namespace conversion = "http://www.zorba-xquery.com/modules/data-cleaning/conversion";
872
873-conversion:geocode-from-address ( ("Lisboa", "Portugal") )
874+let $geocode := conversion:geocode-from-address ( ("Lisboa", "Portugal") )
875+for $result in $geocode
876+return floor($result)
877
878=== removed file 'test/Queries/data-cleaning/conversion/unit-convert.spec'
879--- test/Queries/data-cleaning/conversion/unit-convert.spec 2011-07-28 23:25:13 +0000
880+++ test/Queries/data-cleaning/conversion/unit-convert.spec 1970-01-01 00:00:00 +0000
881@@ -1,1 +0,0 @@
882-Error: http://expath.org/ns/error:HC002
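
The day, month and year carry that this branch adds to normalization:to-dateTime (the $dayscomplement, $monthscomplement and $ryear bindings) can be cross-checked against XQuery's built-in date arithmetic. The sketch below is not part of the merged module: local:shift is a hypothetical helper, and it assumes the offset arrives as a sign plus hours and minutes and is applied in the same direction as in the diff ('+' adds, '-' subtracts).

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the data-cleaning module.
   Shifts a dateTime string by a signed hours/minutes offset and lets
   xs:dateTime arithmetic handle the day, month, year and leap-year carry. :)
declare function local:shift(
  $dateTime as xs:string,   (: e.g. "2011-12-31T23:30:00" :)
  $sign     as xs:string,   (: "+" or "-" :)
  $hours    as xs:integer,
  $minutes  as xs:integer
) as xs:string
{
  (: Build a duration such as PT1H0M from the two offset components. :)
  let $offset := xs:dayTimeDuration(concat("PT", $hours, "H", $minutes, "M"))
  let $shifted :=
    if ($sign = "-")
    then xs:dateTime($dateTime) - $offset
    else xs:dateTime($dateTime) + $offset
  return string($shifted)
};

(: Crossing midnight on 31 December carries into the next year. :)
local:shift("2011-12-31T23:30:00", "+", 1, 0)
```

Evaluating this query should yield 2012-01-01T00:30:00. Comparing such results with the output of to-dateTime for the same inputs is one way to spot calendar edge cases, for example century years such as 1900, which a year-mod-4 test alone treats as leap years.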
