Merge lp:~zorba-coders/zorba/fix-soundex_key into lp:zorba/data-cleaning-module

Proposed by Matthias Brantner
Status: Merged
Approved by: Matthias Brantner
Approved revision: 47
Merged at revision: 46
Proposed branch: lp:~zorba-coders/zorba/fix-soundex_key
Merge into: lp:zorba/data-cleaning-module
Diff against target: 170 lines (+137/-11)
3 files modified
src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq (+7/-8)
test/ExpQueryResults/data-cleaning/phonetic-string-similarity/soundex-key.xml.res (+1/-1)
test/Queries/data-cleaning/phonetic-string-similarity/soundex-key.xq (+129/-2)
To merge this branch: bzr merge lp:~zorba-coders/zorba/fix-soundex_key
Reviewer            Status
Bruno Martins       Approve
Matthias Brantner   Approve
Review via email: mp+164561@code.launchpad.net

Commit message

fix and tests for soundex-key function

Matthias Brantner (matthias-brantner) :
review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job fix-soundex_key-2013-05-18T00-44-28.084Z is finished. The final status was:

All tests succeeded!

Zorba Build Bot (zorba-buildbot) wrote :

Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1, Needs Fixing < 1, Pending < 1, Needs Information < 1, Resubmit < 1. Got: 1 Approve.

Bruno Martins (bgmartins) wrote :

The previous version of the soundex-key function did indeed have a problem with the string "3-D Adventure": it caused a stack overflow because it kept calling itself recursively while trying to produce a Soundex key with just 4 characters. The new implementation fixes this, and also seems to be correct.
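
The non-termination can be reproduced without recursion. The sketch below copies the body of the removed function from the preview diff into a hypothetical helper, local:old-step, which performs a single pass and returns the intermediate $result instead of recursing. The helper name is an assumption for illustration only. The leading digit "3" is kept as the first character and the "D" is encoded as a second "3", so the doubled-digit check should match on every pass while the value never changes.

```xquery
(: Hypothetical helper: one pass of the OLD soundex-key body, without the recursive call. :)
declare function local:old-step ( $s1 as xs:string ) as xs:string {
  let $group1 := replace(upper-case(substring($s1,2)),"[BFPV]","1")
  let $groups := replace(replace(replace(replace(replace(replace($group1,"[CGJKQSXZ]","2"),"[DT]","3"),"L","4"),"[MN]","5"),"R","6"),"[^1-6]","")
  let $merge  := replace($groups,"([1-6])\1","$1")
  return concat(upper-case(substring($s1,1,1)), $merge)
};

let $first  := local:old-step("3-D Adventure")
let $second := local:old-step($first)
(: The old code recursed when the result was longer than 4 characters and contained a doubled digit. :)
let $would-recurse := string-length($second) > 4 and matches($second,"([1-6])\1")
return ($first, $second, $first eq $second, $would-recurse)
(: expected: "331536", "331536", true, true :)
```

Since the second pass returns its own input and the recursion condition still holds, the old function had no way to reach its base case for this string.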

I've approved the revision.

Nonetheless, it makes no sense to call "soundex-key" with an input that does not correspond to a single English word. Maybe the documentation should be changed in order to indicate this.
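
Until the documentation states this restriction, a caller could guard against such inputs. The following is only a sketch: local:soundex-key-checked is a hypothetical wrapper, and the letters-only pattern is an assumed approximation of "a single English word", not something the module defines. It calls soundex-key only for inputs made of ASCII letters and returns the empty sequence otherwise.

```xquery
import module namespace simp = "http://www.zorba-xquery.com/modules/data-cleaning/phonetic-string-similarity";

(: Hypothetical wrapper: compute a key only for a single alphabetic word. :)
declare function local:soundex-key-checked ( $word as xs:string ) as xs:string? {
  if (matches($word, "^[A-Za-z]+$"))
  then simp:soundex-key($word)
  else ()   (: digits, spaces or punctuation: no key is produced :)
};

(
  local:soundex-key-checked("Robert"),          (: a single word, so a key is returned :)
  local:soundex-key-checked("3-D Adventure")    (: rejected by the pattern, so empty sequence :)
)
```

Returning the empty sequence rather than raising an error is a design choice of this sketch; a caller that prefers a hard failure could use fn:error in the else branch instead.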

review: Approve
Zorba Build Bot (zorba-buildbot) wrote :

Validation queue job fix-soundex_key-2013-05-21T14-25-30.124Z is finished. The final status was:

All tests succeeded!

Preview Diff

=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq'
--- src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2012-09-28 13:34:20 +0000
+++ src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2013-05-18 00:43:24 +0000
@@ -45,14 +45,13 @@
 : @return The Soundex key for the given input string.
 : @example test/Queries/data-cleaning/phonetic-string-similarity/soundex-key.xq
 :)
-declare function simp:soundex-key ( $s1 as xs:string ) as xs:string {
- let $group1 := replace(upper-case(substring($s1,2)),"[BFPV]","1")
- let $groups := replace(replace(replace(replace(replace(replace($group1,"[CGJKQSXZ]","2"),"[DT]","3"),"L","4"),"[MN]","5"),"R","6"),"[^1-6]","")
- let $merge := replace($groups,"([1-6])\1","$1")
- let $result := concat(upper-case(substring($s1,1,1)), $merge)
- return if (string-length($result) > 4 and matches($result,"([1-6])\1"))
- then (simp:soundex-key($result))
- else (substring(concat($result,"0000"),1,4))
+declare function simp:soundex-key ( $s1 as xs:string ) as xs:string {
+ let $clean := replace(replace(replace(replace(replace(replace(replace(upper-case($s1),"[^1-9A-Z]",""),"([BFPV])[HW]*[BFPV]","$1"),"([CGJKQSXZ])[HW]*[CGJKQSXZ]","$1"),"([DT])[HW]*[DT]","$1"),"([L])[HW]*[L]","$1"),"([MN])[HW]*[MN]","$1"),"([R])[HW]*[R]","$1")
+ let $first := substring($clean,1,1)
+ let $suffix := replace(replace(replace(replace(replace(replace(substring($clean,2),"[BFPV]","1"),"[CGJKQSXZ]","2"),"[DT]","3"),"L","4"),"[MN]","5"),"[R]","6")
+ let $merge := replace(replace($suffix, "([1-6])\1","$1"),"[^1-6]", "")
+ let $result := concat($first, $merge)
+ return substring(concat($result,"0000"),1,4)
 };

 (:~

=== modified file 'test/ExpQueryResults/data-cleaning/phonetic-string-similarity/soundex-key.xml.res'
--- test/ExpQueryResults/data-cleaning/phonetic-string-similarity/soundex-key.xml.res 2011-07-19 19:12:03 +0000
+++ test/ExpQueryResults/data-cleaning/phonetic-string-similarity/soundex-key.xml.res 2013-05-18 00:43:24 +0000
@@ -1,1 +1,1 @@
-R163
\ No newline at end of file
+R163 true

=== modified file 'test/Queries/data-cleaning/phonetic-string-similarity/soundex-key.xq'
--- test/Queries/data-cleaning/phonetic-string-similarity/soundex-key.xq 2011-07-19 19:12:03 +0000
+++ test/Queries/data-cleaning/phonetic-string-similarity/soundex-key.xq 2013-05-18 00:43:24 +0000
@@ -1,3 +1,130 @@
-import module namespace simp = "http://www.zorba-xquery.com/modules/data-cleaning/phonetic-string-similarity";
+import module namespace simpl = "http://www.zorba-xquery.com/modules/data-cleaning/phonetic-string-similarity";

-simp:soundex-key("Robert")
+simpl:soundex-key("Robert"),
+simpl:soundex-key("BARHAM") eq "B650" and
+simpl:soundex-key("BARONE") eq "B650" and
+simpl:soundex-key("BARRON") eq "B650" and
+simpl:soundex-key("BERNA") eq "B650" and
+simpl:soundex-key("BIRNEY") eq "B650" and
+simpl:soundex-key("BIRNIE") eq "B650" and
+simpl:soundex-key("BOOROM") eq "B650" and
+simpl:soundex-key("BOREN") eq "B650" and
+simpl:soundex-key("BORN") eq "B650" and
+simpl:soundex-key("BOURN") eq "B650" and
+simpl:soundex-key("BOURNE") eq "B650" and
+simpl:soundex-key("BOWRON") eq "B650" and
+simpl:soundex-key("BRAIN") eq "B650" and
+simpl:soundex-key("BRAME") eq "B650" and
+simpl:soundex-key("BRANN") eq "B650" and
+simpl:soundex-key("BRAUN") eq "B650" and
+simpl:soundex-key("BREEN") eq "B650" and
+simpl:soundex-key("BRIEN") eq "B650" and
+simpl:soundex-key("BRIM") eq "B650" and
+simpl:soundex-key("BRIMM") eq "B650" and
+simpl:soundex-key("BRINN") eq "B650" and
+simpl:soundex-key("BRION") eq "B650" and
+simpl:soundex-key("BROOM") eq "B650" and
+simpl:soundex-key("BROOME") eq "B650" and
+simpl:soundex-key("BROWN") eq "B650" and
+simpl:soundex-key("BROWNE") eq "B650" and
+simpl:soundex-key("BRUEN") eq "B650" and
+simpl:soundex-key("BRUHN") eq "B650" and
+simpl:soundex-key("BRUIN") eq "B650" and
+simpl:soundex-key("BRUMM") eq "B650" and
+simpl:soundex-key("BRUN") eq "B650" and
+simpl:soundex-key("BRUNO") eq "B650" and
+simpl:soundex-key("BRYAN") eq "B650" and
+simpl:soundex-key("BURIAN") eq "B650" and
+simpl:soundex-key("BURN") eq "B650" and
+simpl:soundex-key("BURNEY") eq "B650" and
+simpl:soundex-key("BYRAM") eq "B650" and
+simpl:soundex-key("BYRNE") eq "B650" and
+simpl:soundex-key("BYRON") eq "B650" and
+simpl:soundex-key("BYRUM") eq "B650" and
+"T235" eq simpl:soundex-key("testing") and
+"T000" eq simpl:soundex-key("The") and
+"Q200" eq simpl:soundex-key("quick") and
+"B650" eq simpl:soundex-key("brown") and
+"F200" eq simpl:soundex-key("fox") and
+"J513" eq simpl:soundex-key("jumped") and
+"O160" eq simpl:soundex-key("over") and
+"T000" eq simpl:soundex-key("the") and
+"L200" eq simpl:soundex-key("lazy") and
+"D200" eq simpl:soundex-key("dogs") and
+"A462" eq simpl:soundex-key("Allricht") and
+"E166" eq simpl:soundex-key("Eberhard") and
+"E521" eq simpl:soundex-key("Engebrethson") and
+"H512" eq simpl:soundex-key("Heimbach") and
+"H524" eq simpl:soundex-key("Hanselmann") and
+"H431" eq simpl:soundex-key("Hildebrand") and
+"K152" eq simpl:soundex-key("Kavanagh") and
+"L530" eq simpl:soundex-key("Lind") and
+"L222" eq simpl:soundex-key("Lukaschowsky") and
+"M235" eq simpl:soundex-key("McDonnell") and
+"M200" eq simpl:soundex-key("McGee") and
+"O155" eq simpl:soundex-key("Opnian") and
+"O155" eq simpl:soundex-key("Oppenheimer") and
+"R355" eq simpl:soundex-key("Riedemanas") and
+"Z300" eq simpl:soundex-key("Zita") and
+"Z325" eq simpl:soundex-key("Zitzmeinn") and
+"W252" eq simpl:soundex-key("Washington") and
+"L000" eq simpl:soundex-key("Lee") and
+"G362" eq simpl:soundex-key("Gutierrez") and
+"P236" eq simpl:soundex-key("Pfister") and
+"J250" eq simpl:soundex-key("Jackson") and
+"T522" eq simpl:soundex-key("Tymczak") and
+"V532" eq simpl:soundex-key("VanDeusen") and
+"H452" eq simpl:soundex-key("HOLMES") and
+"A355" eq simpl:soundex-key("ADOMOMI") and
+"V536" eq simpl:soundex-key("VONDERLEHR") and
+"B400" eq simpl:soundex-key("BALL") and
+"S000" eq simpl:soundex-key("SHAW") and
+"J250" eq simpl:soundex-key("JACKSON") and
+"S545" eq simpl:soundex-key("SCANLON") and
+"S532" eq simpl:soundex-key("SAINTJOHN") and
+simpl:soundex-key("KINGSMITH") eq "K525" and
+simpl:soundex-key("-KINGSMITH") eq "K525" and
+simpl:soundex-key("K-INGSMITH") eq "K525" and
+simpl:soundex-key("KI-NGSMITH") eq "K525" and
+simpl:soundex-key("KIN-GSMITH") eq "K525" and
+simpl:soundex-key("KING-SMITH") eq "K525" and
+simpl:soundex-key("KINGS-MITH") eq "K525" and
+simpl:soundex-key("KINGSM-ITH") eq "K525" and
+simpl:soundex-key("KINGSMI-TH") eq "K525" and
+simpl:soundex-key("KINGSMIT-H") eq "K525" and
+simpl:soundex-key("KINGSMITH-") eq "K525" and
+simpl:soundex-key("Ashcraft") eq "A261" and
+simpl:soundex-key("BOOTHDAVIS") eq "B312" and
+simpl:soundex-key("BOOTH-DAVIS") eq "B312" and
+simpl:soundex-key("Sgler") eq "S460" and
+simpl:soundex-key("Swhgler") eq "S460" and
+simpl:soundex-key("Swhgler") eq "S460" and
+simpl:soundex-key("SAILOR") eq "S460" and
+simpl:soundex-key("SALYER") eq "S460" and
+simpl:soundex-key("SAYLOR") eq "S460" and
+simpl:soundex-key("SCHALLER") eq "S460" and
+simpl:soundex-key("SCHELLER") eq "S460" and
+simpl:soundex-key("SCHILLER") eq "S460" and
+simpl:soundex-key("SCHOOLER") eq "S460" and
+simpl:soundex-key("SCHULER") eq "S460" and
+simpl:soundex-key("SCHUYLER") eq "S460" and
+simpl:soundex-key("SEILER") eq "S460" and
+simpl:soundex-key("SEYLER") eq "S460" and
+simpl:soundex-key("SHOLAR") eq "S460" and
+simpl:soundex-key("SHULER") eq "S460" and
+simpl:soundex-key("SILAR") eq "S460" and
+simpl:soundex-key("SILER") eq "S460" and
+simpl:soundex-key("SILLER") eq "S460" and
+simpl:soundex-key("Smith") eq "S530" and
+simpl:soundex-key("Smythe") eq "S530" and
+"A500" eq simpl:soundex-key("Ann") and
+"A536" eq simpl:soundex-key("Andrew") and
+"J530" eq simpl:soundex-key("Janet") and
+"M626" eq simpl:soundex-key("Margaret") and
+"S315" eq simpl:soundex-key("Steven") and
+"M240" eq simpl:soundex-key("Michael") and
+"R163" eq simpl:soundex-key("Robert") and
+"L600" eq simpl:soundex-key("Laura") and
+"A500" eq simpl:soundex-key("Anne") and
+"W452" eq simpl:soundex-key("Williams") and
+"3331" eq simpl:soundex-key("3-D ADVENTURE")
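
To follow how the new implementation arrives at one of the expected values in the test, the intermediate bindings can be returned directly. This sketch copies the let clauses from the added lines of the diff into a hypothetical helper, local:soundex-trace (the helper name is an assumption, not part of the module), and runs it on "Ashcraft", where the H between S and C is skipped so that the two letters collapse before encoding.

```xquery
(: Hypothetical helper: the let clauses of the NEW soundex-key, returning every stage. :)
declare function local:soundex-trace ( $s1 as xs:string ) as xs:string* {
  (: Upper-case, drop characters outside 1-9 and A-Z, collapse same-group letters separated only by H/W. :)
  let $clean := replace(replace(replace(replace(replace(replace(replace(upper-case($s1),"[^1-9A-Z]",""),"([BFPV])[HW]*[BFPV]","$1"),"([CGJKQSXZ])[HW]*[CGJKQSXZ]","$1"),"([DT])[HW]*[DT]","$1"),"([L])[HW]*[L]","$1"),"([MN])[HW]*[MN]","$1"),"([R])[HW]*[R]","$1")
  let $first := substring($clean,1,1)
  (: Encode the remaining letters by consonant group. :)
  let $suffix := replace(replace(replace(replace(replace(replace(substring($clean,2),"[BFPV]","1"),"[CGJKQSXZ]","2"),"[DT]","3"),"L","4"),"[MN]","5"),"[R]","6")
  (: Merge adjacent equal digits first, then drop everything that is not a code digit. :)
  let $merge := replace(replace($suffix, "([1-6])\1","$1"),"[^1-6]", "")
  let $result := concat($first, $merge)
  return ($clean, $first, $suffix, $merge, substring(concat($result,"0000"),1,4))
};

local:soundex-trace("Ashcraft")
(: expected: "ASRAFT", "A", "26A13", "2613", "A261" :)
```

Unlike the old code, doubled digits are merged before the vowels are removed, so codes separated by a vowel are kept apart, and there is no recursive call left that could fail to terminate.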
