Merge lp:~cyphermox/libcolumbus/abi1 into lp:libcolumbus

Proposed by Mathieu Trudel-Lapierre
Status: Merged
Approved by: Jussi Pakkanen
Approved revision: 455
Merged at revision: 452
Proposed branch: lp:~cyphermox/libcolumbus/abi1
Merge into: lp:libcolumbus
Diff against target: 1828 lines (+562/-247)
57 files modified
CMakeLists.txt (+6/-5)
cmake/isclang.cc (+0/-26)
cmake/pch.cmake (+4/-2)
cmake/python.cmake (+10/-7)
coding style.txt (+2/-1)
debian/changelog (+6/-0)
debian/control (+9/-6)
debian/libcolumbus1-common.install (+1/-1)
debian/libcolumbus1-dev.install (+1/-1)
debian/rules (+1/-1)
include/CMakeLists.txt (+4/-4)
include/ColumbusCore.hh.in (+1/-1)
include/ColumbusHelpers.hh (+2/-2)
include/Corpus.hh (+3/-3)
include/Document.hh (+3/-1)
include/ErrorMatrix.hh (+4/-1)
include/ErrorValues.hh (+9/-3)
include/IndexMatches.hh (+3/-5)
include/IndexWeights.hh (+2/-1)
include/LevenshteinIndex.hh (+3/-5)
include/MatchResults.hh (+6/-1)
include/Matcher.hh (+14/-6)
include/MatcherStatistics.hh (+1/-1)
include/ResultFilter.hh (+3/-1)
include/SearchParameters.hh (+53/-0)
include/Trie.hh (+4/-1)
include/Word.hh (+1/-1)
include/WordList.hh (+3/-1)
include/WordStore.hh (+4/-1)
include/columbus.h (+1/-1)
python/CMakeLists.txt (+9/-10)
python/columbus.cc (+8/-6)
python/columbus.py (+0/-28)
share/CMakeLists.txt (+2/-1)
share/greekAccentedLetterGroups.txt (+7/-0)
src/CMakeLists.txt (+1/-0)
src/ColumbusCAPI.cc (+5/-3)
src/ColumbusHelpers.cc (+6/-4)
src/Document.cc (+5/-3)
src/ErrorValues.cc (+10/-2)
src/MatchResults.cc (+26/-0)
src/Matcher.cc (+57/-56)
src/SearchParameters.cc (+86/-0)
src/WordList.cc (+13/-0)
test/CAPITest.c (+2/-2)
test/CMakeLists.txt (+1/-0)
test/ErrorValuesTest.cc (+20/-2)
test/HelpersTest.cc (+2/-4)
test/MatchResultsTest.cc (+21/-0)
test/MatcherTest.cc (+3/-3)
test/ResultFilterTest.cc (+15/-19)
test/SearchParametersTest.cc (+86/-0)
test/pythontest.py (+6/-7)
tools/hudtest.cc (+3/-3)
tools/numberpad.cc (+1/-1)
tools/queryapp.cc (+1/-1)
tools/sctest.cc (+2/-2)
To merge this branch: bzr merge lp:~cyphermox/libcolumbus/abi1
Reviewer | Review Type | Date Requested | Status
Jussi Pakkanen (community) | | | Approve
PS Jenkins bot (community) | continuous-integration | | Approve
Review via email: mp+179524@code.launchpad.net

Commit message

Packaging changes for ABI 1.

Description of the change

Packaging changes for ABI 1.

PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Jussi Pakkanen (jpakkane) wrote :

Looks good. The only change I can see that is needed is renaming the pkg-config file, because it will otherwise list the package name as "libcolumbus1" while everyone else calls it "libcolumbus".

The change is simple, just remove ${SO_VERSION_MAJOR} from include/CMakeLists.txt line 23.

review: Needs Fixing
Mathieu Trudel-Lapierre (cyphermox) wrote :

I disagree.

If you want to keep the current behavior of allowing libcolumbus1 to be co-installable with a future libcolumbus2, then the pkg-config file also needs to include the SO major version in its file name. This is also consistent with the other installation paths.

It also appears to be consistent with other libraries installed in /usr/lib/x86_64-linux-gnu/pkgconfig on my system: libxml-2.0.pc, libpng12.pc.

Perhaps the missing part is instead to make sure the version is always included in the lib name used with -l (the file name of the library)?

Jussi Pakkanen (jpakkane) wrote :

If we want parallel installability then the library name must have the ABI major version. If it does not, there will be a clash. Suppose you want to install both libcolumbus1-dev and libcolumbus2-dev at the same time. They would both try to grab the symlink libcolumbus.so for themselves, which would be a conflict. I guess we could fix that by adding a "Conflicts: libcolumbus2-dev" entry to libcolumbus1-dev. You could still install either one, but only one at a time.
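
The clash described here can be modelled as two packages that both claim the same path. The following is a rough sketch only: the Package struct, the devPackage helper and the file lists are hypothetical and do not come from the real packages or from dpkg.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a package is a name plus the set of paths it installs.
struct Package {
    std::string name;
    std::set<std::string> files;
};

// Returns the paths that both packages would try to own.
std::vector<std::string> conflictingFiles(const Package &a, const Package &b) {
    std::vector<std::string> clash;
    std::set_intersection(a.files.begin(), a.files.end(),
                          b.files.begin(), b.files.end(),
                          std::back_inserter(clash));
    return clash;
}

// Builds an illustrative -dev package for the given ABI major version.
// The file list is made up for this example.
Package devPackage(int abi, bool versionedLinkName) {
    assert(abi > 0);
    const std::string v = std::to_string(abi);
    const std::string link =
        versionedLinkName ? "libcolumbus" + v + ".so" : "libcolumbus.so";
    return Package{"libcolumbus" + v + "-dev",
                   {"usr/include/columbus" + v + "/Matcher.hh",
                    "usr/lib/" + link}};
}
```

With an unversioned link name, conflictingFiles(devPackage(1, false), devPackage(2, false)) reports usr/lib/libcolumbus.so; with a versioned link name the result is empty, which is the parallel-installable case.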

lp:~cyphermox/libcolumbus/abi1 updated
454. By Mathieu Trudel-Lapierre

Remove soversion from pkgconfig file name

PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
lp:~cyphermox/libcolumbus/abi1 updated
455. By Mathieu Trudel-Lapierre

Also remove soversion from pkgconfig path for files to install

PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Jussi Pakkanen (jpakkane) wrote :

Looks fine by me. Thanks.

review: Approve

Preview Diff

1=== modified file 'CMakeLists.txt'
2--- CMakeLists.txt 2013-04-23 11:05:57 +0000
3+++ CMakeLists.txt 2013-08-13 21:28:18 +0000
4@@ -37,20 +37,21 @@
5 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fvisibility=hidden")
6 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden")
7
8-set(SO_VERSION_MAJOR "0")
9-set(SO_VERSION_MINOR "5")
10+set(SO_VERSION_MAJOR "1")
11+set(SO_VERSION_MINOR "0")
12 set(SO_VERSION_PATCH "0")
13
14 set(SO_VERSION "${SO_VERSION_MAJOR}.${SO_VERSION_MINOR}.${SO_VERSION_PATCH}")
15
16-set(COL_LIB_BASENAME "columbus${SO_VERSION_MAJOR}")
17+set(COL_LIB_BASENAME "columbus")
18
19 # Increment this manually whenever breaking ABI.
20 # http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html#AEN135
21-set(ABI_VERSION 0)
22+set(ABI_VERSION 1)
23
24+include(GNUInstallDirs)
25+set(LIBDIR ${CMAKE_INSTALL_LIBDIR})
26 # Set as cache variable so packaging can override.
27-set(LIBDIR "lib" CACHE PATH "Destination install dir for the library")
28 set(PYTHONDIR "lib/python3/dist-packages" CACHE PATH "Destination install dir for Python module")
29
30 include(TestBigEndian)
31
32=== removed file 'cmake/isclang.cc'
33--- cmake/isclang.cc 2012-12-07 11:01:33 +0000
34+++ cmake/isclang.cc 1970-01-01 00:00:00 +0000
35@@ -1,26 +0,0 @@
36-/*
37- * Copyright (C) 2012 Canonical, Ltd.
38- *
39- * Authors:
40- * Jussi Pakkanen <jussi.pakkanen@canonical.com>
41- *
42- * This library is free software; you can redistribute it and/or modify it under
43- * the terms of version 3 of the GNU Lesser General Public License as published
44- * by the Free Software Foundation.
45- *
46- * This library is distributed in the hope that it will be useful, but WITHOUT
47- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
48- * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
49- * details.
50- *
51- * You should have received a copy of the GNU Lesser General Public License
52- * along with this program. If not, see <http://www.gnu.org/licenses/>.
53- */
54-
55-int main(int argc, char **argv) {
56-#ifdef __clang__
57- return 1; // This gets assigned to a CMake variable so "1" means "true".
58-#else
59- return 0;
60-#endif
61-}
62
63=== modified file 'cmake/pch.cmake'
64--- cmake/pch.cmake 2012-12-21 10:32:38 +0000
65+++ cmake/pch.cmake 2013-08-13 21:28:18 +0000
66@@ -65,7 +65,8 @@
67 separate_arguments(compile_args)
68 add_custom_command(OUTPUT ${gch_filename}
69 COMMAND ${CMAKE_CXX_COMPILER} ${compile_args}
70- DEPENDS ${header_filename})
71+ DEPENDS ${header_filename}
72+ VERBATIM)
73 add_custom_target(${gch_target_name} DEPENDS ${gch_filename})
74 add_dependencies(${target_name} ${gch_target_name})
75
76@@ -77,7 +78,8 @@
77
78 endfunction()
79
80-try_run(IS_CLANG did_build ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_SOURCE_DIR}/cmake/isclang.cc)
81+include(CheckCXXSourceCompiles)
82+CHECK_CXX_SOURCE_COMPILES("#ifdef __clang__\n#else\n#error \"Not clang.\"\n#endif\nint main(int argc, char **argv) { return 0; }" IS_CLANG)
83
84 if(UNIX)
85 if(NOT APPLE)
86
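The hunk above replaces a try_run of isclang.cc with a CHECK_CXX_SOURCE_COMPILES probe, so the answer now comes from whether a snippet compiles rather than from a program's exit status. One plausible benefit (an assumption, since the proposal does not state the motivation) is that nothing has to be executed, which helps when cross-compiling. The same macro test can be sketched in plain C++; builtWithClang is a hypothetical name used only for illustration.

```cpp
// Compile-time probe: true only when this translation unit is built by Clang.
// __clang__ is predefined by Clang itself, so no header is needed.
constexpr bool builtWithClang() {
#ifdef __clang__
    return true;
#else
    return false;
#endif
}
```

Because the function is constexpr, its result can feed a static_assert or select a code path at compile time.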
87=== modified file 'cmake/python.cmake'
88--- cmake/python.cmake 2013-01-24 09:25:50 +0000
89+++ cmake/python.cmake 2013-08-13 21:28:18 +0000
90@@ -1,14 +1,20 @@
91 set(build_python FALSE)
92
93-find_package(Boost 1.49.0 COMPONENTS python)
94+# CMake's Boost.Python detector is completely and utterly
95+# broken. We have to do this manually.
96+#
97+# Upstream bug:
98+# http://public.kitware.com/Bug/view.php?id=12955
99+find_file(BP_HEADER boost/python.hpp)
100+
101 if(use_python2)
102 pkg_search_module(PYTHONLIBS python)
103 else()
104 pkg_search_module(PYTHONLIBS python3)
105 endif()
106
107-if(NOT Boost_FOUND)
108- message(STATUS "Boost not found, not building Python bindings.")
109+if(NOT BP_HEADER)
110+ message(STATUS "Boost.Python not found, not building Python bindings.")
111 else()
112 if(NOT PYTHONLIBS_FOUND)
113 message(STATUS "Python dev libraries not found, not building Python bindings.")
114@@ -19,11 +25,8 @@
115 if(NOT use_python2)
116 execute_process(COMMAND ${CMAKE_SOURCE_DIR}/cmake/pysoabi.py OUTPUT_VARIABLE pysoabi OUTPUT_STRIP_TRAILING_WHITESPACE)
117 endif()
118-
119- # Linking against libboost_python does not work with Python 3.
120- # Working around this bug:
121- # http://public.kitware.com/Bug/view.php?id=12955
122 find_library(BOOST_PYTHON_HACK boost_python-py${PYTHON_MAJOR}${PYTHON_MINOR})
123+
124 if(NOT BOOST_PYTHON_HACK)
125 message(STATUS "Boost.Python hack library not found, not building Python bindings")
126 else()
127
128=== modified file 'coding style.txt'
129--- coding style.txt 2012-06-07 11:48:28 +0000
130+++ coding style.txt 2013-08-13 21:28:18 +0000
131@@ -4,5 +4,6 @@
132 - indentation is 4 spaces, tabs are forbidden
133 - opening brace always on the same line
134 - class header files must be minimal
135- - no STL #includes because they slow down compilation massively
136+ - no STL #includes because they slow down compilation massively,
137+ the only exception is string, which is necessary for interoperation
138 - forward declarations instead of #includes
139
140=== modified file 'debian/changelog'
141--- debian/changelog 2013-06-06 05:33:21 +0000
142+++ debian/changelog 2013-08-13 21:28:18 +0000
143@@ -1,3 +1,9 @@
144+libcolumbus (1.0.0daily13.08.09-0ubuntu1) UNRELEASED; urgency=low
145+
146+ *
147+
148+ -- Mathieu Trudel-Lapierre <mathieu-tl@ubuntu.com> Fri, 09 Aug 2013 10:31:48 -0400
149+
150 libcolumbus (0.5.0daily13.06.06-0ubuntu1) saucy; urgency=low
151
152 [ Jussi Pakkanen ]
153
154=== modified file 'debian/control'
155--- debian/control 2013-02-11 20:12:45 +0000
156+++ debian/control 2013-08-13 21:28:18 +0000
157@@ -13,19 +13,21 @@
158 Homepage: https://launchpad.net/libcolombus
159 Vcs-Bzr: https://code.launchpad.net/~canonical-product-strategy/libcolumbus/trunk
160
161-Package: libcolumbus0-0
162+Package: libcolumbus1
163 Section: libs
164 Architecture: any
165+Multi-Arch: same
166 Pre-Depends: ${misc:Pre-Depends}
167-Depends: libcolumbus0-0-common (= ${source:Version}),
168+Depends: libcolumbus1-common (= ${source:Version}),
169 ${shlibs:Depends},
170 ${misc:Depends},
171 Description: error tolerant matching engine - shared library
172 Libcolumbus is a search engine designed to work with unclean data.
173
174-Package: libcolumbus0-0-common
175+Package: libcolumbus1-common
176 Section: libs
177 Architecture: all
178+Multi-Arch: foreign
179 Depends: ${shlibs:Depends},
180 ${misc:Depends},
181 Description: error tolerant matching engine - common files
182@@ -33,11 +35,12 @@
183 .
184 This package contains the common files to have the library working.
185
186-Package: libcolumbus0-dev
187+Package: libcolumbus1-dev
188 Section: libdevel
189 Architecture: any
190+Multi-Arch: same
191 Pre-Depends: ${misc:Pre-Depends}
192-Depends: libcolumbus0-0 (= ${binary:Version}),
193+Depends: libcolumbus1 (= ${binary:Version}),
194 ${misc:Depends},
195 Description: error tolerant matching engine - development files
196 Libcolumbus is a search engine designed to work with unclean data.
197@@ -48,7 +51,7 @@
198 Section: python
199 Architecture: any
200 Pre-Depends: ${misc:Pre-Depends}
201-Depends: libcolumbus0-0 (= ${binary:Version}),
202+Depends: libcolumbus1 (= ${binary:Version}),
203 ${misc:Depends},
204 ${shlibs:Depends},
205 Description: error tolerant matching engine - Python bindings
206
207=== renamed file 'debian/libcolumbus0-0-common.install' => 'debian/libcolumbus1-common.install'
208--- debian/libcolumbus0-0-common.install 2013-01-16 10:07:16 +0000
209+++ debian/libcolumbus1-common.install 2013-08-13 21:28:18 +0000
210@@ -1,1 +1,1 @@
211-usr/share/columbus0/*
212+usr/share/columbus1/*
213
214=== renamed file 'debian/libcolumbus0-dev.install' => 'debian/libcolumbus1-dev.install'
215--- debian/libcolumbus0-dev.install 2012-11-27 09:10:31 +0000
216+++ debian/libcolumbus1-dev.install 2013-08-13 21:28:18 +0000
217@@ -1,3 +1,3 @@
218-usr/include/columbus0/*
219+usr/include/columbus1/*
220 usr/lib/*/lib*.so
221 usr/lib/*/pkgconfig/*
222
223=== renamed file 'debian/libcolumbus0-0.install' => 'debian/libcolumbus1.install'
224=== modified file 'debian/rules'
225--- debian/rules 2013-06-05 14:29:17 +0000
226+++ debian/rules 2013-08-13 21:28:18 +0000
227@@ -16,7 +16,7 @@
228 dh $@ --parallel
229
230 override_dh_auto_configure:
231- dh_auto_configure -- -DLIBDIR=/usr/lib/$(DEB_HOST_MULTIARCH) -DCMAKE_BUILD_TYPE=''
232+ dh_auto_configure -- -DCMAKE_BUILD_TYPE=''
233
234 override_dh_install:
235 dh_install --fail-missing
236
237=== modified file 'include/CMakeLists.txt'
238--- include/CMakeLists.txt 2012-12-11 10:31:46 +0000
239+++ include/CMakeLists.txt 2013-08-13 21:28:18 +0000
240@@ -12,13 +12,13 @@
241 Document.hh
242 ColumbusHelpers.hh
243 IndexWeights.hh
244-DESTINATION include/${COL_LIB_BASENAME})
245+DESTINATION include/${COL_LIB_BASENAME}${SO_VERSION_MAJOR})
246
247 # Build and install a pkg-config file
248 set(prefix ${CMAKE_INSTALL_PREFIX})
249 set(exec_prefix ${prefix}/bin)
250 set(libdir ${prefix}/${LIBDIR})
251-set(includedir ${prefix}/include/${COL_LIB_BASENAME})
252+set(includedir ${prefix}/include/${COL_LIB_BASENAME}${SO_VERSION_MAJOR})
253 set(pkg-name "lib${COL_LIB_BASENAME}")
254-configure_file(libcolumbus.pc.in libcolumbus${SO_VERSION_MAJOR}.pc @ONLY)
255-install(FILES ${CMAKE_CURRENT_BINARY_DIR}/libcolumbus${SO_VERSION_MAJOR}.pc DESTINATION ${LIBDIR}/pkgconfig)
256+configure_file(libcolumbus.pc.in libcolumbus.pc @ONLY)
257+install(FILES ${CMAKE_CURRENT_BINARY_DIR}/libcolumbus.pc DESTINATION ${LIBDIR}/pkgconfig)
258
259=== modified file 'include/ColumbusCore.hh.in'
260--- include/ColumbusCore.hh.in 2013-04-03 13:50:54 +0000
261+++ include/ColumbusCore.hh.in 2013-08-13 21:28:18 +0000
262@@ -77,7 +77,7 @@
263 #define COLUMBUS_VERSION_STRING "${SO_VERSION}"
264 #define COLUMBUS_ABI_VERSION ${ABI_VERSION}
265 #define COLUMBUS_INSTALL_PREFIX "${CMAKE_INSTALL_PREFIX}"
266-#define COLUMBUS_DATADIR COLUMBUS_INSTALL_PREFIX "/share/${COL_LIB_BASENAME}/"
267+#define COLUMBUS_DATADIR COLUMBUS_INSTALL_PREFIX "/share/${COL_LIB_BASENAME}${SO_VERSION_MAJOR}/"
268
269 typedef ${LETTER_TYPE} Letter;
270 #define INTERNAL_ENCODING "${INTERNAL_ENCODING}"
271
272=== modified file 'include/ColumbusHelpers.hh'
273--- include/ColumbusHelpers.hh 2012-12-07 11:01:33 +0000
274+++ include/ColumbusHelpers.hh 2013-08-13 21:28:18 +0000
275@@ -30,8 +30,8 @@
276 Letter* utf8ToInternal(const char *utf8Text, unsigned int &resultStringSize);
277 void internalToUtf8(const Letter *source, unsigned int characters, char *buf, unsigned int bufsize);
278 COL_PUBLIC COL_PUBLIC double hiresTimestamp();
279-COL_PUBLIC void splitToWords(const char *utf8Text, WordList &list);
280-COL_PUBLIC void split(const char *utf8Text, WordList &list, const Letter *splitChars, int numChars);
281+COL_PUBLIC WordList splitToWords(const char *utf8Text);
282+COL_PUBLIC WordList split(const char *utf8Text, const Letter *splitChars, int numChars);
283 COL_PUBLIC bool isWhitespace(Letter l);
284
285 COL_NAMESPACE_END
286
287=== modified file 'include/Corpus.hh'
288--- include/Corpus.hh 2012-12-07 11:01:33 +0000
289+++ include/Corpus.hh 2013-08-13 21:28:18 +0000
290@@ -27,15 +27,15 @@
291 struct CorpusPrivate;
292 class Document;
293
294-class COL_PUBLIC Corpus {
295+class COL_PUBLIC Corpus final {
296 private:
297 CorpusPrivate *p;
298- Corpus(const Corpus &c);
299- const Corpus& operator=(const Corpus &c);
300
301 public:
302 Corpus();
303 ~Corpus();
304+ Corpus(const Corpus &c) = delete;
305+ const Corpus& operator=(const Corpus &c) = delete;
306
307 void addDocument(const Document &d);
308 size_t size() const;
309
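Several headers in this diff move the copy constructor and assignment operator from private, undefined declarations to public "= delete" declarations. A minimal sketch of the two idioms follows; OldStyle and NewStyle are made-up names, not libcolumbus classes.

```cpp
#include <type_traits>

// C++03 idiom: declare the copy operations private and never define them.
class OldStyle {
    OldStyle(const OldStyle &);
    const OldStyle& operator=(const OldStyle &);
public:
    OldStyle() {}
};

// C++11 idiom, as in the hunk above: public and explicitly deleted.
class NewStyle final {
public:
    NewStyle() {}
    NewStyle(const NewStyle &) = delete;
    const NewStyle& operator=(const NewStyle &) = delete;
};

// Outside code cannot copy either class.
static_assert(!std::is_copy_constructible<OldStyle>::value, "private copy ctor");
static_assert(!std::is_copy_constructible<NewStyle>::value, "deleted copy ctor");
static_assert(!std::is_copy_assignable<NewStyle>::value, "deleted assignment");
```

The practical difference is in the diagnostics. With the old idiom, a copy made from inside the class or a friend still compiles and only fails at link time, while a deleted function turns every attempted copy into a compile error.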
310=== modified file 'include/Document.hh'
311--- include/Document.hh 2012-12-07 11:01:33 +0000
312+++ include/Document.hh 2013-08-13 21:28:18 +0000
313@@ -21,6 +21,7 @@
314 #define DOCUMENT_HH_
315
316 #include "ColumbusCore.hh"
317+#include<string>
318
319 COL_NAMESPACE_START
320
321@@ -29,7 +30,7 @@
322
323 struct DocumentPrivate;
324
325-class COL_PUBLIC Document {
326+class COL_PUBLIC Document final {
327 private:
328 DocumentPrivate *p;
329
330@@ -41,6 +42,7 @@
331 const Document& operator=(const Document &d);
332 void addText(const Word &field, const WordList &words);
333 void addText(const Word &field, const char *textAsUtf8);
334+ void addText(const Word &field, const std::string &textAsUtf8);
335 const WordList& getText(const Word &field) const;
336 size_t fieldCount() const;
337 DocumentID getID() const;
338
339=== modified file 'include/ErrorMatrix.hh'
340--- include/ErrorMatrix.hh 2012-12-07 11:01:33 +0000
341+++ include/ErrorMatrix.hh 2013-08-13 21:28:18 +0000
342@@ -34,13 +34,16 @@
343
344 COL_NAMESPACE_START
345
346-class ErrorMatrix {
347+class ErrorMatrix final {
348 size_t rows, columns;
349 int **m;
350
351 public:
352 ErrorMatrix(const size_t rows_, const size_t columns_, const int insertError, const int deletionError);
353 ~ErrorMatrix();
354+ ErrorMatrix(const ErrorMatrix &em) = delete;
355+ const ErrorMatrix & operator=(const ErrorMatrix &other) = delete;
356+
357
358 void set(const size_t rowNum, const size_t colNum, const int error);
359 // No bounds checking because this is in the hot path.
360
361=== modified file 'include/ErrorValues.hh'
362--- include/ErrorValues.hh 2012-12-07 11:01:33 +0000
363+++ include/ErrorValues.hh 2013-08-13 21:28:18 +0000
364@@ -24,10 +24,15 @@
365
366 COL_NAMESPACE_START
367
368+enum accentGroups {
369+ latinAccentGroup,
370+ greekAccentGroup,
371+};
372+
373 struct ErrorValuesPrivate;
374 class Word;
375
376-class COL_PUBLIC ErrorValues {
377+class COL_PUBLIC ErrorValues final {
378 private:
379 static const int DEFAULT_ERROR = 100;
380 static const int DEFAULT_GROUP_ERROR = 30;
381@@ -56,6 +61,7 @@
382
383 ErrorValues();
384 ~ErrorValues();
385+ const ErrorValues& operator=(const ErrorValues &other) = delete;
386
387 int getInsertionError() const { return insertionError; }
388 int getDeletionError() const { return deletionError; }
389@@ -81,10 +87,10 @@
390
391 void setError(Letter l1, Letter l2, const int error);
392 void setGroupError(const Word &groupLetters, const int error);
393- void addLatinAccents();
394+ void addAccents(accentGroups group);
395 void addKeyboardErrors();
396 void addNumberpadErrors();
397- void addStandardErrors() { addLatinAccents(); addKeyboardErrors(); }
398+ void addStandardErrors();
399 bool isInGroup(Letter l);
400 void clearErrors();
401 void setSubstringMode();
402
403=== modified file 'include/IndexMatches.hh'
404--- include/IndexMatches.hh 2012-12-07 11:01:33 +0000
405+++ include/IndexMatches.hh 2013-08-13 21:28:18 +0000
406@@ -34,7 +34,7 @@
407 * in growing error order.
408 *
409 */
410-class COL_PUBLIC IndexMatches {
411+class COL_PUBLIC IndexMatches final {
412 friend class LevenshteinIndex;
413
414 private:
415@@ -44,13 +44,11 @@
416 void addMatch(const Word &queryWord, const WordID matchedWord, int error);
417 void sort();
418
419- // Disable copy and assignment.
420- IndexMatches(const IndexMatches &other);
421- const IndexMatches & operator=(const IndexMatches &other);
422-
423 public:
424 IndexMatches();
425 ~IndexMatches();
426+ IndexMatches(const IndexMatches &other) = delete;
427+ const IndexMatches & operator=(const IndexMatches &other) = delete;
428
429 size_t size() const;
430 const WordID& getMatch(size_t num) const;
431
432=== modified file 'include/IndexWeights.hh'
433--- include/IndexWeights.hh 2012-12-07 11:01:33 +0000
434+++ include/IndexWeights.hh 2013-08-13 21:28:18 +0000
435@@ -27,11 +27,12 @@
436 struct IndexWeightsPrivate;
437 class Word;
438
439-class COL_PUBLIC IndexWeights {
440+class COL_PUBLIC IndexWeights final {
441 IndexWeightsPrivate *p;
442 public:
443 IndexWeights();
444 ~IndexWeights();
445+ const IndexWeights & operator=(const IndexWeights &other) = delete;
446
447 void setWeight(const Word &w, double weigth);
448 double getWeight(const Word &w) const;
449
450=== modified file 'include/LevenshteinIndex.hh'
451--- include/LevenshteinIndex.hh 2013-01-29 09:36:25 +0000
452+++ include/LevenshteinIndex.hh 2013-08-13 21:28:18 +0000
453@@ -20,7 +20,6 @@
454 #ifndef LEVENSHTEININDEX_HH
455 #define LEVENSHTEININDEX_HH
456
457-#include <vector>
458 #include "ColumbusCore.hh"
459 #include "IndexMatches.hh"
460
461@@ -32,7 +31,7 @@
462 class Word;
463 class ErrorValues;
464
465-class COL_PUBLIC LevenshteinIndex {
466+class COL_PUBLIC LevenshteinIndex final {
467 private:
468 LevenshteinIndexPrivate *p;
469
470@@ -40,15 +39,14 @@
471 const Letter letter, const Letter previousLetter, const size_t depth, ErrorMatrix &em,
472 IndexMatches &matches, const int max_error) const;
473
474- // Disable copy and move.
475- LevenshteinIndex(const LevenshteinIndex &other);
476- LevenshteinIndex& operator=(const LevenshteinIndex &other);
477 int findOptimalError(const Letter letter, const Letter previousLetter, const Word &query,
478 const size_t i, const size_t depth, const ErrorMatrix &em, const ErrorValues &e) const;
479
480 public:
481 LevenshteinIndex();
482 ~LevenshteinIndex();
483+ LevenshteinIndex(const LevenshteinIndex &other) = delete;
484+ const LevenshteinIndex & operator=(const LevenshteinIndex &other) = delete;
485
486 static int getDefaultError();
487
488
489=== modified file 'include/MatchResults.hh'
490--- include/MatchResults.hh 2012-12-07 11:01:33 +0000
491+++ include/MatchResults.hh 2013-08-13 21:28:18 +0000
492@@ -27,7 +27,7 @@
493 struct MatchResultsPrivate;
494 class Word;
495
496-class COL_PUBLIC MatchResults {
497+class COL_PUBLIC MatchResults final {
498 MatchResultsPrivate *p;
499
500 void sortIfRequired() const;
501@@ -35,6 +35,11 @@
502 public:
503 MatchResults();
504 ~MatchResults();
505+ MatchResults(const MatchResults &other);
506+ MatchResults(MatchResults &&other);
507+
508+ const MatchResults& operator=(MatchResults &&other);
509+ const MatchResults& operator=(const MatchResults &other);
510
511 void addResult(DocumentID docID, double relevancy);
512 void addResults(const MatchResults &r);
513
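MatchResults gains copy and move operations here, and the Matcher.hh changes in this diff make match() return a MatchResults by value. The sketch below shows why a move constructor keeps that cheap for a class holding a raw pimpl pointer. Results, ResultsPrivate and makeResults are hypothetical stand-ins; the library's own implementation is not shown in this part of the diff and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical private data, standing in for a pimpl struct.
struct ResultsPrivate {
    std::vector<double> relevancies;
};

class Results final {
    ResultsPrivate *p;
public:
    Results() : p(new ResultsPrivate()) {}
    ~Results() { delete p; }  // deleting a null pointer is a no-op
    // Copy: duplicate the private data (a moved-from source has none).
    Results(const Results &other)
        : p(other.p ? new ResultsPrivate(*other.p) : nullptr) {}
    // Move: steal the pointer, no allocation and no element copies.
    Results(Results &&other) noexcept : p(other.p) { other.p = nullptr; }
    const Results& operator=(const Results &other) {
        Results copy(other);   // copy-and-swap keeps self-assignment safe
        std::swap(p, copy.p);
        return *this;
    }
    const Results& operator=(Results &&other) noexcept {
        std::swap(p, other.p);
        return *this;
    }
    void add(double relevancy) {
        if (!p) p = new ResultsPrivate();  // revive a moved-from object
        p->relevancies.push_back(relevancy);
    }
    std::size_t size() const { return p ? p->relevancies.size() : 0; }
};

// Returned by value, in the style of the new match() overloads.
Results makeResults() {
    Results r;
    r.add(1.0);
    r.add(0.5);
    assert(r.size() == 2);
    return r;
}
```

A caller can then write Results r = makeResults(); without an output parameter, and the compiler either elides the copy or uses the move constructor.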
514=== modified file 'include/Matcher.hh'
515--- include/Matcher.hh 2013-01-30 14:17:07 +0000
516+++ include/Matcher.hh 2013-08-13 21:28:18 +0000
517@@ -21,6 +21,7 @@
518 #define MATCHER_HH_
519
520 #include "ColumbusCore.hh"
521+#include<string>
522
523 COL_NAMESPACE_START
524
525@@ -33,22 +34,29 @@
526 class ErrorValues;
527 class IndexWeights;
528 class ResultFilter;
529+class SearchParameters;
530
531-class COL_PUBLIC Matcher {
532+class COL_PUBLIC Matcher final {
533 private:
534 MatcherPrivate *p;
535
536 void buildIndexes(const Corpus &c);
537 void addToIndex(const Word &word, const WordID wordID, const WordID indexID);
538- void matchWithRelevancy(const WordList &query, const bool dynamicError, const int extraError, MatchResults &matchedDocuments);
539+ void matchWithRelevancy(const WordList &query, const SearchParameters &params, const int extraError, MatchResults &matchedDocuments);
540
541 public:
542 Matcher();
543 ~Matcher();
544-
545- void match(const WordList &query, MatchResults &matchedDocuments);
546- void match(const char *queryAsUtf8, MatchResults &matchedDocuments);
547- void match(const char *queryAsUtf8, MatchResults &matchedDocuments, const ResultFilter &filter);
548+ Matcher& operator=(const Matcher &m) = delete;
549+
550+ // The simple API
551+ MatchResults match(const char *queryAsUtf8);
552+ MatchResults match(const WordList &query);
553+ MatchResults match(const std::string &queryAsUtf8);
554+
555+ // When you want to specify search parameters exactly.
556+ MatchResults match(const char *queryAsUtf8, const SearchParameters &params);
557+ MatchResults match(const WordList &query, const SearchParameters &params);
558 void index(const Corpus &c);
559 ErrorValues& getErrorValues();
560 IndexWeights& getIndexWeights();
561
562=== modified file 'include/MatcherStatistics.hh'
563--- include/MatcherStatistics.hh 2012-12-07 11:01:33 +0000
564+++ include/MatcherStatistics.hh 2013-08-13 21:28:18 +0000
565@@ -27,7 +27,7 @@
566 struct MatcherStatisticsPrivate;
567 class Word;
568
569-class MatcherStatistics {
570+class MatcherStatistics final {
571 private:
572
573 MatcherStatisticsPrivate *p;
574
575=== modified file 'include/ResultFilter.hh'
576--- include/ResultFilter.hh 2012-12-07 11:01:33 +0000
577+++ include/ResultFilter.hh 2013-08-13 21:28:18 +0000
578@@ -27,7 +27,7 @@
579 struct ResultFilterPrivate;
580 class Word;
581
582-class COL_PUBLIC ResultFilter {
583+class COL_PUBLIC ResultFilter final {
584 private:
585
586 ResultFilterPrivate *p;
587@@ -35,6 +35,8 @@
588 public:
589 ResultFilter();
590 ~ResultFilter();
591+ ResultFilter(const ResultFilter &rf) = delete;
592+ const ResultFilter & operator=(const ResultFilter &other) = delete;
593
594 void addNewTerm();
595 void addNewSubTerm(const Word &field, const Word &word);
596
597=== added file 'include/SearchParameters.hh'
598--- include/SearchParameters.hh 1970-01-01 00:00:00 +0000
599+++ include/SearchParameters.hh 2013-08-13 21:28:18 +0000
600@@ -0,0 +1,53 @@
601+/*
602+ * Copyright (C) 2013 Canonical, Ltd.
603+ *
604+ * Authors:
605+ * Jussi Pakkanen <jussi.pakkanen@canonical.com>
606+ *
607+ * This library is free software; you can redistribute it and/or modify it under
608+ * the terms of version 3 of the GNU Lesser General Public License as published
609+ * by the Free Software Foundation.
610+ *
611+ * This library is distributed in the hope that it will be useful, but WITHOUT
612+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
613+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
614+ * details.
615+ *
616+ * You should have received a copy of the GNU Lesser General Public License
617+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
618+ */
619+
620+#ifndef SEARCHPARAMETERS_H_
621+#define SEARCHPARAMETERS_H_
622+
623+#include "ColumbusCore.hh"
624+
625+COL_NAMESPACE_START
626+
627+struct SearchParametersPrivate;
628+class Word;
629+class ResultFilter;
630+
631+class COL_PUBLIC SearchParameters final {
632+private:
633+ SearchParametersPrivate *p;
634+
635+public:
636+ SearchParameters();
637+ ~SearchParameters();
638+ SearchParameters & operator=(const SearchParameters &other) = delete;
639+
640+ bool isDynamic() const;
641+ void setDynamic(bool dyn);
642+ int getDynamicError(const Word &w) const;
643+ ResultFilter& getResultFilter();
644+ const ResultFilter& getResultFilter() const;
645+
646+ void addNonsearchingField(const Word &w);
647+ bool isNonsearchingField(const Word &w) const;
648+
649+ int looseningIterations() const;
650+};
651+
652+COL_NAMESPACE_END
653+#endif
654
655=== modified file 'include/Trie.hh'
656--- include/Trie.hh 2013-01-31 10:23:45 +0000
657+++ include/Trie.hh 2013-08-13 21:28:18 +0000
658@@ -27,7 +27,7 @@
659 struct TriePrivate;
660 class Word;
661
662-class COL_PUBLIC Trie {
663+class COL_PUBLIC Trie final {
664 private:
665 TriePrivate *p;
666 void expand();
667@@ -38,6 +38,9 @@
668 public:
669 Trie();
670 ~Trie();
671+ Trie(const Trie &other) = delete;
672+ const Trie & operator=(const Trie &other) = delete;
673+
674
675 bool hasWord(const Word &word) const;
676 TrieOffset findWord(const Word &word) const;
677
678=== modified file 'include/Word.hh'
679--- include/Word.hh 2013-01-31 09:26:44 +0000
680+++ include/Word.hh 2013-08-13 21:28:18 +0000
681@@ -31,7 +31,7 @@
682 *
683 * A word's contents are immutable.
684 */
685-class COL_PUBLIC Word {
686+class COL_PUBLIC Word final {
687 private:
688
689 Letter *text; // Change this to a shared pointer to save memory.
690
691=== modified file 'include/WordList.hh'
692--- include/WordList.hh 2012-12-07 11:01:33 +0000
693+++ include/WordList.hh 2013-08-13 21:28:18 +0000
694@@ -27,18 +27,20 @@
695 struct WordListPrivate;
696 class Word;
697
698-class COL_PUBLIC WordList {
699+class COL_PUBLIC WordList final {
700 private:
701 WordListPrivate *p;
702
703 public:
704 WordList();
705 WordList(const WordList &wl);
706+ WordList(WordList &&wl);
707 ~WordList();
708
709 size_t size() const;
710 const Word& operator[](const size_t i) const;
711 const WordList& operator=(const WordList &l);
712+ const WordList& operator=(WordList &&wl);
713 bool operator==(const WordList &l) const;
714 bool operator!=(const WordList &l) const;
715 void addWord(const Word &w); // This is more of an implementation detail and should not be exposed in a base class or interface.
716
717=== modified file 'include/WordStore.hh'
718--- include/WordStore.hh 2013-01-31 10:23:45 +0000
719+++ include/WordStore.hh 2013-08-13 21:28:18 +0000
720@@ -36,7 +36,7 @@
721 struct WordStorePrivate;
722 class Word;
723
724-class COL_PUBLIC WordStore {
725+class COL_PUBLIC WordStore final {
726 private:
727
728 WordStorePrivate *p;
729@@ -44,6 +44,9 @@
730 public:
731 WordStore();
732 ~WordStore();
733+ WordStore(const WordStore &other) = delete;
734+ const WordStore & operator=(const WordStore &other) = delete;
735+
736
737 WordID getID(const Word &w);
738 Word getWord(const WordID id) const;
739
740=== modified file 'include/columbus.h'
741--- include/columbus.h 2013-01-08 12:55:36 +0000
742+++ include/columbus.h 2013-08-13 21:28:18 +0000
743@@ -56,7 +56,7 @@
744 COL_PUBLIC ColMatcher col_matcher_new();
745 COL_PUBLIC void col_matcher_delete(ColMatcher m);
746 COL_PUBLIC void col_matcher_index(ColMatcher m, ColCorpus c);
747-COL_PUBLIC void col_matcher_match(ColMatcher m, const char *query_as_utf8, ColMatchResults mr);
748+COL_PUBLIC ColMatchResults col_matcher_match(ColMatcher m, const char *query_as_utf8);
749 COL_PUBLIC ColErrorValues col_matcher_get_error_values(ColMatcher m);
750 COL_PUBLIC ColIndexWeights col_matcher_get_index_weights(ColMatcher m);
751
752
753=== modified file 'python/CMakeLists.txt'
754--- python/CMakeLists.txt 2013-01-24 09:25:50 +0000
755+++ python/CMakeLists.txt 2013-08-13 21:28:18 +0000
756@@ -2,16 +2,15 @@
757 include_directories(${PYTHONLIBS_INCLUDE_DIRS})
758
759 if(use_python2)
760- set(python_lib_name "_columbus")
761+ set(python_lib_name "columbus")
762 else()
763- set(python_lib_name "_columbus.${pysoabi}")
764+ set(python_lib_name "columbus.${pysoabi}")
765 endif()
766
767-add_library(_columbus_ext SHARED _columbus.cc)
768-target_link_libraries(_columbus_ext ${COL_LIB_BASENAME} ${BOOST_PYTHON_HACK} ${PYTHONLIBS_LIBRARIES})
769-set_target_properties(_columbus_ext PROPERTIES OUTPUT_NAME ${python_lib_name} PREFIX "")
770-
771-add_pch(pch/colpython_pch.hh _columbus_ext)
772-
773-install(TARGETS _columbus_ext DESTINATION ${PYTHONDIR})
774-install(FILES columbus.py DESTINATION ${PYTHONDIR})
775+add_library(columbus_ext SHARED columbus.cc)
776+target_link_libraries(columbus_ext ${COL_LIB_BASENAME} ${BOOST_PYTHON_HACK} ${PYTHONLIBS_LIBRARIES})
777+set_target_properties(columbus_ext PROPERTIES OUTPUT_NAME ${python_lib_name} PREFIX "")
778+
779+add_pch(pch/colpython_pch.hh columbus_ext)
780+
781+install(TARGETS columbus_ext DESTINATION ${PYTHONDIR})
782
783=== renamed file 'python/_columbus.cc' => 'python/columbus.cc'
784--- python/_columbus.cc 2013-01-23 13:50:20 +0000
785+++ python/columbus.cc 2013-08-13 21:28:18 +0000
786@@ -24,11 +24,10 @@
787 using namespace Columbus;
788
789
790-void (Document::*addAdaptor) (const Word &, const WordList &) = &Document::addText;
791-void (Matcher::*queryAdaptor) (const WordList &, MatchResults &) = &Matcher::match;
792+void (Document::*addAdaptor) (const Word &, const std::string &) = &Document::addText;
793+MatchResults (Matcher::*queryAdaptor) (const std::string &) = &Matcher::match;
794
795-BOOST_PYTHON_MODULE(_columbus)
796-{
797+BOOST_PYTHON_MODULE(columbus) {
798 class_<Corpus, boost::noncopyable>("Corpus", init<>())
799 .def("size", &Corpus::size)
800 .def("add_document", &Corpus::addDocument)
801@@ -48,7 +47,7 @@
802 .def("add_word", &WordList::addWord)
803 ;
804
805- def("_split_to_words", splitToWords);
806+ def("split_to_words", splitToWords);
807
808 class_<Document>("Document", init<DocumentID>())
809 .def(init<const Document&>())
810@@ -75,13 +74,16 @@
811 return_internal_reference<>())
812 ;
813
814- class_<ErrorValues>("ErrorValues")
815+ class_<ErrorValues>("ErrorValues", init<>())
816 .def("add_standard_errors", &ErrorValues::addStandardErrors)
817 .def("set_substring_mode", &ErrorValues::setSubstringMode)
818+ .def("set_end_deletion_error", &ErrorValues::setEndDeletionError)
819 .def("set_error", &ErrorValues::setError)
820 .def("get_substitute_error", &ErrorValues::getSubstituteError)
821 .def("get_default_error", &ErrorValues::getDefaultError)
822 .staticmethod("get_default_error")
823+ .def("get_substring_default_end_deletion_error", &ErrorValues::getSubstringDefaultEndDeletionError)
824+ .staticmethod("get_substring_default_end_deletion_error")
825 .def("clear_errors", &ErrorValues::clearErrors)
826 ;
827
828
829=== removed file 'python/columbus.py'
830--- python/columbus.py 2012-12-11 14:45:07 +0000
831+++ python/columbus.py 1970-01-01 00:00:00 +0000
832@@ -1,28 +0,0 @@
833-#!/usr/bin/python3 -tt
834-# -*- coding: utf-8 -*-
835-
836-# Copyright (C) 2012 Canonical, Ltd.
837-
838-# Authors:
839-# Jussi Pakkanen <jussi.pakkanen@canonical.com>
840-
841-# This library is free software; you can redistribute it and/or modify it under
842-# the terms of version 3 of the GNU Lesser General Public License as published
843-# by the Free Software Foundation.
844-
845-# This library is distributed in the hope that it will be useful, but WITHOUT
846-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
847-# FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
848-# details.
849-
850-# You should have received a copy of the GNU Lesser General Public License
851-# along with this program. If not, see <http://www.gnu.org/licenses/>.
852-
853-from _columbus import Corpus, Word, WordList, _split_to_words, Document, \
854-MatchResults, Matcher, ErrorValues, IndexWeights
855-
856-def split_to_words(text):
857- list = WordList()
858- _split_to_words(text, list)
859- return list
860-
861
862=== modified file 'share/CMakeLists.txt'
863--- share/CMakeLists.txt 2012-11-26 10:25:57 +0000
864+++ share/CMakeLists.txt 2013-08-13 21:28:18 +0000
865@@ -1,3 +1,4 @@
866 install(FILES
867 latinAccentedLetterGroups.txt
868-DESTINATION share/${COL_LIB_BASENAME})
869+greekAccentedLetterGroups.txt
870+DESTINATION share/${COL_LIB_BASENAME}${SO_VERSION_MAJOR})
871
872=== added file 'share/greekAccentedLetterGroups.txt'
873--- share/greekAccentedLetterGroups.txt 1970-01-01 00:00:00 +0000
874+++ share/greekAccentedLetterGroups.txt 2013-08-13 21:28:18 +0000
875@@ -0,0 +1,7 @@
876+αά
877+εέ
878+ηή
879+ιίϊΐ
880+οό
881+υύϋΰ
882+ωώ
883
884=== modified file 'src/CMakeLists.txt'
885--- src/CMakeLists.txt 2013-01-28 11:21:16 +0000
886+++ src/CMakeLists.txt 2013-08-13 21:28:18 +0000
887@@ -17,6 +17,7 @@
888 ErrorMatrix.cc
889 ResultFilter.cc
890 Trie.cc
891+SearchParameters.cc
892 )
893
894 if(ICONV_LIBRARIES)
895
896=== modified file 'src/ColumbusCAPI.cc'
897--- src/ColumbusCAPI.cc 2013-01-17 09:17:28 +0000
898+++ src/ColumbusCAPI.cc 2013-08-13 21:28:18 +0000
899@@ -133,13 +133,15 @@
900 }
901 }
902
903-void col_matcher_match(ColMatcher m, const char *query_as_utf8, ColMatchResults mr) {
904+ColMatchResults col_matcher_match(ColMatcher m, const char *query_as_utf8) {
905 try {
906 Matcher *matcher = reinterpret_cast<Matcher*>(m);
907- MatchResults *results = reinterpret_cast<MatchResults*>(mr);
908- matcher->match(query_as_utf8, *results);
909+ MatchResults *results =
910+ new MatchResults(matcher->match(query_as_utf8));
911+ return reinterpret_cast<ColMatchResults>(results);
912 } catch(exception &e) {
913 fprintf(stderr, "Exception when matching: %s\n", e.what());
914+ return nullptr;
915 }
916 }
917
918
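Review note on the C API change above: col_matcher_match now allocates the result set itself and returns it, or returns nullptr when the C++ side throws. Callers therefore appear to take ownership of the handle and need to check it for null before use. The sketch below illustrates that pattern in isolation. Every name in it (Results, ResultsHandle, runQuery, results_from_query, results_size, results_delete) is hypothetical and is not part of libcolumbus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the libcolumbus types.
struct Results {
    std::vector<int> ids;
};
typedef void *ResultsHandle;

// Stand-in for a C++ call that may throw.
static Results runQuery(const char *query) {
    if (query == nullptr)
        throw std::invalid_argument("null query");
    Results r;
    r.ids.push_back(static_cast<int>(std::string(query).size()));
    return r;
}

// Returns a heap-allocated result set owned by the caller,
// or nullptr if the query failed.
extern "C" ResultsHandle results_from_query(const char *query) {
    try {
        return new Results(runQuery(query));
    } catch (const std::exception &e) {
        std::fprintf(stderr, "Exception when matching: %s\n", e.what());
        return nullptr;
    }
}

extern "C" std::size_t results_size(ResultsHandle h) {
    return static_cast<Results *>(h)->ids.size();
}

// The caller releases the handle when done; deleting nullptr is a no-op.
extern "C" void results_delete(ResultsHandle h) {
    delete static_cast<Results *>(h);
}
```

A caller would check the returned handle against nullptr, read from it, and then release it exactly once. Code written against the old three-argument form, which allocated the results up front, would need to drop that allocation to avoid leaking it.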
919=== modified file 'src/ColumbusHelpers.cc'
920--- src/ColumbusHelpers.cc 2013-01-21 14:22:59 +0000
921+++ src/ColumbusHelpers.cc 2013-08-13 21:28:18 +0000
922@@ -125,8 +125,8 @@
923
924 }
925
926-void splitToWords(const char *utf8Text, WordList &list) {
927- split(utf8Text, list, whitespaceLetters, numWhitespaceLetters);
928+WordList splitToWords(const char *utf8Text) {
929+ return split(utf8Text, whitespaceLetters, numWhitespaceLetters);
930 }
931
932 static bool isInList(const Letter l, const Letter *chars, int numChars) {
933@@ -136,7 +136,8 @@
934 return false;
935 }
936
937-void split(const char *utf8Text, WordList &list, const Letter *splitChars, int numChars) {
938+WordList split(const char *utf8Text, const Letter *splitChars, int numChars) {
939+ WordList list;
940 unsigned int strSize = strlen(utf8Text);
941 size_t begin, end;
942 end = 0;
943@@ -150,7 +151,7 @@
944 }
945 if(begin >= strSize) {
946 delete []word;
947- return;
948+ return list;
949 }
950 end = begin+1;
951 while(!isInList(utf8Text[end], splitChars, numChars) && end < strSize) {
952@@ -174,6 +175,7 @@
953 }
954 } while(end < strSize);
955 delete []word;
956+ return list;
957 }
958
959 bool isWhitespace(Letter l) {
960
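Review note on the helper change above: split and splitToWords now build the list locally and return it by value instead of filling an out-parameter, which relies on the move support added to WordList elsewhere in this branch. A minimal sketch of the same shape follows, using std::string and std::vector rather than the library's Letter and WordList types. The names splitWords and isSplitChar are hypothetical, and the sketch only handles ASCII whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: treats only ASCII whitespace as a separator.
static bool isSplitChar(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Builds the word list locally and returns it by value; the compiler
// can elide the copy or move the vector out.
std::vector<std::string> splitWords(const std::string &text) {
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip any run of separators.
        while (i < text.size() && isSplitChar(text[i]))
            ++i;
        std::size_t begin = i;
        // Consume one word.
        while (i < text.size() && !isSplitChar(text[i]))
            ++i;
        if (i > begin)
            words.push_back(text.substr(begin, i - begin));
    }
    return words;
}
```

With this shape a call site can write `auto words = splitWords(line);` in one statement, which is the simplification the test and tool changes later in this diff take advantage of.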
961=== modified file 'src/Document.cc'
962--- src/Document.cc 2012-12-07 11:01:33 +0000
963+++ src/Document.cc 2013-08-13 21:28:18 +0000
964@@ -55,9 +55,11 @@
965 }
966
967 void Document::addText(const Word &field, const char *textAsUtf8) {
968- WordList l;
969- splitToWords(textAsUtf8, l);
970- addText(field, l);
971+ addText(field, splitToWords(textAsUtf8));
972+}
973+
974+void Document::addText(const Word &field, const std::string &textAsUtf8) {
975+ addText(field, textAsUtf8.c_str());
976 }
977
978 const WordList& Document::getText(const Word &field) const {
979
980=== modified file 'src/ErrorValues.cc'
981--- src/ErrorValues.cc 2013-01-11 15:18:30 +0000
982+++ src/ErrorValues.cc 2013-08-13 21:28:18 +0000
983@@ -35,6 +35,9 @@
984 COL_NAMESPACE_START
985 using namespace std;
986
987+static const char *accentGroupDataFile[] = {"latinAccentedLetterGroups.txt",
988+ "greekAccentedLetterGroups.txt"};
989+
990 const int LUT_BITS = 9;
991 const int LUT_LETTERS = 1 << LUT_BITS;
992 const int LUT_SIZE = (LUT_LETTERS*LUT_LETTERS);
993@@ -161,8 +164,8 @@
994 return p->groupMap.find(l) != p->groupMap.end();
995 }
996
997-void ErrorValues::addLatinAccents() {
998- const char *baseName = "latinAccentedLetterGroups.txt";
999+void ErrorValues::addAccents(accentGroups group) {
1000+ const char *baseName = accentGroupDataFile[group];
1001 string dataFile = findDataFile(baseName);
1002 string line;
1003 if(dataFile.length() == 0) {
1004@@ -257,6 +260,11 @@
1005 }
1006 }
1007
1008+void ErrorValues::addStandardErrors() {
1009+ addAccents(latinAccentGroup);
1010+ addAccents(greekAccentGroup);
1011+ addKeyboardErrors();
1012+}
1013
1014 void ErrorValues::addToLUT(Letter l1, Letter l2, int value) {
1015 if(l1 < LUT_LETTERS && l2 < LUT_LETTERS) {
1016
1017=== modified file 'src/MatchResults.cc'
1018--- src/MatchResults.cc 2012-12-07 11:01:33 +0000
1019+++ src/MatchResults.cc 2013-08-13 21:28:18 +0000
1020@@ -36,10 +36,36 @@
1021 p->sorted = true;;
1022 }
1023
1024+MatchResults::MatchResults(const MatchResults &other) {
1025+ p = new MatchResultsPrivate();
1026+ *p = *other.p;
1027+}
1028+
1029+MatchResults::MatchResults(MatchResults &&other) {
1030+ p = other.p;
1031+ other.p = nullptr;
1032+}
1033+
1034 MatchResults::~MatchResults() {
1035 delete p;
1036 }
1037
1038+const MatchResults& MatchResults::operator=(MatchResults &&other) {
1039+ if(this != &other) {
1040+ delete p;
1041+ p = other.p;
1042+ other.p = nullptr;
1043+ }
1044+ return *this;
1045+}
1046+
1047+const MatchResults& MatchResults::operator=(const MatchResults &other) {
1048+ if(this != &other) {
1049+ *p = *other.p;
1050+ }
1051+ return *this;
1052+}
1053+
1054 void MatchResults::addResult(DocumentID id, double relevancy) {
1055 pair<double, DocumentID> n;
1056 n.first = relevancy;
1057
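Review note on the MatchResults changes above: the move constructor and move assignment leave the source object with p set to nullptr, while the copy constructor and copy assignment dereference other.p and p unconditionally. If I read this right, copying from or assigning into a moved-from object would dereference a null pointer. One way to make every combination safe is copy-and-swap, sketched below on a hypothetical Holder class. This is an alternative technique for illustration, not what the branch implements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pimpl-style class; not the libcolumbus MatchResults.
class Holder {
private:
    struct Impl {
        std::vector<int> values;
    };
    Impl *p;

public:
    Holder() : p(new Impl()) {}
    // Copying a moved-from object yields another empty, null-backed object.
    Holder(const Holder &other) : p(other.p ? new Impl(*other.p) : nullptr) {}
    Holder(Holder &&other) noexcept : p(other.p) { other.p = nullptr; }
    ~Holder() { delete p; }

    // Copy-and-swap: the argument is taken by value, so this one operator
    // serves both copy and move assignment and never dereferences p.
    Holder &operator=(Holder other) noexcept {
        std::swap(p, other.p);
        return *this;
    }

    void add(int v) {
        if (!p)
            p = new Impl(); // revive a moved-from object on first use
        p->values.push_back(v);
    }

    std::size_t size() const { return p ? p->values.size() : 0; }
};
```

The cost is one extra allocation on copy assignment compared with assigning into the existing private struct, which is usually acceptable for a result container.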
1058=== modified file 'src/Matcher.cc'
1059--- src/Matcher.cc 2013-04-16 09:02:17 +0000
1060+++ src/Matcher.cc 2013-08-13 21:28:18 +0000
1061@@ -31,6 +31,7 @@
1062 #include "MatcherStatistics.hh"
1063 #include "WordStore.hh"
1064 #include "ResultFilter.hh"
1065+#include "SearchParameters.hh"
1066 #include <cassert>
1067 #include <stdexcept>
1068 #include <map>
1069@@ -138,20 +139,6 @@
1070 }
1071
1072 /*
1073- * Long words should allow for more error than short ones.
1074- * This is a simple function which is meant to be strict
1075- * so there won't be too many matches.
1076- */
1077-
1078-static int getDynamicError(const Word &w) {
1079- size_t len = w.length();
1080- if(len < 2)
1081- return LevenshteinIndex::getDefaultError();
1082- else
1083- return 2*LevenshteinIndex::getDefaultError();
1084-}
1085-
1086-/*
1087 * These are helper functions for Matcher. They are not member functions to avoid polluting the header
1088 * with STL includes.
1089 */
1090@@ -197,17 +184,20 @@
1091 }
1092
1093
1094-static void matchIndexes(MatcherPrivate *p, const WordList &query, const bool dynamicError, const int extraError, BestIndexMatches &bestIndexMatches) {
1095+static void matchIndexes(MatcherPrivate *p, const WordList &query, const SearchParameters &params, const int extraError, BestIndexMatches &bestIndexMatches) {
1096 for(size_t i=0; i<query.size(); i++) {
1097 const Word &w = query[i];
1098 int maxError;
1099- if(dynamicError)
1100- maxError = getDynamicError(w);
1101+ if(params.isDynamic())
1102+ maxError = params.getDynamicError(w);
1103 else
1104 maxError = 2*LevenshteinIndex::getDefaultError();
1105 maxError += extraError;
1106
1107 for(IndIterator it = p->indexes.begin(); it != p->indexes.end(); it++) {
1108+ if(params.isNonsearchingField(p->store.getWord(it->first))) {
1109+ continue;
1110+ }
1111 IndexMatches m;
1112 it->second->findWords(w, p->e, maxError, m);
1113 addMatches(p, bestIndexMatches, w, it->first, m);
1114@@ -249,6 +239,19 @@
1115 }
1116 }
1117
1118+static bool subtermsMatch(MatcherPrivate *p, const ResultFilter &filter, size_t term, DocumentID id) {
1119+ for(size_t subTerm=0; subTerm < filter.numSubTerms(term); subTerm++) {
1120+ const Word &filterName = filter.getField(term, subTerm);
1121+ const Word &value = filter.getWord(term, subTerm);
1122+ bool termFound = p->reverseIndex.documentHasTerm(
1123+ p->store.getID(value), p->store.getID(filterName), id);
1124+ if(!termFound) {
1125+ return false;
1126+ }
1127+ }
1128+ return true;
1129+}
1130+
1131 Matcher::Matcher() {
1132 p = new MatcherPrivate();
1133 }
1134@@ -308,13 +311,13 @@
1135 }
1136
1137
1138-void Matcher::matchWithRelevancy(const WordList &query, const bool dynamicError, const int extraError, MatchResults &matchedDocuments) {
1139+void Matcher::matchWithRelevancy(const WordList &query, const SearchParameters &params, const int extraError, MatchResults &matchedDocuments) {
1140 map<DocumentID, double> docs;
1141 BestIndexMatches bestIndexMatches;
1142 double start, indexMatchEnd, gatherEnd, finish;
1143
1144 start = hiresTimestamp();
1145- matchIndexes(p, query, dynamicError, extraError, bestIndexMatches);
1146+ matchIndexes(p, query, params, extraError, bestIndexMatches);
1147 indexMatchEnd = hiresTimestamp();
1148 // Now we know all matched words in all indexes. Gather up the corresponding documents.
1149 gatherMatchedDocuments(p, bestIndexMatches, docs);
1150@@ -328,54 +331,29 @@
1151 indexMatchEnd - start, gatherEnd - indexMatchEnd, finish - gatherEnd);
1152 }
1153
1154-void Matcher::match(const WordList &query, MatchResults &matchedDocuments) {
1155+MatchResults Matcher::match(const WordList &query, const SearchParameters &params) {
1156+ MatchResults matchedDocuments;
1157 const int maxIterations = 1;
1158 const int increment = LevenshteinIndex::getDefaultError();
1159 const size_t minMatches = 10;
1160 WordList expandedQuery;
1161+ MatchResults allMatches;
1162
1163 if(query.size() == 0)
1164- return;
1165+ return matchedDocuments;
1166 expandQuery(query, expandedQuery);
1167 // Try to search with ever growing error until we find enough matches.
1168 for(int i=0; i<maxIterations; i++) {
1169 MatchResults matches;
1170- matchWithRelevancy(expandedQuery, true, i*increment, matches);
1171+ matchWithRelevancy(expandedQuery, params, i*increment, matches);
1172 if(matches.size() >= minMatches || i == maxIterations-1) {
1173- matchedDocuments.addResults(matches);
1174- return;
1175- }
1176- }
1177-
1178-}
1179-
1180-void Matcher::match(const char *queryAsUtf8, MatchResults &matchedDocuments) {
1181- WordList l;
1182- splitToWords(queryAsUtf8, l);
1183- match(l, matchedDocuments);
1184-}
1185-
1186-ErrorValues& Matcher::getErrorValues() {
1187- return p->e;
1188-}
1189-
1190-static bool subtermsMatch(MatcherPrivate *p, const ResultFilter &filter, size_t term, DocumentID id) {
1191- for(size_t subTerm=0; subTerm < filter.numSubTerms(term); subTerm++) {
1192- const Word &filterName = filter.getField(term, subTerm);
1193- const Word &value = filter.getWord(term, subTerm);
1194- bool termFound = p->reverseIndex.documentHasTerm(
1195- p->store.getID(value), p->store.getID(filterName), id);
1196- if(!termFound) {
1197- return false;
1198- }
1199- }
1200- return true;
1201-
1202-}
1203-
1204-void Matcher::match(const char *queryAsUtf8, MatchResults &matchedDocuments, const ResultFilter &filter) {
1205- MatchResults allMatches;
1206- match(queryAsUtf8, allMatches);
1207+ allMatches.addResults(matches);
1208+ break;
1209+ }
1210+ }
1211+
1212+ /* Filter results into final set. */
1213+ auto &filter = params.getResultFilter();
1214 for(size_t i=0; i<allMatches.size(); i++) {
1215 DocumentID id = allMatches.getDocumentID(i);
1216 for(size_t term=0; term < filter.numTerms(); term++) {
1217@@ -385,6 +363,29 @@
1218 }
1219 }
1220 }
1221+ return matchedDocuments;
1222+}
1223+
1224+MatchResults Matcher::match(const char *queryAsUtf8) {
1225+ return match(splitToWords(queryAsUtf8));
1226+}
1227+
1228+MatchResults Matcher::match(const std::string &queryAsUtf8) {
1229+ return match(queryAsUtf8.c_str());
1230+}
1231+
1232+
1233+MatchResults Matcher::match(const WordList &query) {
1234+ SearchParameters defaults;
1235+ return match(query, defaults);
1236+}
1237+
1238+ErrorValues& Matcher::getErrorValues() {
1239+ return p->e;
1240+}
1241+
1242+MatchResults Matcher::match(const char *queryAsUtf8, const SearchParameters &params) {
1243+ return match(splitToWords(queryAsUtf8), params);
1244 }
1245
1246 IndexWeights& Matcher::getIndexWeights() {
1247
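Review note on the filtering that now lives inside Matcher::match: judging from subtermsMatch and from ResultFilterTest.cc, a document passes the filter when at least one term matches, and a term matches only when all of its subterms hold. The sketch below restates that rule on simplified types. Term, SubTerm, Doc, termMatches and passesFilter are hypothetical names, and the assumption that a fresh filter starts with one empty term (so that it accepts everything) is inferred from the test, not confirmed by the hunk.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types; the real ResultFilter and Matcher differ.
typedef std::pair<std::string, std::string> SubTerm; // (field, required value)
typedef std::vector<SubTerm> Term;                   // every subterm must hold
typedef std::map<std::string, std::string> Doc;      // field -> value

// A term matches when the document satisfies all of its subterms.
// A term with no subterms matches every document.
static bool termMatches(const Term &term, const Doc &doc) {
    for (const SubTerm &s : term) {
        Doc::const_iterator it = doc.find(s.first);
        if (it == doc.end() || it->second != s.second)
            return false;
    }
    return true;
}

// A document passes when any term matches.
bool passesFilter(const std::vector<Term> &filter, const Doc &doc) {
    for (const Term &t : filter) {
        if (termMatches(t, doc))
            return true;
    }
    return false;
}
```

Under these assumptions, adding subterms to one term narrows the result set, while adding a new term widens it.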
1248=== added file 'src/SearchParameters.cc'
1249--- src/SearchParameters.cc 1970-01-01 00:00:00 +0000
1250+++ src/SearchParameters.cc 2013-08-13 21:28:18 +0000
1251@@ -0,0 +1,86 @@
1252+/*
1253+ * Copyright (C) 2013 Canonical, Ltd.
1254+ *
1255+ * Authors:
1256+ * Jussi Pakkanen <jussi.pakkanen@canonical.com>
1257+ *
1258+ * This library is free software; you can redistribute it and/or modify it under
1259+ * the terms of version 3 of the GNU Lesser General Public License as published
1260+ * by the Free Software Foundation.
1261+ *
1262+ * This library is distributed in the hope that it will be useful, but WITHOUT
1263+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
1264+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
1265+ * details.
1266+ *
1267+ * You should have received a copy of the GNU Lesser General Public License
1268+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1269+ */
1270+
1271+#include"SearchParameters.hh"
1272+#include"Word.hh"
1273+#include"LevenshteinIndex.hh"
1274+#include"ResultFilter.hh"
1275+#include<set>
1276+
1277+COL_NAMESPACE_START
1278+
1279+using namespace std;
1280+struct SearchParametersPrivate {
1281+ bool dynamic;
1282+ ResultFilter filter;
1283+ set<Word> nosearchFields;
1284+};
1285+
1286+SearchParameters::SearchParameters() {
1287+ p = new SearchParametersPrivate();
1288+ p->dynamic = true;
1289+}
1290+
1291+SearchParameters::~SearchParameters() {
1292+ delete p;
1293+}
1294+
1295+bool SearchParameters::isDynamic() const {
1296+ return p->dynamic;
1297+}
1298+void SearchParameters::setDynamic(bool dyn) {
1299+ p->dynamic = dyn;
1300+}
1301+
1302+/*
1303+ * Long words should allow for more error than short ones.
1304+ * This is a simple function which is meant to be strict
1305+ * so there won't be too many matches.
1306+ */
1307+
1308+int SearchParameters::getDynamicError(const Word &w) const {
1309+ size_t len = w.length();
1310+ if(len < 2)
1311+ return LevenshteinIndex::getDefaultError();
1312+ else
1313+ return 2*LevenshteinIndex::getDefaultError();
1314+}
1315+
1316+ResultFilter& SearchParameters::getResultFilter() {
1317+ return p->filter;
1318+}
1319+
1320+const ResultFilter& SearchParameters::getResultFilter() const {
1321+ return p->filter;
1322+}
1323+
1324+void SearchParameters::addNonsearchingField(const Word &w) {
1325+ p->nosearchFields.insert(w);
1326+}
1327+
1328+bool SearchParameters::isNonsearchingField(const Word &w) const {
1329+ return p->nosearchFields.find(w) != p->nosearchFields.end();
1330+}
1331+
1332+int SearchParameters::looseningIterations() const {
1333+ return 1;
1334+}
1335+
1336+COL_NAMESPACE_END
1337+
1338
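Review note on SearchParameters: the new class bundles the dynamic-error flag, the result filter and a set of fields that must not be searched, and matchIndexes consults that set to skip whole indexes. A small sketch of the skip logic follows, using plain standard containers. Params, FieldIndex and fieldsContaining are hypothetical names, and the exact-match lookup stands in for the library's fuzzy Levenshtein search.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parameter object; not the libcolumbus SearchParameters.
struct Params {
    bool dynamic = true;
    std::set<std::string> nosearch;

    bool isNonsearching(const std::string &field) const {
        return nosearch.count(field) != 0;
    }
};

// field name -> words indexed under that field
typedef std::map<std::string, std::vector<std::string>> FieldIndex;

// Returns the names of the searchable fields that contain the word.
std::vector<std::string> fieldsContaining(const FieldIndex &indexes,
                                          const std::string &word,
                                          const Params &params) {
    std::vector<std::string> hits;
    for (const auto &entry : indexes) {
        if (params.isNonsearching(entry.first))
            continue; // excluded fields are never searched
        for (const auto &w : entry.second) {
            if (w == word) {
                hits.push_back(entry.first);
                break;
            }
        }
    }
    return hits;
}
```

Passing one parameter object keeps the match overloads stable when further options are added, which is presumably why the boolean argument was replaced.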
1339=== modified file 'src/WordList.cc'
1340--- src/WordList.cc 2012-12-07 11:01:33 +0000
1341+++ src/WordList.cc 2013-08-13 21:28:18 +0000
1342@@ -39,6 +39,10 @@
1343 p->words = wl.p->words;
1344 }
1345
1346+WordList::WordList(WordList &&wl) {
1347+ p = wl.p;
1348+ wl.p = nullptr;
1349+}
1350
1351 WordList::~WordList() {
1352 delete p;
1353@@ -64,6 +68,15 @@
1354 return *this;
1355 }
1356
1357+const WordList& WordList::operator=(WordList &&wl) {
1358+ if(this != &wl) {
1359+ delete p;
1360+ p = wl.p;
1361+ wl.p = nullptr;
1362+ }
1363+ return *this;
1364+}
1365+
1366 bool WordList::operator==(const WordList &l) const {
1367 return p->words == l.p->words;
1368 }
1369
1370=== modified file 'test/CAPITest.c'
1371--- test/CAPITest.c 2013-04-03 13:50:54 +0000
1372+++ test/CAPITest.c 2013-08-13 21:28:18 +0000
1373@@ -98,7 +98,7 @@
1374 void testMatching() {
1375 ColCorpus c = buildCorpus();
1376 ColMatcher m = col_matcher_new();
1377- ColMatchResults matches = col_match_results_new();
1378+ ColMatchResults matches;
1379 DocumentID dFarName = 1000;
1380 DocumentID name1 = 0;
1381 DocumentID name2 = 10;
1382@@ -106,7 +106,7 @@
1383 col_matcher_index(m, c);
1384 col_corpus_delete(c);
1385
1386- col_matcher_match(m, "abe", matches);
1387+ matches = col_matcher_match(m, "abe");
1388 assert(col_match_results_size(matches) == 2);
1389 assert(col_match_results_get_id(matches, 0) != dFarName);
1390 assert(col_match_results_get_id(matches, 1) != dFarName);
1391
1392=== modified file 'test/CMakeLists.txt'
1393--- test/CMakeLists.txt 2013-01-31 10:01:17 +0000
1394+++ test/CMakeLists.txt 2013-08-13 21:28:18 +0000
1395@@ -19,6 +19,7 @@
1396 coltest(indexweights IndexWeightsTest.cc)
1397 coltest(wordstore WordStoreTest.cc)
1398 coltest(filtering ResultFilterTest.cc)
1399+coltest(searchparameters SearchParametersTest.cc)
1400 coltest(capi CAPITest.c)
1401
1402 add_executable(lev_scalability LevScalabilityTest.cc)
1403
1404=== modified file 'test/ErrorValuesTest.cc'
1405--- test/ErrorValuesTest.cc 2013-04-03 13:50:54 +0000
1406+++ test/ErrorValuesTest.cc 2013-08-13 21:28:18 +0000
1407@@ -61,7 +61,7 @@
1408 assert(ev.getSubstituteError(a, aacute) == defaultError);
1409 assert(ev.getSubstituteError(e, aacute) == defaultError);
1410
1411- ev.addLatinAccents();
1412+ ev.addAccents(latinAccentGroup);
1413 assert(ev.isInGroup(e));
1414 assert(ev.isInGroup(eacute));
1415 assert(ev.isInGroup(ebreve));
1416@@ -69,7 +69,6 @@
1417 assert(ev.isInGroup(aacute));
1418 assert(ev.isInGroup(abreve));
1419
1420-
1421 assert(ev.getSubstituteError(e, eacute) == defaultGroupError);
1422 assert(ev.getSubstituteError(eacute, e) == defaultGroupError);
1423 assert(ev.getSubstituteError(eacute, ebreve) == defaultGroupError);
1424@@ -106,12 +105,31 @@
1425 assert(ev.getSubstituteError('j', '6') < ErrorValues::getDefaultError());
1426 }
1427
1428+void testBigError() {
1429+ ErrorValues ev;
1430+ Letter l1 = 1000; // Big values, so they are guaranteed to be outside of the LUT.
1431+ Letter l2 = 10000;
1432+ int smallError = 1;
1433+
1434+ assert(smallError < ErrorValues::getDefaultError());
1435+ assert(ev.getSubstituteError(l1, l2) == ErrorValues::getDefaultError());
1436+ assert(ev.getSubstituteError(l2, l1) == ErrorValues::getDefaultError());
1437+ assert(ev.getSubstituteError(l2, l2) == 0);
1438+
1439+ ev.setError(l1, l2, smallError);
1440+ assert(ev.getSubstituteError(l1, l2) == smallError);
1441+ assert(ev.getSubstituteError(l2, l1) == smallError);
1442+ assert(ev.getSubstituteError(l2, l2) == 0);
1443+
1444+}
1445+
1446 int main(int /*argc*/, char **/*argv*/) {
1447 try {
1448 testError();
1449 testGroupError();
1450 testKeyboardErrors();
1451 testNumberpadErrors();
1452+ testBigError();
1453 } catch(const std::exception &e) {
1454 fprintf(stderr, "Fail: %s\n", e.what());
1455 return 666;
1456
1457=== modified file 'test/HelpersTest.cc'
1458--- test/HelpersTest.cc 2013-04-03 13:50:54 +0000
1459+++ test/HelpersTest.cc 2013-08-13 21:28:18 +0000
1460@@ -25,8 +25,7 @@
1461 using namespace Columbus;
1462
1463 bool splitCorrectly(const char *txt, const WordList &l) {
1464- WordList result;
1465- splitToWords(txt, result);
1466+ WordList result = splitToWords(txt);
1467 return result == l;
1468 }
1469
1470@@ -57,8 +56,7 @@
1471 void testWeirdWord() {
1472 const unsigned char txt[] = {0x42, 0x6c, 0x75, 0x65, 0x73, 0x20, 0xe2, 0x80, 0x9a, 0xc3, 0x84, 0xc3, 0xb2, 0x6e, 0xe2, 0x80,
1473 0x9a, 0xc3, 0x84, 0xc3, 0xb4, 0x20, 0x54, 0x72, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x0d, 0x0a, 0};
1474- WordList l;
1475- splitToWords((const char*)txt, l);
1476+ WordList l = splitToWords((const char*)txt);
1477 assert(l.size() == 3);
1478 }
1479
1480
1481=== modified file 'test/MatchResultsTest.cc'
1482--- test/MatchResultsTest.cc 2013-04-03 13:50:54 +0000
1483+++ test/MatchResultsTest.cc 2013-08-13 21:28:18 +0000
1484@@ -46,9 +46,30 @@
1485 assert(r.getRelevancy(0) == r2);
1486 }
1487
1488+MatchResults gimme() {
1489+ MatchResults m;
1490+ m.addResult(1, 1);
1491+ m.addResult(2, 2);
1492+ return m;
1493+}
1494+
1495+/*
1496+ * For great Valgrind justice.
1497+ */
1498+void testAssignments() {
1499+ MatchResults m1, m2;
1500+ m1.addResult(3, 4);
1501+ m2 = m1;
1502+ MatchResults m3(m1);
1503+ MatchResults m4(m3);
1504+ MatchResults m5(gimme());
1505+ MatchResults m6 = gimme();
1506+}
1507+
1508 int main(int /*argc*/, char **/*argv*/) {
1509 try {
1510 testMatchResult();
1511+ testAssignments();
1512 } catch(const std::exception &e) {
1513 fprintf(stderr, "Fail: %s\n", e.what());
1514 return 666;
1515
1516=== modified file 'test/MatcherTest.cc'
1517--- test/MatcherTest.cc 2013-04-03 13:50:54 +0000
1518+++ test/MatcherTest.cc 2013-08-13 21:28:18 +0000
1519@@ -76,7 +76,7 @@
1520 delete(c);
1521
1522 queryList.addWord(w1);
1523- m.match(queryList, matches);
1524+ matches = m.match(queryList);
1525 assert(matches.size() == 2);
1526 assert(matches.getDocumentID(0) != dFarName);
1527 assert(matches.getDocumentID(1) != dFarName);
1528@@ -99,7 +99,7 @@
1529 delete c;
1530
1531 queryList.addWord(w1);
1532- m.match(queryList, matches);
1533+ matches = m.match(queryList);
1534 assert(matches.size() == 2);
1535 // Document doc1 has an exact match, so it should be the best match.
1536 assert(matches.getRelevancy(0) > matches.getRelevancy(1));
1537@@ -123,7 +123,7 @@
1538 c.addDocument(d2);
1539 m.index(c);
1540
1541- m.match("Sara Michell Geller", matches);
1542+ matches = m.match("Sara Michell Geller");
1543 assert(matches.getDocumentID(0) == correct);
1544 }
1545
1546
1547=== modified file 'test/ResultFilterTest.cc'
1548--- test/ResultFilterTest.cc 2013-04-03 13:50:54 +0000
1549+++ test/ResultFilterTest.cc 2013-08-13 21:28:18 +0000
1550@@ -17,6 +17,7 @@
1551 * along with this program. If not, see <http://www.gnu.org/licenses/>.
1552 */
1553
1554+#include "SearchParameters.hh"
1555 #include "ResultFilter.hh"
1556 #include "Word.hh"
1557 #include "Document.hh"
1558@@ -44,8 +45,8 @@
1559 Document d2(2);
1560 Corpus c;
1561 Matcher m;
1562- ResultFilter emptyFilter;
1563- ResultFilter onlyTakeFirst, onlyTakeSecond, orTest, andTest;
1564+ SearchParameters emptyFilter;
1565+ SearchParameters onlyTakeFirst, onlyTakeSecond, orTest, andTest;
1566
1567 d1.addText(textField, txt);
1568 d1.addText(filterField1, val1str);
1569@@ -57,33 +58,28 @@
1570 c.addDocument(d2);
1571
1572 m.index(c);
1573- MatchResults r1;
1574- m.match(txt, r1, emptyFilter);
1575+ MatchResults r1 = m.match(txt, emptyFilter);
1576 assert(r1.size() == 2);
1577
1578- onlyTakeFirst.addNewSubTerm(filterField1, val1);
1579- MatchResults r2;
1580- m.match(txt, r2, onlyTakeFirst);
1581+ onlyTakeFirst.getResultFilter().addNewSubTerm(filterField1, val1);
1582+ MatchResults r2 = m.match(txt, onlyTakeFirst);
1583 assert(r2.size() == 1);
1584 assert(r2.getDocumentID(0) == 1);
1585
1586- onlyTakeSecond.addNewSubTerm(filterField1, val2);
1587- MatchResults r3;
1588- m.match(txt, r3, onlyTakeSecond);
1589+ onlyTakeSecond.getResultFilter().addNewSubTerm(filterField1, val2);
1590+ MatchResults r3 = m.match(txt, onlyTakeSecond);
1591 assert(r3.size() == 1);
1592 assert(r3.getDocumentID(0) == 2);
1593
1594- orTest.addNewSubTerm(filterField1, val1);
1595- orTest.addNewTerm();
1596- orTest.addNewSubTerm(filterField1, val2);
1597- MatchResults orResults;
1598- m.match(txt, orResults, orTest);
1599+ orTest.getResultFilter().addNewSubTerm(filterField1, val1);
1600+ orTest.getResultFilter().addNewTerm();
1601+ orTest.getResultFilter().addNewSubTerm(filterField1, val2);
1602+ MatchResults orResults = m.match(txt, orTest);
1603 assert(orResults.size() == 2);
1604
1605- andTest.addNewSubTerm(filterField2, val2);
1606- andTest.addNewSubTerm(filterField1, val1);
1607- MatchResults andResults;
1608- m.match(txt, andResults, andTest);
1609+ andTest.getResultFilter().addNewSubTerm(filterField2, val2);
1610+ andTest.getResultFilter().addNewSubTerm(filterField1, val1);
1611+ MatchResults andResults = m.match(txt, andTest);
1612 assert(andResults.size() == 0);
1613 }
1614
1615
1616=== added file 'test/SearchParametersTest.cc'
1617--- test/SearchParametersTest.cc 1970-01-01 00:00:00 +0000
1618+++ test/SearchParametersTest.cc 2013-08-13 21:28:18 +0000
1619@@ -0,0 +1,86 @@
1620+/*
1621+ * Copyright (C) 2013 Canonical, Ltd.
1622+ *
1623+ * Authors:
1624+ * Jussi Pakkanen <jussi.pakkanen@canonical.com>
1625+ *
1626+ * This library is free software; you can redistribute it and/or modify it under
1627+ * the terms of version 3 of the GNU Lesser General Public License as published
1628+ * by the Free Software Foundation.
1629+ *
1630+ * This library is distributed in the hope that it will be useful, but WITHOUT
1631+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
1632+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
1633+ * details.
1634+ *
1635+ * You should have received a copy of the GNU Lesser General Public License
1636+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1637+ */
1638+
1639+#include"SearchParameters.hh"
1640+#include"Word.hh"
1641+#include"Matcher.hh"
1642+#include"Document.hh"
1643+#include"Corpus.hh"
1644+#include"MatchResults.hh"
1645+#include<cassert>
1646+
1647+using namespace Columbus;
1648+
1649+void testDynamic() {
1650+ SearchParameters sp;
1651+ assert(sp.isDynamic());
1652+
1653+ sp.setDynamic(false);
1654+ assert(!sp.isDynamic());
1655+
1656+ sp.setDynamic(true);
1657+ assert(sp.isDynamic());
1658+}
1659+
1660+void testNosearch() {
1661+ SearchParameters sp;
1662+ Word w1("abc");
1663+ Word w2("def");
1664+
1665+ assert(!sp.isNonsearchingField(w1));
1666+ assert(!sp.isNonsearchingField(w2));
1667+
1668+ sp.addNonsearchingField(w1);
1669+ assert(sp.isNonsearchingField(w1));
1670+ assert(!sp.isNonsearchingField(w2));
1671+
1672+ sp.addNonsearchingField(w2);
1673+ assert(sp.isNonsearchingField(w1));
1674+ assert(sp.isNonsearchingField(w2));
1675+}
1676+
1677+void testNosearchMatching() {
1678+ Word textField("text");
1679+ Word search("field1");
1680+ Word nonSearch("field2");
1681+ const char *val1str = "one";
1682+ Corpus c;
1683+ Matcher m;
1684+ SearchParameters sp;
1685+ MatchResults r;
1686+ Document d1(1);
1687+ Document d2(2);
1688+
1689+ sp.addNonsearchingField(nonSearch);
1690+ d1.addText(search, val1str);
1691+ d2.addText(nonSearch, val1str);
1692+ c.addDocument(d1);
1693+ c.addDocument(d2);
1694+ m.index(c);
1695+
1696+ r = m.match(val1str, sp);
1697+ assert(r.size() == 1);
1698+ assert(r.getDocumentID(0) == 1);
1699+}
1700+
1701+int main(int /*argc*/, char **/*argv*/) {
1702+ testDynamic();
1703+ testNosearch();
1704+ testNosearchMatching();
1705+}
1706
1707=== modified file 'test/pythontest.py'
1708--- test/pythontest.py 2013-01-23 13:50:20 +0000
1709+++ test/pythontest.py 2013-08-13 21:28:18 +0000
1710@@ -89,7 +89,7 @@
1711 def test_doc(self):
1712 docid = 435
1713 field = columbus.Word('fieldname')
1714- text = columbus.split_to_words('ye olde butcherede englishe')
1715+ text = 'ye olde butcherede englishe'
1716 d = columbus.Document(docid)
1717
1718 self.assertEqual(d.get_id(), docid, 'Document ID got mangled.')
1719@@ -98,7 +98,7 @@
1720 d.add_text(field, text)
1721 self.assertEqual(d.field_count(), 1, 'field count did not increase')
1722 self.assertGreater(len(text), 0)
1723- self.assertEqual(len(d.get_text(field)), len(text), 'stored text got mangled')
1724+ self.assertEqual(len(d.get_text(field)), len(text.split()), 'stored text got mangled')
1725
1726 class TestCorpus(unittest.TestCase):
1727
1728@@ -138,24 +138,23 @@
1729 def test_simple_match(self):
1730 c = columbus.Corpus()
1731 m = columbus.Matcher()
1732- matches = columbus.MatchResults()
1733 name1 = 0;
1734 name2 = 10;
1735 name3 = 1000;
1736 textName = columbus.Word("title")
1737
1738 d1 = columbus.Document(name1)
1739- d1.add_text(textName, columbus.split_to_words("abc def"))
1740+ d1.add_text(textName, "abc def")
1741 d2 = columbus.Document(name2)
1742- d2.add_text(textName, columbus.split_to_words("abe test"))
1743+ d2.add_text(textName, "abe test")
1744 dFar = columbus.Document(name3)
1745- dFar.add_text(textName, columbus.split_to_words("faraway donotmatchme"))
1746+ dFar.add_text(textName, "faraway donotmatchme")
1747 c.add_document(d1)
1748 c.add_document(d2)
1749 c.add_document(dFar)
1750 m.index(c)
1751
1752- m.match(columbus.split_to_words("abe"), matches)
1753+ matches = m.match("abe")
1754 self.assertEqual(len(matches), 2)
1755 self.assertNotEqual(matches.get_document_id(0), name3);
1756 self.assertNotEqual(matches.get_document_id(1), name3);
1757
1758=== modified file 'tools/hudtest.cc'
1759--- tools/hudtest.cc 2013-04-03 13:50:54 +0000
1760+++ tools/hudtest.cc 2013-08-13 21:28:18 +0000
1761@@ -64,7 +64,7 @@
1762 double queryStart, queryEnd;
1763 try {
1764 queryStart = hiresTimestamp();
1765- app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)), matches);
1766+ matches = app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)));
1767 queryEnd = hiresTimestamp();
1768 } catch(exception &e) {
1769 printf("Matching failed: %s\n", e.what());
1770@@ -181,8 +181,8 @@
1771 if(line[line.size()-2] == '\r')
1772 line[line.size()-2] = '\0';
1773 splitShowableParts(line, pathText, commandText);
1774- splitToWords(pathText.c_str(), path);
1775- splitToWords(commandText.c_str(), command);
1776+ path = splitToWords(pathText.c_str());
1777+ command = splitToWords(commandText.c_str());
1778 if(command.size() == 0)
1779 continue;
1780 Document d(app.pathSource.size());
1781
1782=== modified file 'tools/numberpad.cc'
1783--- tools/numberpad.cc 2013-04-03 13:50:54 +0000
1784+++ tools/numberpad.cc 2013-08-13 21:28:18 +0000
1785@@ -89,7 +89,7 @@
1786 double queryStart, queryEnd;
1787 try {
1788 queryStart = hiresTimestamp();
1789- app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)), matches);
1790+ matches = app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)));
1791 queryEnd = hiresTimestamp();
1792 } catch(exception &e) {
1793 printf("Matching failed: %s\n", e.what());
1794
1795=== modified file 'tools/queryapp.cc'
1796--- tools/queryapp.cc 2013-04-03 13:50:54 +0000
1797+++ tools/queryapp.cc 2013-08-13 21:28:18 +0000
1798@@ -61,7 +61,7 @@
1799 double queryStart, queryEnd;
1800 try {
1801 queryStart = hiresTimestamp();
1802- app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)), matches);
1803+ matches = app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)));
1804 queryEnd = hiresTimestamp();
1805 } catch(exception &e) {
1806 printf("Matching failed: %s\n", e.what());
1807
1808=== modified file 'tools/sctest.cc'
1809--- tools/sctest.cc 2013-04-03 13:50:54 +0000
1810+++ tools/sctest.cc 2013-08-13 21:28:18 +0000
1811@@ -88,7 +88,7 @@
1812 double queryStart, queryEnd;
1813 try {
1814 queryStart = hiresTimestamp();
1815- app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)), matches);
1816+ matches = app->m->match(gtk_entry_get_text(GTK_ENTRY(app->entry)));
1817 queryEnd = hiresTimestamp();
1818 } catch(exception &e) {
1819 printf("Matching failed: %s\n", e.what());
1820@@ -176,7 +176,7 @@
1821 Word n;
1822 size_t equalsLoc = line.find('=', 0);
1823 if(equalsLoc < line.length()) {
1824- splitToWords(line.c_str() + equalsLoc + 1, vals);
1825+ vals = splitToWords(line.c_str() + equalsLoc + 1);
1826 line[equalsLoc] = '\0';
1827 try {
1828 n = line.c_str();
