Merge lp:~dbarth/zeitgeist-extensions/fts-cjk-support into lp:zeitgeist-extensions

Proposed by David Barth
Status: Merged
Merged at revision: 58
Proposed branch: lp:~dbarth/zeitgeist-extensions/fts-cjk-support
Merge into: lp:zeitgeist-extensions
Diff against target: 26 lines (+9/-0)
1 file modified
fts/fts.py (+9/-0)
To merge this branch: bzr merge lp:~dbarth/zeitgeist-extensions/fts-cjk-support
Reviewer                     Review Type    Date Requested    Status
Mikkel Kamstrup Erlandsen                                     Approve
David Barth                  (community)                      Needs Resubmitting
Michael Vogt                                                  Pending
Review via email: mp+72903@code.launchpad.net

Description of the change

Add support for the new CJK tokenizer in Xapian

59. By David Barth

Upgrade by forcing a reindex for older databases

David Barth (dbarth) wrote :

For reference, this follows the upstream recommendation for activating the CJK tokenizer. The environment variable is meant to accommodate future optional tokenizers and to avoid an API change while a new official release is not ready yet.

Mikkel Kamstrup Erlandsen (kamstrup) wrote :

The condition os.environ['XAPIAN_CJK_NGRAM'] == 1 will always be false; use "1". But maybe we really want to check != None, since I think I recall Olly mentioning that Xapian activates CJK if the envvar is set (disregarding value).
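To illustrate the point about the comparison: values in os.environ are always strings, so a test against the integer 1 can never match. The following is a minimal sketch only; the helper name cjk_ngram_requested is hypothetical, and the "set, whatever the value" behaviour is the recollection stated above rather than something confirmed here.

```python
import os

# Environment values are always strings, never integers.
os.environ["XAPIAN_CJK_NGRAM"] = "1"

# Comparing the string "1" to the integer 1 is always False.
always_false = os.environ["XAPIAN_CJK_NGRAM"] == 1

# Comparing against the string "1" matches.
matches_string = os.environ["XAPIAN_CJK_NGRAM"] == "1"


def cjk_ngram_requested(environ=os.environ):
    """Hypothetical helper: True if the variable is set, whatever its value.

    Using .get() avoids the KeyError that environ["..."] raises
    when the variable is not set at all.
    """
    return environ.get("XAPIAN_CJK_NGRAM") is not None
```

Using .get() rather than indexing matters if the check can ever run before the variable has been assigned.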

Stylistically we use double quotes " and not single quotes ' for strings.

review: Needs Fixing
David Barth (dbarth) wrote :

This should be better now

review: Needs Resubmitting
60. By David Barth

smarter test and better style

Mikkel Kamstrup Erlandsen (kamstrup) wrote :

I merged this with some slight tweaks. Now trying to convince the test harness to run :-)

review: Approve

Preview Diff

=== modified file 'fts/fts.py'
--- fts/fts.py 2011-07-04 10:25:10 +0000
+++ fts/fts.py 2011-09-01 08:57:26 +0000
@@ -176,6 +176,9 @@
 
     def __init__ (self, engine):
         self._engine = engine
+
+        # Activate support for the CJK tokenizer
+        os.environ["XAPIAN_CJK_NGRAM"] = "1"
 
         log.debug("Opening full text index: %s" % INDEX_FILE)
         self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OPEN)
@@ -217,6 +220,12 @@
         self._check_index ()
 
     def _check_index (self):
+        if self._index.get_metadata("cjk_ngram") != "1" and os.environ["XAPIAN_CJK_NGRAM"] != None:
+            # If the database was built prior to CJK support
+            # force of a reindex
+            self._index.set_metadata("cjk_ngram", "1")
+            gobject.idle_add (self._reindex)
+
         if self._index.get_doccount() == 0:
             # If the index is empty we trigger a rebuild
             # We must delay reindexing until after the engine is done setting up
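The upgrade check in _check_index can be sketched in isolation as a one-time marker stored in the index metadata. This is an illustration under stated assumptions, not the merged code: FakeIndex is a hypothetical in-memory stand-in for the database's get_metadata/set_metadata calls, needs_cjk_reindex is a made-up name, and the environment lookup uses .get() so that an unset variable does not raise KeyError.

```python
import os


class FakeIndex:
    """Hypothetical in-memory stand-in for the index metadata calls."""

    def __init__(self):
        self._metadata = {}

    def get_metadata(self, key):
        # An unset key reads back as an empty value.
        return self._metadata.get(key, "")

    def set_metadata(self, key, value):
        self._metadata[key] = value


def needs_cjk_reindex(index, environ=os.environ):
    """Return True once for an index built before CJK support was enabled.

    The marker is written on the first call, so later calls return False.
    """
    cjk_enabled = environ.get("XAPIAN_CJK_NGRAM") is not None
    if cjk_enabled and index.get_metadata("cjk_ngram") != "1":
        index.set_metadata("cjk_ngram", "1")
        return True
    return False
```

In the diff above the caller schedules the reindex with gobject.idle_add when the condition holds; here the boolean result stands in for that step.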
