Merge lp:~zeitgeist/zeitgeist/fts++ into lp:~zeitgeist/zeitgeist/bluebird
Status: | Merged |
---|---|
Merged at revision: | 390 |
Proposed branch: | lp:~zeitgeist/zeitgeist/fts++ |
Merge into: | lp:~zeitgeist/zeitgeist/bluebird |
Diff against target: |
5387 lines (+3135/-1900) 36 files modified
.bzrignore (+12/-1) configure.ac (+3/-1) extensions/Makefile.am (+1/-1) extensions/fts++/Makefile.am (+113/-0) extensions/fts++/controller.cpp (+136/-0) extensions/fts++/controller.h (+72/-0) extensions/fts++/ext-dummies.vala (+71/-0) extensions/fts++/fts.cpp (+136/-0) extensions/fts++/fts.h (+59/-0) extensions/fts++/fts.vapi (+25/-0) extensions/fts++/indexer.cpp (+897/-0) extensions/fts++/indexer.h (+115/-0) extensions/fts++/org.gnome.zeitgeist.fts.service.in (+3/-0) extensions/fts++/stringutils.cpp (+128/-0) extensions/fts++/stringutils.h (+42/-0) extensions/fts++/task.cpp (+47/-0) extensions/fts++/task.h (+100/-0) extensions/fts++/test/Makefile.am (+27/-0) extensions/fts++/test/test-fts.c (+37/-0) extensions/fts++/test/test-indexer.cpp (+531/-0) extensions/fts++/test/test-stringutils.cpp (+178/-0) extensions/fts++/zeitgeist-fts.vala (+301/-0) extensions/fts-python/Makefile.am (+0/-23) extensions/fts-python/constants.py (+0/-71) extensions/fts-python/datamodel.py (+0/-83) extensions/fts-python/fts.py (+0/-1273) extensions/fts-python/lrucache.py (+0/-125) extensions/fts-python/org.gnome.zeitgeist.fts.service.in (+0/-3) extensions/fts-python/sql.py (+0/-301) extensions/fts.vala (+13/-1) src/datamodel.vala (+0/-3) src/engine.vala (+1/-0) src/notify.vala (+65/-12) src/sql.vala (+1/-1) src/table-lookup.vala (+20/-0) src/zeitgeist-daemon.vala (+1/-1) |
To merge this branch: | bzr merge lp:~zeitgeist/zeitgeist/fts++ |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Siegfried Gevatter | Approve | ||
Mikkel Kamstrup Erlandsen | Approve | ||
Review via email: mp+92022@code.launchpad.net |
Commit message
Description of the change
More core changes so we can implement the new FTS daemon, plus the daemon itself.
- 432. By Michal Hruby: Remove debug warning
Michal Hruby (mhr3) wrote:
> Awesome! C++ FTS ftw.
>
> - Add COPYING.GPL3, otherwise the tarball can't be re-distributed.
>
On it...
> - Considering sharing a get_flags_
> function between ZG and FTS?
>
I don't think that's really necessary; strictly speaking it'd be a utility function for a specific app, and has no place in a library.
> - s/ver != DatabaseSchema.
> DatabaseSchema.
> What's the rationale for this? We don't know changes won't break
> compatibility
>
Does that mean we should automatically assume that the possible changes do break stuff? This is only used with a read-only database, so I don't see any harm: either the reading will continue to work or you'll get some run-time errors. I find that better than just not working without even trying.
> - Can you explain the "// Don't disconnect monitors using service names"?
>
As said on IRC, it prevents some races by allowing the internal extensions to register a monitor with a service name (races that would otherwise cause missed notifications when the external daemon is starting and hasn't yet had a chance to register a monitor).
> I didn't really review the C++ stuff (I'm assuming you and Mikkel reviewed each
> other's stuff already?).
Partially, but we have tests, so it has to work, right?! :)
Mikkel Kamstrup Erlandsen (kamstrup) wrote:
Functionally tested in Unity and working well. Unit tests passing. However -
There seems to be a fairly bad leak somewhere. Try repeatedly searching for 'u' or something like that and you'll see the memory consumption go up fairly fast.
Mikkel Kamstrup Erlandsen (kamstrup) wrote:
1583 +void Indexer::Flush ()
1584 +{
1585 + db->flush ();
1586 +}
This needs to be Commit() and db->commit(). See http://
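A minimal sketch of what the requested rename could look like, with the call wrapped in a try/catch as a defensive measure. `FakeDatabase` and `FakeError` are hypothetical stand-ins so the snippet compiles without Xapian; in the real code they would presumably be `Xapian::WritableDatabase` and `Xapian::Error`. The `bool` return value is also an assumption made for illustration; the method in the diff returns `void`.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for Xapian::Error.
struct FakeError {
  std::string msg;
  const std::string& get_msg() const { return msg; }
};

// Hypothetical stand-in for Xapian::WritableDatabase.
struct FakeDatabase {
  bool fail = false;
  int commits = 0;

  void commit() {
    if (fail) throw FakeError{"disk full"};
    ++commits;
  }
};

class Indexer {
public:
  explicit Indexer(FakeDatabase* database) : db(database) {}

  // Writes pending changes out; returns false instead of letting the
  // exception escape into the caller (for example a main-loop callback).
  bool Commit() {
    try {
      db->commit();
      return true;
    } catch (const FakeError& e) {
      std::cerr << "Failed to commit changes: " << e.get_msg() << std::endl;
      return false;
    }
  }

private:
  FakeDatabase* db;  // not owned in this sketch
};
```

Catching at this level keeps a failed commit from unwinding through code that is not exception-aware; whether to retry or drop the batch is left open here.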
Mikkel Kamstrup Erlandsen (kamstrup) wrote:
1550 +void Indexer::DropIndex ()
Are we not leaking db and enquire in this method?
Michal Hruby (mhr3) wrote:
> Functionally tested in Unity and working well. Unit tests passing. However -
>
> There seems to be a fairly bad leak somewhere. Try repeatedly searching for
> 'u' or something like that and you'll see the memory consumption go up fairly
> fast.
Nope, sorry, can't reproduce that. The first search does indeed increase the mem usage considerably, but that is just xapian initializing its caches afaict. If I search for the same thing over and over again the mem usage stays constant here.
> This needs to be Commit() and db->commit(). You should probably also surround it with a try/catch.
Fixing...
> Are we not leaking db and enquire in this method?
db is closed and deleted, but yes enquire is leaked. Fixing.
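One way to avoid this kind of leak, sketched under the assumption that the indexer owns both objects: hold them in `std::unique_ptr` so that dropping the index releases the enquire object together with the database. `FakeDatabase`, `FakeEnquire`, `Open` and the `live_objects` counter are hypothetical and exist only so the example is self-contained; the actual fix in the branch may simply add a `delete`.

```cpp
#include <memory>

// Counts live stand-in objects so the example can show nothing is leaked.
static int live_objects = 0;

// Hypothetical stand-in for Xapian::WritableDatabase.
struct FakeDatabase {
  FakeDatabase() { ++live_objects; }
  ~FakeDatabase() { --live_objects; }
};

// Hypothetical stand-in for Xapian::Enquire, which is built from a database.
struct FakeEnquire {
  explicit FakeEnquire(FakeDatabase&) { ++live_objects; }
  ~FakeEnquire() { --live_objects; }
};

class Indexer {
public:
  void Open() {
    enquire.reset();  // drop any previous enquire before replacing its database
    db.reset(new FakeDatabase());
    enquire.reset(new FakeEnquire(*db));
  }

  // Release the enquire object first, then the database it refers to.
  void DropIndex() {
    enquire.reset();
    db.reset();
    // ... removing the on-disk index and reopening would go here ...
  }

private:
  std::unique_ptr<FakeDatabase> db;
  std::unique_ptr<FakeEnquire> enquire;
};
```

With owning smart pointers the destructor needs no manual cleanup either, since members are destroyed in reverse declaration order (enquire before db).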
Mikkel Kamstrup Erlandsen (kamstrup) wrote:
> > Functionally tested in Unity and working well. Unit tests passing. However -
> >
> > There seems to be a fairly bad leak somewhere. Try repeatedly searching for
> > 'u' or something like that and you'll see the memory consumption go up
> fairly
> > fast.
>
> Nope, sorry can't reproduce that, the first search does indeed increase the
> mem usage considerably, but that is just xapian initializing its caches
> afaict. If i search for the same thing over and over again the mem usage stays
> constant here.
Odd, now I can't reproduce it here either... I swear I had it sitting at around 16mb writable, and while searching I could see it crawl 1mb at a time all the way past 30mb... But now it sits steady at around 14mb writable (which is still surprisingly much, but stable at least).
- 433. By Michal Hruby: We wanted to use GPL2+. Make it so
- 434. By Michal Hruby: Fix issues found during review
Mikkel Kamstrup Erlandsen (kamstrup) wrote:
Looking good to me. I'd like someone else to +1 it before we merge though...
Outstanding work Michal!
Seif Lotfy (seif) wrote:
I have been using it for 2 days now...
I noticed a small increase in memory consumption, around 2-4 MB
However this is nothing that really bothers me
AWESOME WORK
On Thu, Feb 9, 2012 at 11:47 AM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:
> Review: Approve
>
> Looking good to me. I'd like someone else to +1 it before we merge
> though...
>
> Outstanding work Michal!
- 435. By Michal Hruby: Update TableLookup if necessary
Siegfried Gevatter (rainct) wrote:
OK, merging it, but there's some outstanding stuff:
* Important: the TableLookup in FTS can currently explode. The new schema version needs to add AUTOINCREMENT to the `id' row of all tables in TableLookup. We should do this before releasing a new tarball.
* TableLookup.
* Configure isn't checking for xapian being there
* Add a flag to disable FTS? (keeping the Xapian dependency avoidable)
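The review does not spell out how TableLookup can "explode". One plausible reading, stated here as an assumption, is id reuse: without AUTOINCREMENT, SQLite may hand a deleted row's id to a new row, so an id cached by a second process (such as the FTS daemon) can silently point at the wrong value. The toy model below only simulates that allocation rule in plain C++; `ToyTable` and `StaleCacheHit` are hypothetical names and no SQLite code is involved.

```cpp
#include <map>
#include <string>

// Toy model of a table whose `id` is an integer primary key.
class ToyTable {
public:
  explicit ToyTable(bool autoincrement) : autoincrement_(autoincrement) {}

  int Insert(const std::string& value) {
    int id;
    if (autoincrement_) {
      id = ++highest_ever_;  // never hands out the same id twice
    } else {
      // Largest live id plus one: a deleted id can come back.
      id = rows_.empty() ? 1 : rows_.rbegin()->first + 1;
    }
    rows_[id] = value;
    return id;
  }

  void Delete(int id) { rows_.erase(id); }

private:
  bool autoincrement_;
  int highest_ever_ = 0;
  std::map<int, std::string> rows_;
};

// Returns true when a cache filled before the delete already contains the
// id given to the new row, i.e. it would resolve it to the deleted value.
bool StaleCacheHit(bool autoincrement) {
  ToyTable actors(autoincrement);
  std::map<int, std::string> cache;  // what a second process remembers

  int old_id = actors.Insert("application://foo.desktop");
  cache[old_id] = "application://foo.desktop";

  actors.Delete(old_id);
  int new_id = actors.Insert("application://bar.desktop");

  return cache.count(new_id) != 0;
}
```

In this model `StaleCacheHit(false)` reports a collision while `StaleCacheHit(true)` does not, which is the property the schema change is meant to guarantee.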
Seif Lotfy (seif) wrote:
I like the last one :)
On Fri, Feb 10, 2012 at 12:30 PM, Siegfried Gevatter <email address hidden> wrote:
> Review: Approve
>
> OK, merging it, but there's some outstanding stuff:
>
> * Important: the TableLookup in FTS can currently explode. The new schema
> version needs to add AUTOINCREMENT to the `id' row of all tables in
> TableLookup. We should do this before releasing a new tarball.
>
> * TableLookup.
>
> * Configure isn't checking for xapian being there
>
> * Add a flag to disable FTS? (keeping the Xapian dependency avoidable)
Preview Diff
1 | === modified file '.bzrignore' |
2 | --- .bzrignore 2011-12-31 13:30:23 +0000 |
3 | +++ .bzrignore 2012-02-09 22:47:22 +0000 |
4 | @@ -44,12 +44,23 @@ |
5 | extensions/*.stamp |
6 | extensions/*.la |
7 | extensions/*.lo |
8 | +extensions/fts++/.deps |
9 | +extensions/fts++/.libs |
10 | +extensions/fts++/*.c |
11 | +extensions/fts++/*.stamp |
12 | +extensions/fts++/*.la |
13 | +extensions/fts++/*.lo |
14 | +extensions/fts++/zeitgeist-internal.* |
15 | +extensions/fts++/test/.deps |
16 | +extensions/fts++/test/.libs |
17 | +extensions/fts++/test/test-fts |
18 | +extensions/fts++/org.gnome.zeitgeist.fts.service |
19 | +extensions/fts++/zeitgeist-fts |
20 | test/direct/marshalling |
21 | test/dbus/__pycache__ |
22 | test/direct/table-lookup-test |
23 | src/zeitgeist-engine.vapi |
24 | src/zeitgeist-engine.h |
25 | -extensions/fts-python/org.gnome.zeitgeist.fts.service |
26 | py-compile |
27 | python/_ontology.py |
28 | test/direct/*.c |
29 | |
30 | === modified file 'configure.ac' |
31 | --- configure.ac 2012-01-27 15:39:16 +0000 |
32 | +++ configure.ac 2012-02-09 22:47:22 +0000 |
33 | @@ -8,6 +8,7 @@ |
34 | |
35 | AC_PROG_CC |
36 | AM_PROG_CC_C_O |
37 | +AC_PROG_CXX |
38 | AC_DISABLE_STATIC |
39 | AC_PROG_LIBTOOL |
40 | |
41 | @@ -59,7 +60,8 @@ |
42 | Makefile |
43 | src/Makefile |
44 | extensions/Makefile |
45 | - extensions/fts-python/Makefile |
46 | + extensions/fts++/Makefile |
47 | + extensions/fts++/test/Makefile |
48 | data/Makefile |
49 | data/ontology/Makefile |
50 | python/Makefile |
51 | |
52 | === modified file 'extensions/Makefile.am' |
53 | --- extensions/Makefile.am 2011-12-25 16:24:04 +0000 |
54 | +++ extensions/Makefile.am 2012-02-09 22:47:22 +0000 |
55 | @@ -1,4 +1,4 @@ |
56 | -SUBDIRS = fts-python |
57 | +SUBDIRS = fts++ |
58 | |
59 | NULL = |
60 | |
61 | |
62 | === added directory 'extensions/fts++' |
63 | === added file 'extensions/fts++/Makefile.am' |
64 | --- extensions/fts++/Makefile.am 1970-01-01 00:00:00 +0000 |
65 | +++ extensions/fts++/Makefile.am 2012-02-09 22:47:22 +0000 |
66 | @@ -0,0 +1,113 @@ |
67 | +SUBDIRS = test |
68 | +NULL = |
69 | + |
70 | +noinst_LTLIBRARIES = libzeitgeist-internal.la |
71 | +libexec_PROGRAMS = zeitgeist-fts |
72 | + |
73 | +servicedir = $(DBUS_SERVICES_DIR) |
74 | +service_DATA = org.gnome.zeitgeist.fts.service |
75 | + |
76 | +org.gnome.zeitgeist.fts.service: org.gnome.zeitgeist.fts.service.in |
77 | + $(AM_V_GEN)sed -e s!\@libexecdir\@!$(libexecdir)! < $< > $@ |
78 | +org.gnome.zeitgeist.fts.service: Makefile |
79 | + |
80 | +AM_CPPFLAGS = \ |
81 | + $(ZEITGEIST_CFLAGS) \ |
82 | + -include $(CONFIG_HEADER) \ |
83 | + -w \ |
84 | + $(NULL) |
85 | + |
86 | +AM_VALAFLAGS = \ |
87 | + --target-glib=2.26 \ |
88 | + --pkg gio-2.0 \ |
89 | + --pkg sqlite3 \ |
90 | + --pkg posix \ |
91 | + --pkg gmodule-2.0 \ |
92 | + $(top_srcdir)/config.vapi \ |
93 | + $(NULL) |
94 | + |
95 | +libzeitgeist_internal_la_VALASOURCES = \ |
96 | + datamodel.vala \ |
97 | + db-reader.vala \ |
98 | + engine.vala \ |
99 | + sql.vala \ |
100 | + remote.vala \ |
101 | + utils.vala \ |
102 | + errors.vala \ |
103 | + table-lookup.vala \ |
104 | + sql-schema.vala \ |
105 | + where-clause.vala \ |
106 | + ontology.vala \ |
107 | + ontology-uris.vala \ |
108 | + mimetype.vala \ |
109 | + ext-dummies.vala \ |
110 | + $(NULL) |
111 | + |
112 | +libzeitgeist_internal_la_SOURCES = \ |
113 | + zeitgeist-internal.stamp \ |
114 | + $(libzeitgeist_internal_la_VALASOURCES:.vala=.c) \ |
115 | + $(NULL) |
116 | + |
117 | +libzeitgeist_internal_la_LIBADD = \ |
118 | + $(ZEITGEIST_LIBS) \ |
119 | + $(NULL) |
120 | + |
121 | +zeitgeist_fts_VALASOURCES = \ |
122 | + zeitgeist-fts.vala \ |
123 | + $(NULL) |
124 | + |
125 | +zeitgeist_fts_SOURCES = \ |
126 | + zeitgeist-fts_vala.stamp \ |
127 | + $(zeitgeist_fts_VALASOURCES:.vala=.c) \ |
128 | + controller.cpp \ |
129 | + controller.h \ |
130 | + fts.cpp \ |
131 | + fts.h \ |
132 | + indexer.cpp \ |
133 | + indexer.h \ |
134 | + task.cpp \ |
135 | + task.h \ |
136 | + stringutils.cpp \ |
137 | + stringutils.h \ |
138 | + $(NULL) |
139 | + |
140 | +zeitgeist_fts_LDADD = \ |
141 | + $(builddir)/libzeitgeist-internal.la \ |
142 | + -lxapian \ |
143 | + $(NULL) |
144 | + |
145 | +BUILT_SOURCES = \ |
146 | + zeitgeist-internal.stamp \ |
147 | + zeitgeist-fts_vala.stamp \ |
148 | + $(NULL) |
149 | + |
150 | +zeitgeist-internal.stamp: $(libzeitgeist_internal_la_VALASOURCES) |
151 | + $(VALA_V)$(VALAC) $(AM_VALAFLAGS) $(VALAFLAGS) -C -H zeitgeist-internal.h --library zeitgeist-internal $^ |
152 | + @touch "$@" |
153 | + |
154 | +zeitgeist-fts_vala.stamp: $(zeitgeist_fts_VALASOURCES) |
155 | + $(VALA_V)$(VALAC) $(AM_VALAFLAGS) $(VALAFLAGS) \ |
156 | + $(srcdir)/zeitgeist-internal.vapi $(srcdir)/fts.vapi -C $^ |
157 | + @touch "$@" |
158 | + |
159 | +EXTRA_DIST = \ |
160 | + $(libzeitgeist_internal_la_VALASOURCES) \ |
161 | + $(zeitgeist_fts_VALASOURCES) \ |
162 | + zeitgeist-fts_vala.stamp \ |
163 | + zeitgeist-internal.h \ |
164 | + zeitgeist-internal.vapi \ |
165 | + org.gnome.zeitgeist.fts.service.in \ |
166 | + $(NULL) |
167 | + |
168 | +CLEANFILES = org.gnome.zeitgeist.fts.service |
169 | + |
170 | +DISTCLEANFILES = \ |
171 | + $(NULL) |
172 | + |
173 | +distclean-local: |
174 | + rm -f *.c *.o *.stamp *.~[0-9]~ |
175 | + |
176 | +VALA_V = $(VALA_V_$(V)) |
177 | +VALA_V_ = $(VALA_V_$(AM_DEFAULT_VERBOSITY)) |
178 | +VALA_V_0 = @echo " VALAC " $^; |
179 | + |
180 | |
181 | === added file 'extensions/fts++/controller.cpp' |
182 | --- extensions/fts++/controller.cpp 1970-01-01 00:00:00 +0000 |
183 | +++ extensions/fts++/controller.cpp 2012-02-09 22:47:22 +0000 |
184 | @@ -0,0 +1,136 @@ |
185 | +/* |
186 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
187 | + * |
188 | + * This program is free software; you can redistribute it and/or |
189 | + * modify it under the terms of the GNU General Public License |
190 | + * as published by the Free Software Foundation; either version 2 |
191 | + * of the License, or (at your option) any later version. |
192 | + * |
193 | + * This program is distributed in the hope that it will be useful, |
194 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
195 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
196 | + * GNU General Public License for more details. |
197 | + * |
198 | + * You should have received a copy of the GNU General Public License |
199 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
200 | + * |
201 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
202 | + * |
203 | + */ |
204 | + |
205 | +#include "controller.h" |
206 | + |
207 | +namespace ZeitgeistFTS { |
208 | + |
209 | +void Controller::Initialize (GError **error) |
210 | +{ |
211 | + indexer->Initialize (error); |
212 | +} |
213 | + |
214 | +void Controller::Run () |
215 | +{ |
216 | + if (!indexer->CheckIndex ()) |
217 | + { |
218 | + indexer->DropIndex (); |
219 | + RebuildIndex (); |
220 | + } |
221 | +} |
222 | + |
223 | +void Controller::RebuildIndex () |
224 | +{ |
225 | + GError *error = NULL; |
226 | + GPtrArray *events; |
227 | + GPtrArray *templates = g_ptr_array_new (); |
228 | + ZeitgeistTimeRange *time_range = zeitgeist_time_range_new_anytime (); |
229 | + |
230 | + g_debug ("asking reader for all events"); |
231 | + events = zeitgeist_db_reader_find_events (zg_reader, |
232 | + time_range, |
233 | + templates, |
234 | + ZEITGEIST_STORAGE_STATE_ANY, |
235 | + 0, |
236 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
237 | + NULL, |
238 | + &error); |
239 | + |
240 | + if (error) |
241 | + { |
242 | + g_warning ("%s", error->message); |
243 | + g_error_free (error); |
244 | + } |
245 | + else |
246 | + { |
247 | + g_debug ("reader returned %u events", events->len); |
248 | + |
249 | + IndexEvents (events); |
250 | + g_ptr_array_unref (events); |
251 | + |
252 | + // Set the db metadata key only once we're done |
253 | + PushTask (new MetadataTask ("fts_index_version", INDEX_VERSION)); |
254 | + } |
255 | + |
256 | + g_object_unref (time_range); |
257 | + g_ptr_array_unref (templates); |
258 | +} |
259 | + |
260 | +void Controller::IndexEvents (GPtrArray *events) |
261 | +{ |
262 | + const int CHUNK_SIZE = 32; |
263 | + // Break down index tasks into suitable chunks |
264 | + for (unsigned i = 0; i < events->len; i += CHUNK_SIZE) |
265 | + { |
266 | + PushTask (new IndexEventsTask (g_ptr_array_ref (events), i, CHUNK_SIZE)); |
267 | + } |
268 | +} |
269 | + |
270 | +void Controller::DeleteEvents (guint *event_ids, int event_ids_size) |
271 | +{ |
272 | + // FIXME: Should we break the task here as well? |
273 | + PushTask (new DeleteEventsTask (event_ids, event_ids_size)); |
274 | +} |
275 | + |
276 | +void Controller::PushTask (Task* task) |
277 | +{ |
278 | + queued_tasks.push (task); |
279 | + |
280 | + if (processing_source_id == 0) |
281 | + { |
282 | + processing_source_id = |
283 | + g_idle_add ((GSourceFunc) &Controller::ProcessTask, this); |
284 | + } |
285 | +} |
286 | + |
287 | +gboolean Controller::ProcessTask () |
288 | +{ |
289 | + if (!queued_tasks.empty ()) |
290 | + { |
291 | + Task *task; |
292 | + |
293 | + task = queued_tasks.front (); |
294 | + queued_tasks.pop (); |
295 | + |
296 | + task->Process (indexer); |
297 | + delete task; |
298 | + } |
299 | + |
300 | + bool all_done = queued_tasks.empty (); |
301 | + if (all_done) |
302 | + { |
303 | + indexer->Commit (); |
304 | + if (processing_source_id != 0) |
305 | + { |
306 | + g_source_remove (processing_source_id); |
307 | + processing_source_id = 0; |
308 | + } |
309 | + return FALSE; |
310 | + } |
311 | + |
312 | + return TRUE; |
313 | +} |
314 | + |
315 | +bool Controller::HasPendingTasks () |
316 | +{ |
317 | + return !queued_tasks.empty (); |
318 | +} |
319 | + |
320 | +} |
321 | |
322 | === added file 'extensions/fts++/controller.h' |
323 | --- extensions/fts++/controller.h 1970-01-01 00:00:00 +0000 |
324 | +++ extensions/fts++/controller.h 2012-02-09 22:47:22 +0000 |
325 | @@ -0,0 +1,72 @@ |
326 | +/* |
327 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
328 | + * |
329 | + * This program is free software; you can redistribute it and/or |
330 | + * modify it under the terms of the GNU General Public License |
331 | + * as published by the Free Software Foundation; either version 2 |
332 | + * of the License, or (at your option) any later version. |
333 | + * |
334 | + * This program is distributed in the hope that it will be useful, |
335 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
336 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
337 | + * GNU General Public License for more details. |
338 | + * |
339 | + * You should have received a copy of the GNU General Public License |
340 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
341 | + * |
342 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
343 | + * |
344 | + */ |
345 | + |
346 | +#ifndef _ZGFTS_CONTROLLER_H_ |
347 | +#define _ZGFTS_CONTROLLER_H_ |
348 | + |
349 | +#include <glib-object.h> |
350 | +#include <queue> |
351 | +#include <vector> |
352 | + |
353 | +#include "indexer.h" |
354 | +#include "task.h" |
355 | +#include "zeitgeist-internal.h" |
356 | + |
357 | +namespace ZeitgeistFTS { |
358 | + |
359 | +class Controller { |
360 | +public: |
361 | + Controller (ZeitgeistDbReader *reader) |
362 | + : zg_reader (reader) |
363 | + , processing_source_id (0) |
364 | + , indexer (new Indexer (reader)) {}; |
365 | + |
366 | + ~Controller () |
367 | + { |
368 | + if (processing_source_id != 0) |
369 | + { |
370 | + g_source_remove (processing_source_id); |
371 | + } |
372 | + } |
373 | + |
374 | + void Initialize (GError **error); |
375 | + void Run (); |
376 | + void RebuildIndex (); |
377 | + |
378 | + void IndexEvents (GPtrArray *events); |
379 | + void DeleteEvents (guint *event_ids, int event_ids_size); |
380 | + |
381 | + void PushTask (Task* task); |
382 | + bool HasPendingTasks (); |
383 | + gboolean ProcessTask (); |
384 | + |
385 | + Indexer *indexer; |
386 | + |
387 | +private: |
388 | + ZeitgeistDbReader *zg_reader; |
389 | + |
390 | + typedef std::queue<Task*> TaskQueue; |
391 | + TaskQueue queued_tasks; |
392 | + guint processing_source_id; |
393 | +}; |
394 | + |
395 | +} |
396 | + |
397 | +#endif /* _ZGFTS_CONTROLLER_H_ */ |
398 | |
399 | === added symlink 'extensions/fts++/datamodel.vala' |
400 | === target is u'../../src/datamodel.vala' |
401 | === added symlink 'extensions/fts++/db-reader.vala' |
402 | === target is u'../../src/db-reader.vala' |
403 | === added symlink 'extensions/fts++/engine.vala' |
404 | === target is u'../../src/engine.vala' |
405 | === added symlink 'extensions/fts++/errors.vala' |
406 | === target is u'../../src/errors.vala' |
407 | === added file 'extensions/fts++/ext-dummies.vala' |
408 | --- extensions/fts++/ext-dummies.vala 1970-01-01 00:00:00 +0000 |
409 | +++ extensions/fts++/ext-dummies.vala 2012-02-09 22:47:22 +0000 |
410 | @@ -0,0 +1,71 @@ |
411 | +/* ext-dummies.vala |
412 | + * |
413 | + * Copyright © 2011-2012 Michal Hruby <michal.mhr@gmail.com> |
414 | + * |
415 | + * This program is free software: you can redistribute it and/or modify |
416 | + * it under the terms of the GNU Lesser General Public License as published by |
417 | + * the Free Software Foundation, either version 2.1 of the License, or |
418 | + * (at your option) any later version. |
419 | + * |
420 | + * This program is distributed in the hope that it will be useful, |
421 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
422 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
423 | + * GNU General Public License for more details. |
424 | + * |
425 | + * You should have received a copy of the GNU Lesser General Public License |
426 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
427 | + * |
428 | + */ |
429 | + |
430 | +namespace Zeitgeist |
431 | +{ |
432 | + public class ExtensionCollection : Object |
433 | + { |
434 | + public unowned Engine engine { get; construct; } |
435 | + |
436 | + public ExtensionCollection (Engine engine) |
437 | + { |
438 | + Object (engine: engine); |
439 | + } |
440 | + |
441 | + public string[] get_extension_names () |
442 | + { |
443 | + string[] result = {}; |
444 | + return result; |
445 | + } |
446 | + |
447 | + public void call_pre_insert_events (GenericArray<Event?> events, |
448 | + BusName? sender) |
449 | + { |
450 | + } |
451 | + |
452 | + public void call_post_insert_events (GenericArray<Event?> events, |
453 | + BusName? sender) |
454 | + { |
455 | + } |
456 | + |
457 | + public unowned uint32[] call_pre_delete_events (uint32[] event_ids, |
458 | + BusName? sender) |
459 | + { |
460 | + return event_ids; |
461 | + } |
462 | + |
463 | + public void call_post_delete_events (uint32[] event_ids, |
464 | + BusName? sender) |
465 | + { |
466 | + } |
467 | + } |
468 | + |
469 | + public class ExtensionStore : Object |
470 | + { |
471 | + public unowned Engine engine { get; construct; } |
472 | + |
473 | + public ExtensionStore (Engine engine) |
474 | + { |
475 | + Object (engine: engine); |
476 | + } |
477 | + } |
478 | + |
479 | +} |
480 | + |
481 | +// vim:expandtab:ts=4:sw=4 |
482 | |
483 | === added file 'extensions/fts++/fts.cpp' |
484 | --- extensions/fts++/fts.cpp 1970-01-01 00:00:00 +0000 |
485 | +++ extensions/fts++/fts.cpp 2012-02-09 22:47:22 +0000 |
486 | @@ -0,0 +1,136 @@ |
487 | +/* |
488 | + * Copyright (C) 2012 Canonical Ltd |
489 | + * |
490 | + * This program is free software; you can redistribute it and/or |
491 | + * modify it under the terms of the GNU General Public License |
492 | + * as published by the Free Software Foundation; either version 2 |
493 | + * of the License, or (at your option) any later version. |
494 | + * |
495 | + * This program is distributed in the hope that it will be useful, |
496 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
497 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
498 | + * GNU General Public License for more details. |
499 | + * |
500 | + * You should have received a copy of the GNU General Public License |
501 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
502 | + * |
503 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
504 | + * |
505 | + */ |
506 | + |
507 | +#include "fts.h" |
508 | +#include "indexer.h" |
509 | +#include "controller.h" |
510 | + |
511 | +ZeitgeistIndexer* |
512 | +zeitgeist_indexer_new (ZeitgeistDbReader *reader, GError **error) |
513 | +{ |
514 | + ZeitgeistFTS::Controller *ctrl; |
515 | + GError *local_error; |
516 | + |
517 | + g_return_val_if_fail (ZEITGEIST_IS_DB_READER (reader), NULL); |
518 | + g_return_val_if_fail (error == NULL || *error == NULL, NULL); |
519 | + |
520 | + g_setenv ("XAPIAN_CJK_NGRAM", "1", TRUE); |
521 | + ctrl = new ZeitgeistFTS::Controller (reader); |
522 | + |
523 | + local_error = NULL; |
524 | + ctrl->Initialize (&local_error); |
525 | + if (local_error) |
526 | + { |
527 | + delete ctrl; |
528 | + g_propagate_error (error, local_error); |
529 | + return NULL; |
530 | + } |
531 | + |
532 | + |
533 | + ctrl->Run (); |
534 | + |
535 | + return (ZeitgeistIndexer*) ctrl; |
536 | +} |
537 | + |
538 | +void |
539 | +zeitgeist_indexer_free (ZeitgeistIndexer* indexer) |
540 | +{ |
541 | + g_return_if_fail (indexer != NULL); |
542 | + |
543 | + delete (ZeitgeistFTS::Controller*) indexer; |
544 | +} |
545 | + |
546 | +GPtrArray* zeitgeist_indexer_search (ZeitgeistIndexer *indexer, |
547 | + const gchar *search_string, |
548 | + ZeitgeistTimeRange *time_range, |
549 | + GPtrArray *templates, |
550 | + guint offset, |
551 | + guint count, |
552 | + ZeitgeistResultType result_type, |
553 | + guint *matches, |
554 | + GError **error) |
555 | +{ |
556 | + GPtrArray *results; |
557 | + ZeitgeistFTS::Controller *_indexer; |
558 | + |
559 | + g_return_val_if_fail (indexer != NULL, NULL); |
560 | + g_return_val_if_fail (search_string != NULL, NULL); |
561 | + g_return_val_if_fail (ZEITGEIST_IS_TIME_RANGE (time_range), NULL); |
562 | + g_return_val_if_fail (error == NULL || *error == NULL, NULL); |
563 | + |
564 | + _indexer = (ZeitgeistFTS::Controller*) indexer; |
565 | + |
566 | + results = _indexer->indexer->Search (search_string, time_range, |
567 | + templates, offset, count, result_type, |
568 | + matches, error); |
569 | + |
570 | + return results; |
571 | +} |
572 | + |
573 | +void zeitgeist_indexer_index_events (ZeitgeistIndexer *indexer, |
574 | + GPtrArray *events) |
575 | +{ |
576 | + ZeitgeistFTS::Controller *_indexer; |
577 | + |
578 | + g_return_if_fail (indexer != NULL); |
579 | + g_return_if_fail (events != NULL); |
580 | + |
581 | + _indexer = (ZeitgeistFTS::Controller*) indexer; |
582 | + |
583 | + _indexer->IndexEvents (events); |
584 | +} |
585 | + |
586 | +void zeitgeist_indexer_delete_events (ZeitgeistIndexer *indexer, |
587 | + guint *event_ids, |
588 | + int event_ids_size) |
589 | +{ |
590 | + ZeitgeistFTS::Controller *_indexer; |
591 | + |
592 | + g_return_if_fail (indexer != NULL); |
593 | + |
594 | + if (event_ids_size <= 0) return; |
595 | + |
596 | + _indexer = (ZeitgeistFTS::Controller*) indexer; |
597 | + |
598 | + _indexer->DeleteEvents (event_ids, event_ids_size); |
599 | +} |
600 | + |
601 | +gboolean zeitgeist_indexer_has_pending_tasks (ZeitgeistIndexer *indexer) |
602 | +{ |
603 | + ZeitgeistFTS::Controller *_indexer; |
604 | + |
605 | + g_return_val_if_fail (indexer != NULL, FALSE); |
606 | + |
607 | + _indexer = (ZeitgeistFTS::Controller*) indexer; |
608 | + |
609 | + return _indexer->HasPendingTasks () ? TRUE : FALSE; |
610 | +} |
611 | + |
612 | +void zeitgeist_indexer_process_task (ZeitgeistIndexer *indexer) |
613 | +{ |
614 | + ZeitgeistFTS::Controller *_indexer; |
615 | + |
616 | + g_return_if_fail (indexer != NULL); |
617 | + |
618 | + _indexer = (ZeitgeistFTS::Controller*) indexer; |
619 | + |
620 | + _indexer->ProcessTask (); |
621 | +} |
622 | + |
623 | |
624 | === added file 'extensions/fts++/fts.h' |
625 | --- extensions/fts++/fts.h 1970-01-01 00:00:00 +0000 |
626 | +++ extensions/fts++/fts.h 2012-02-09 22:47:22 +0000 |
627 | @@ -0,0 +1,59 @@ |
628 | +/* |
629 | + * Copyright (C) 2012 Canonical Ltd |
630 | + * |
631 | + * This program is free software; you can redistribute it and/or |
632 | + * modify it under the terms of the GNU General Public License |
633 | + * as published by the Free Software Foundation; either version 2 |
634 | + * of the License, or (at your option) any later version. |
635 | + * |
636 | + * This program is distributed in the hope that it will be useful, |
637 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
638 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
639 | + * GNU General Public License for more details. |
640 | + * |
641 | + * You should have received a copy of the GNU General Public License |
642 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
643 | + * |
644 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
645 | + * |
646 | + */ |
647 | + |
648 | +#ifndef _ZGFTS_H_ |
649 | +#define _ZGFTS_H_ |
650 | + |
651 | +#include <glib.h> |
652 | +#include "zeitgeist-internal.h" |
653 | + |
654 | +typedef struct _ZeitgeistIndexer ZeitgeistIndexer; |
655 | + |
656 | +G_BEGIN_DECLS |
657 | + |
658 | +ZeitgeistIndexer* zeitgeist_indexer_new (ZeitgeistDbReader* reader, |
659 | + GError **error); |
660 | + |
661 | +void zeitgeist_indexer_free (ZeitgeistIndexer* indexer); |
662 | + |
663 | +GPtrArray* zeitgeist_indexer_search (ZeitgeistIndexer *indexer, |
664 | + const gchar *search_string, |
665 | + ZeitgeistTimeRange *time_range, |
666 | + GPtrArray *templates, |
667 | + guint offset, |
668 | + guint count, |
669 | + ZeitgeistResultType result_type, |
670 | + guint *matches, |
671 | + GError **error); |
672 | + |
673 | +void zeitgeist_indexer_index_events (ZeitgeistIndexer *indexer, |
674 | + GPtrArray *events); |
675 | + |
676 | +void zeitgeist_indexer_delete_events (ZeitgeistIndexer *indexer, |
677 | + guint *event_ids, |
678 | + int event_ids_size); |
679 | + |
680 | +gboolean zeitgeist_indexer_has_pending_tasks (ZeitgeistIndexer *indexer); |
681 | + |
682 | +void zeitgeist_indexer_process_task (ZeitgeistIndexer *indexer); |
683 | + |
684 | +G_END_DECLS |
685 | + |
686 | +#endif /* _ZGFTS_H_ */ |
687 | |
688 | === added file 'extensions/fts++/fts.vapi' |
689 | --- extensions/fts++/fts.vapi 1970-01-01 00:00:00 +0000 |
690 | +++ extensions/fts++/fts.vapi 2012-02-09 22:47:22 +0000 |
691 | @@ -0,0 +1,25 @@ |
692 | +/* indexer.vapi is hand-written - not a big deal for these ~10 lines */ |
693 | + |
694 | +namespace Zeitgeist { |
695 | + [Compact] |
696 | + [CCode (free_function = "zeitgeist_indexer_free", cheader_filename = "fts.h")] |
697 | + public class Indexer { |
698 | + public Indexer (DbReader reader) throws EngineError; |
699 | + |
700 | + public GLib.GenericArray<Event> search (string search_string, |
701 | + TimeRange time_range, |
702 | + GLib.GenericArray<Event> templates, |
703 | + uint offset, |
704 | + uint count, |
705 | + ResultType result_type, |
706 | + out uint matches) throws GLib.Error; |
707 | + |
708 | + public void index_events (GLib.GenericArray<Event> events); |
709 | + |
710 | + public void delete_events (uint[] event_ids); |
711 | + |
712 | + public bool has_pending_tasks (); |
713 | + |
714 | + public void process_task (); |
715 | + } |
716 | +} |
717 | |
718 | === added file 'extensions/fts++/indexer.cpp' |
719 | --- extensions/fts++/indexer.cpp 1970-01-01 00:00:00 +0000 |
720 | +++ extensions/fts++/indexer.cpp 2012-02-09 22:47:22 +0000 |
721 | @@ -0,0 +1,897 @@ |
722 | +/* |
723 | + * Copyright (C) 2012 Canonical Ltd |
724 | + * 2012 Mikkel Kamstrup Erlandsen |
725 | + * |
726 | + * This program is free software; you can redistribute it and/or |
727 | + * modify it under the terms of the GNU General Public License |
728 | + * as published by the Free Software Foundation; either version 2 |
729 | + * of the License, or (at your option) any later version. |
730 | + * |
731 | + * This program is distributed in the hope that it will be useful, |
732 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
733 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
734 | + * GNU General Public License for more details. |
735 | + * |
736 | + * You should have received a copy of the GNU General Public License |
737 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
738 | + * |
739 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
740 | + * Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
741 | + * |
742 | + */ |
743 | + |
744 | +#include "indexer.h" |
745 | +#include "stringutils.h" |
746 | +#include <xapian.h> |
747 | +#include <queue> |
748 | +#include <vector> |
749 | + |
750 | +#include <gio/gio.h> |
751 | +#include <gio/gdesktopappinfo.h> |
752 | + |
753 | +namespace ZeitgeistFTS { |
754 | + |
755 | +const std::string FILTER_PREFIX_EVENT_INTERPRETATION = "ZGEI"; |
756 | +const std::string FILTER_PREFIX_EVENT_MANIFESTATION = "ZGEM"; |
757 | +const std::string FILTER_PREFIX_ACTOR = "ZGA"; |
758 | +const std::string FILTER_PREFIX_SUBJECT_URI = "ZGSU"; |
759 | +const std::string FILTER_PREFIX_SUBJECT_INTERPRETATION = "ZGSI"; |
760 | +const std::string FILTER_PREFIX_SUBJECT_MANIFESTATION = "ZGSM"; |
761 | +const std::string FILTER_PREFIX_SUBJECT_ORIGIN = "ZGSO"; |
762 | +const std::string FILTER_PREFIX_SUBJECT_MIMETYPE = "ZGST"; |
763 | +const std::string FILTER_PREFIX_SUBJECT_STORAGE = "ZGSS"; |
764 | +const std::string FILTER_PREFIX_XDG_CATEGORY = "AC"; |
765 | + |
766 | +const Xapian::valueno VALUE_EVENT_ID = 0; |
767 | +const Xapian::valueno VALUE_TIMESTAMP = 1; |
768 | + |
769 | +#define QUERY_PARSER_FLAGS \ |
770 | + Xapian::QueryParser::FLAG_PHRASE | Xapian::QueryParser::FLAG_BOOLEAN | \ |
771 | + Xapian::QueryParser::FLAG_PURE_NOT | Xapian::QueryParser::FLAG_LOVEHATE | \ |
772 | + Xapian::QueryParser::FLAG_WILDCARD |
773 | + |
774 | +const std::string FTS_MAIN_DIR = "ftspp.index"; |
775 | + |
776 | +void Indexer::Initialize (GError **error) |
777 | +{ |
778 | + try |
779 | + { |
780 | + if (zeitgeist_utils_using_in_memory_database ()) |
781 | + { |
782 | + this->db = new Xapian::WritableDatabase; |
783 | + this->db->add_database (Xapian::InMemory::open ()); |
784 | + } |
785 | + else |
786 | + { |
787 | + gchar *path = g_build_filename (zeitgeist_utils_get_data_path (), |
788 | + FTS_MAIN_DIR.c_str (), NULL); |
789 | + this->db = new Xapian::WritableDatabase (path, |
790 | + Xapian::DB_CREATE_OR_OPEN); |
791 | + g_free (path); |
792 | + } |
793 | + |
794 | + this->tokenizer = new Xapian::TermGenerator (); |
795 | + this->query_parser = new Xapian::QueryParser (); |
796 | + this->query_parser->add_prefix ("name", "N"); |
797 | + this->query_parser->add_prefix ("title", "N"); |
798 | + this->query_parser->add_prefix ("site", "S"); |
799 | + this->query_parser->add_prefix ("app", "A"); |
800 | + this->query_parser->add_boolean_prefix ("zgei", |
801 | + FILTER_PREFIX_EVENT_INTERPRETATION); |
802 | + this->query_parser->add_boolean_prefix ("zgem", |
803 | + FILTER_PREFIX_EVENT_MANIFESTATION); |
804 | + this->query_parser->add_boolean_prefix ("zga", FILTER_PREFIX_ACTOR); |
805 | + this->query_parser->add_prefix ("zgsu", FILTER_PREFIX_SUBJECT_URI); |
806 | + this->query_parser->add_boolean_prefix ("zgsi", |
807 | + FILTER_PREFIX_SUBJECT_INTERPRETATION); |
808 | + this->query_parser->add_boolean_prefix ("zgsm", |
809 | + FILTER_PREFIX_SUBJECT_MANIFESTATION); |
810 | + this->query_parser->add_prefix ("zgso", FILTER_PREFIX_SUBJECT_ORIGIN); |
811 | + this->query_parser->add_boolean_prefix ("zgst", |
812 | + FILTER_PREFIX_SUBJECT_MIMETYPE); |
813 | + this->query_parser->add_boolean_prefix ("zgss", |
814 | + FILTER_PREFIX_SUBJECT_STORAGE); |
815 | + this->query_parser->add_prefix ("category", FILTER_PREFIX_XDG_CATEGORY); |
816 | + |
817 | + this->query_parser->add_valuerangeprocessor ( |
818 | + new Xapian::NumberValueRangeProcessor (VALUE_EVENT_ID, "id")); |
819 | + this->query_parser->add_valuerangeprocessor ( |
820 | + new Xapian::NumberValueRangeProcessor (VALUE_TIMESTAMP, "ms", false)); |
821 | + |
822 | + this->query_parser->set_default_op (Xapian::Query::OP_AND); |
823 | + this->query_parser->set_database (*this->db); |
824 | + |
825 | + this->enquire = new Xapian::Enquire (*this->db); |
826 | + |
827 | + } |
828 | + catch (const Xapian::Error &xp_error) |
829 | + { |
830 | + g_set_error_literal (error, |
831 | + ZEITGEIST_ENGINE_ERROR, |
832 | + ZEITGEIST_ENGINE_ERROR_DATABASE_ERROR, |
833 | + xp_error.get_msg ().c_str ()); |
834 | + this->db = NULL; |
835 | + } |
836 | +} |
837 | + |
838 | +/** |
839 | + * Returns true if and only if the index is good. |
840 | + * Otherwise the index should be rebuilt. |
841 | + */ |
842 | +bool Indexer::CheckIndex () |
843 | +{ |
844 | + std::string db_version (db->get_metadata ("fts_index_version")); |
845 | + if (db_version != INDEX_VERSION) |
846 | + { |
847 | + g_message ("Index must be upgraded. Doing full rebuild"); |
848 | + return false; |
849 | + } |
850 | + else if (db->get_doccount () == 0) |
851 | + { |
852 | + g_message ("Empty index detected. Doing full rebuild"); |
853 | + return false; |
854 | + } |
855 | + |
856 | + return true; |
857 | +} |
858 | + |
859 | +/** |
860 | + * Clear the index and create a new empty one |
861 | + */ |
862 | +void Indexer::DropIndex () |
863 | +{ |
864 | + try |
865 | + { |
866 | + if (this->db != NULL) |
867 | + { |
868 | + this->db->close (); |
869 | + delete this->db; |
870 | + this->db = NULL; |
871 | + } |
872 | + |
873 | + if (this->enquire != NULL) |
874 | + { |
875 | + delete this->enquire; |
876 | + this->enquire = NULL; |
877 | + } |
878 | + |
879 | + if (zeitgeist_utils_using_in_memory_database ()) |
880 | + { |
881 | + this->db = new Xapian::WritableDatabase; |
882 | + this->db->add_database (Xapian::InMemory::open ()); |
883 | + } |
884 | + else |
885 | + { |
886 | + gchar *path = g_build_filename (zeitgeist_utils_get_data_path (), |
887 | + FTS_MAIN_DIR.c_str (), NULL); |
888 | + this->db = new Xapian::WritableDatabase (path, |
889 | + Xapian::DB_CREATE_OR_OVERWRITE); |
890 | + // FIXME: leaks on error |
891 | + g_free (path); |
892 | + } |
893 | + |
894 | + this->query_parser->set_database (*this->db); |
895 | + this->enquire = new Xapian::Enquire (*this->db); |
896 | + } |
897 | + catch (const Xapian::Error &xp_error) |
898 | + { |
899 | + g_error ("Error occurred during database reindex: %s", |
900 | + xp_error.get_msg ().c_str ()); |
901 | + } |
902 | +} |
903 | + |
904 | +void Indexer::Commit () |
905 | +{ |
906 | + try |
907 | + { |
908 | + db->commit (); |
909 | + } |
910 | + catch (Xapian::Error const& e) |
911 | + { |
912 | + g_warning ("Failed to commit changes: %s", e.get_msg ().c_str ()); |
913 | + } |
914 | +} |
915 | + |
916 | +std::string Indexer::ExpandType (std::string const& prefix, |
917 | + const gchar* unparsed_uri) |
918 | +{ |
919 | + gchar* uri = g_strdup (unparsed_uri); |
920 | + gboolean is_negation = zeitgeist_utils_parse_negation (&uri); |
921 | + gboolean noexpand = zeitgeist_utils_parse_noexpand (&uri); |
922 | + |
923 | + std::string result; |
924 | + GList *symbols = NULL; |
925 | + symbols = g_list_append (symbols, uri); |
926 | + if (!noexpand) |
927 | + { |
928 | + GList *children = zeitgeist_symbol_get_all_children (uri); |
929 | + symbols = g_list_concat (symbols, children); |
930 | + } |
931 | + |
932 | + for (GList *iter = symbols; iter != NULL; iter = iter->next) |
933 | + { |
934 | + result += prefix + std::string((gchar*) iter->data); |
935 | + if (iter->next != NULL) result += " OR "; |
936 | + } |
937 | + |
938 | + g_list_free (symbols); |
939 | + g_free (uri); |
940 | + |
941 | + if (is_negation) result = "NOT (" + result + ")"; |
942 | + |
943 | + return result; |
944 | +} |
945 | + |
946 | +std::string Indexer::CompileEventFilterQuery (GPtrArray *templates) |
947 | +{ |
948 | + std::vector<std::string> query; |
949 | + |
950 | + for (unsigned i = 0; i < templates->len; i++) |
951 | + { |
952 | + const gchar* val; |
953 | + std::vector<std::string> tmpl; |
954 | + ZeitgeistEvent *event = (ZeitgeistEvent*) g_ptr_array_index (templates, i); |
955 | + |
956 | + val = zeitgeist_event_get_interpretation (event); |
957 | + if (val && val[0] != '\0') |
958 | + tmpl.push_back (ExpandType ("zgei:", val)); |
959 | + |
960 | + val = zeitgeist_event_get_manifestation (event); |
961 | + if (val && val[0] != '\0') |
962 | + tmpl.push_back (ExpandType ("zgem:", val)); |
963 | + |
964 | + val = zeitgeist_event_get_actor (event); |
965 | + if (val && val[0] != '\0') |
966 | + tmpl.push_back ("zga:" + StringUtils::MangleUri (val)); |
967 | + |
968 | + GPtrArray *subjects = zeitgeist_event_get_subjects (event); |
969 | + for (unsigned j = 0; j < subjects->len; j++) |
970 | + { |
971 | + ZeitgeistSubject *subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, j); |
972 | + val = zeitgeist_subject_get_uri (subject); |
973 | + if (val && val[0] != '\0') |
974 | + tmpl.push_back ("zgsu:" + StringUtils::MangleUri (val)); |
975 | + |
976 | + val = zeitgeist_subject_get_interpretation (subject); |
977 | + if (val && val[0] != '\0') |
978 | + tmpl.push_back (ExpandType ("zgsi:", val)); |
979 | + |
980 | + val = zeitgeist_subject_get_manifestation (subject); |
981 | + if (val && val[0] != '\0') |
982 | + tmpl.push_back (ExpandType ("zgsm:", val)); |
983 | + |
984 | + val = zeitgeist_subject_get_origin (subject); |
985 | + if (val && val[0] != '\0') |
986 | + tmpl.push_back ("zgso:" + StringUtils::MangleUri (val)); |
987 | + |
988 | + val = zeitgeist_subject_get_mimetype (subject); |
989 | + if (val && val[0] != '\0') |
990 | + tmpl.push_back (std::string ("zgst:") + val); |
991 | + |
992 | + val = zeitgeist_subject_get_storage (subject); |
993 | + if (val && val[0] != '\0') |
994 | + tmpl.push_back (std::string ("zgss:") + val); |
995 | + } |
996 | + |
997 | + if (tmpl.size () == 0) continue; |
998 | + |
999 | + std::string event_query ("("); |
1000 | + for (size_t k = 0; k < tmpl.size (); k++) |
1001 | + { |
1002 | + event_query += tmpl[k]; |
1003 | + if (k < tmpl.size () - 1) event_query += ") AND ("; |
1004 | + } |
1005 | + query.push_back (event_query + ")"); |
1006 | + } |
1007 | + |
1008 | + if (query.size () == 0) return std::string (""); |
1009 | + |
1010 | + std::string result; |
1011 | + for (size_t i = 0; i < query.size (); i++) |
1012 | + { |
1013 | + result += query[i]; |
1014 | + if (i < query.size () - 1) result += " OR "; |
1015 | + } |
1016 | + return result; |
1017 | +} |
1018 | + |
1019 | +std::string Indexer::CompileTimeRangeFilterQuery (gint64 start, gint64 end) |
1020 | +{ |
1021 | + // let's use gprinting to be safe |
1022 | + gchar *q = g_strdup_printf ("%" G_GINT64_FORMAT "..%" G_GINT64_FORMAT "ms", |
1023 | + start, end); |
1024 | + std::string query (q); |
1025 | + g_free (q); |
1026 | + |
1027 | + return query; |
1028 | +} |
1029 | + |
1030 | +/** |
1031 | + * Adds the filtering rules to the doc. Filtering rules will |
1032 | + * not affect the relevancy ranking of the event/doc |
1033 | + */ |
1034 | +void Indexer::AddDocFilters (ZeitgeistEvent *event, Xapian::Document &doc) |
1035 | +{ |
1036 | + const gchar* val; |
1037 | + |
1038 | + val = zeitgeist_event_get_interpretation (event); |
1039 | + if (val && val[0] != '\0') |
1040 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_EVENT_INTERPRETATION + val)); |
1041 | + |
1042 | + val = zeitgeist_event_get_manifestation (event); |
1043 | + if (val && val[0] != '\0') |
1044 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_EVENT_MANIFESTATION + val)); |
1045 | + |
1046 | + val = zeitgeist_event_get_actor (event); |
1047 | + if (val && val[0] != '\0') |
1048 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_ACTOR + StringUtils::MangleUri (val))); |
1049 | + |
1050 | + GPtrArray *subjects = zeitgeist_event_get_subjects (event); |
1051 | + for (unsigned j = 0; j < subjects->len; j++) |
1052 | + { |
1053 | + ZeitgeistSubject *subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, j); |
1054 | + val = zeitgeist_subject_get_uri (subject); |
1055 | + if (val && val[0] != '\0') |
1056 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_URI + StringUtils::MangleUri (val))); |
1057 | + |
1058 | + val = zeitgeist_subject_get_interpretation (subject); |
1059 | + if (val && val[0] != '\0') |
1060 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_INTERPRETATION + val)); |
1061 | + |
1062 | + val = zeitgeist_subject_get_manifestation (subject); |
1063 | + if (val && val[0] != '\0') |
1064 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_MANIFESTATION + val)); |
1065 | + |
1066 | + val = zeitgeist_subject_get_origin (subject); |
1067 | + if (val && val[0] != '\0') |
1068 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_ORIGIN + StringUtils::MangleUri (val))); |
1069 | + |
1070 | + val = zeitgeist_subject_get_mimetype (subject); |
1071 | + if (val && val[0] != '\0') |
1072 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_MIMETYPE + val)); |
1073 | + |
1074 | + val = zeitgeist_subject_get_storage (subject); |
1075 | + if (val && val[0] != '\0') |
1076 | + doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_STORAGE + val)); |
1077 | + } |
1078 | +} |
1079 | + |
1080 | +void Indexer::IndexText (std::string const& text) |
1081 | +{ |
1082 | + // FIXME: ascii folding! |
1083 | + tokenizer->index_text (text, 5); |
1084 | +} |
1085 | + |
1086 | +void Indexer::IndexUri (std::string const& uri, std::string const& origin) |
1087 | +{ |
1088 | + GFile *f = g_file_new_for_uri (uri.c_str ()); |
1089 | + |
1090 | + gchar *scheme = g_file_get_uri_scheme (f); |
1091 | + if (scheme == NULL) |
1092 | + { |
1093 | + g_warning ("Invalid URI: %s", uri.c_str ()); |
1094 | + return; |
1095 | + } |
1096 | + |
1097 | + std::string scheme_str(scheme); |
1098 | + g_free (scheme); |
1099 | + |
1100 | + if (scheme_str == "file") |
1101 | + { |
1102 | + // FIXME: special case some typical filenames (like photos) |
1103 | + // examples of typical filenames from cameras: |
1104 | + // P07-08-08_16.25.JPG |
1105 | + // P070608_18.54.JPG |
1106 | + // P180308_22.27[1].jpg |
1107 | + // P6220111.JPG |
1108 | + // PC220006.JPG |
1109 | + // DSCN0149.JPG |
1110 | + // DSC01166.JPG |
1111 | + // SDC12583.JPG |
1112 | + // IMGP3199.JPG |
1113 | + // IMGP1251-4.jpg |
1114 | + // IMG_101_8987.JPG |
1115 | + // 10052010152.jpg |
1116 | + // 4867_93080512835_623012835_1949065_8351752_n.jpg |
1117 | + // 2011-05-29 10.49.37.jpg |
1118 | + // V100908_11.24.AVI |
1119 | + // video-2011-05-29-15-14-58.mp4 |
1120 | + |
1121 | + // get_parse_name will convert escaped characters to UTF-8, but only for |
1122 | + // the "file" scheme, so using it elsewhere won't be of much help |
1123 | + |
1124 | + gchar *pn = g_file_get_parse_name (f); |
1125 | + gchar *basename = g_path_get_basename (pn); |
1126 | + |
1127 | + // FIXME: remove underscores, CamelCase and process digits |
1128 | + tokenizer->index_text (basename, 5); |
1129 | + tokenizer->index_text (basename, 5, "N"); |
1130 | + |
1131 | + g_free (basename); |
1132 | + // limit the directory indexing to just a few levels |
1133 | + // (the original formula was weight = 5.0 / (1.5^n)) |
1134 | + unsigned path_weights[] = { 3, 2, 1, 0 }; |
1135 | + unsigned weight_index = 0; |
1136 | + |
1137 | + // this should be equal to origin, but we already got a nice utf-8 display |
1138 | + // name, so we'll use that |
1139 | + gchar *dir = g_path_get_dirname (pn); |
1140 | + std::string path_component (dir); |
1141 | + g_free (dir); |
1142 | + g_free (pn); |
1143 | + |
1144 | + while (path_component.length () > 2 && |
1145 | + weight_index < G_N_ELEMENTS (path_weights)) |
1146 | + { |
1147 | + // if this is already the home directory we don't want it |
1148 | + if (path_component.length () == home_dir_path.length () && |
1149 | + path_component == home_dir_path) return; |
1150 | + |
1151 | + gchar *name = g_path_get_basename (path_component.c_str ()); |
1152 | + |
1153 | + // FIXME: un-underscore, uncamelcase, ascii fold |
1154 | + tokenizer->index_text (name, path_weights[weight_index++]); |
1155 | + |
1156 | + dir = g_path_get_dirname (path_component.c_str ()); |
1157 | + path_component = dir; |
1158 | + g_free (dir); |
1159 | + g_free (name); |
1160 | + } |
1161 | + } |
1162 | + else if (scheme_str == "mailto") |
1163 | + { |
1164 | + // mailto:username@server.com |
1165 | + size_t scheme_len = scheme_str.length () + 1; |
1166 | + size_t at_pos = uri.find ('@', scheme_len); |
1167 | + if (at_pos == std::string::npos) return; |
1168 | + |
1169 | + tokenizer->index_text (uri.substr (scheme_len, at_pos - scheme_len), 5); |
1170 | + tokenizer->index_text (uri.substr (at_pos + 1), 1); |
1171 | + } |
1172 | + else if (scheme_str.compare (0, 4, "http") == 0) |
1173 | + { |
1174 | + // http / https - we'll index just the basename of the uri (minus query |
1175 | + // part) and the hostname/domain |
1176 | + |
1177 | + // step 1) strip query part |
1178 | + gchar *basename; |
1179 | + size_t question_mark = uri.find ('?'); |
1180 | + if (question_mark != std::string::npos) |
1181 | + { |
1182 | + std::string stripped (uri, 0, question_mark); |
1183 | + basename = g_path_get_basename (stripped.c_str ()); |
1184 | + } |
1185 | + else |
1186 | + { |
1187 | + basename = g_file_get_basename (f); |
1188 | + } |
1189 | + |
1190 | + // step 2) unescape and check that it's valid utf8 |
1191 | + gchar *unescaped_basename = g_uri_unescape_string (basename, ""); |
1192 | + |
1193 | + if (g_utf8_validate (unescaped_basename, -1, NULL)) |
1194 | + { |
1195 | + // FIXME: remove underscores, CamelCase and process digits |
1196 | + tokenizer->index_text (unescaped_basename, 5); |
1197 | + tokenizer->index_text (unescaped_basename, 5, "N"); |
1198 | + } |
1199 | + |
1200 | + // and also index hostname (taken from origin field if possible) |
1201 | + std::string host_str (origin.empty () ? uri : origin); |
1202 | + size_t hostname_start = host_str.find ("://"); |
1203 | + if (hostname_start != std::string::npos) |
1204 | + { |
1205 | + std::string hostname (host_str, hostname_start + 3); |
1206 | + size_t slash_pos = hostname.find ("/"); |
1207 | + if (slash_pos != std::string::npos) hostname.resize (slash_pos); |
1208 | + |
1209 | + // support IDN |
1210 | + if (g_hostname_is_ascii_encoded (hostname.c_str ())) |
1211 | + { |
1212 | + gchar *printable_hostname = g_hostname_to_unicode (hostname.c_str ()); |
1213 | + if (printable_hostname != NULL) hostname = printable_hostname; |
1214 | + g_free (printable_hostname); |
1215 | + } |
1216 | + |
1217 | + tokenizer->index_text (hostname, 2); |
1218 | + tokenizer->index_text (hostname, 2, "N"); |
1219 | + tokenizer->index_text (hostname, 2, "S"); |
1220 | + } |
1221 | + |
1222 | + g_free (unescaped_basename); |
1223 | + g_free (basename); |
1224 | + } |
1225 | + else if (scheme_str == "data") |
1226 | + { |
1227 | + // we *really* don't want to index anything with this scheme |
1228 | + } |
1229 | + else |
1230 | + { |
1231 | + std::string authority, path, query; |
1232 | + StringUtils::SplitUri (uri, authority, path, query); |
1233 | + |
1234 | + if (!path.empty ()) |
1235 | + { |
1236 | + gchar *basename = g_path_get_basename (path.c_str ()); |
1237 | + gchar *unescaped_basename = g_uri_unescape_string (basename, ""); |
1238 | + |
1239 | + if (g_utf8_validate (unescaped_basename, -1, NULL)) |
1240 | + { |
1241 | + std::string capped (StringUtils::Truncate (unescaped_basename, 30)); |
1242 | + tokenizer->index_text (capped, 5); |
1243 | + tokenizer->index_text (capped, 5, "N"); |
1244 | + } |
1245 | + |
1246 | + // FIXME: rest of the path? |
1247 | + g_free (unescaped_basename); |
1248 | + g_free (basename); |
1249 | + } |
1250 | + |
1251 | + if (!authority.empty ()) |
1252 | + { |
1253 | + std::string capped (StringUtils::Truncate (authority, 30)); |
1254 | + |
1255 | + tokenizer->index_text (capped, 2); |
1256 | + tokenizer->index_text (capped, 2, "N"); |
1257 | + tokenizer->index_text (capped, 2, "S"); |
1258 | + } |
1259 | + } |
1260 | + |
1261 | + g_object_unref (f); |
1262 | +} |
1263 | + |
1264 | +bool Indexer::IndexActor (std::string const& actor, bool is_subject) |
1265 | +{ |
1266 | + GDesktopAppInfo *dai = NULL; |
1267 | + // check the cache first |
1268 | + GAppInfo *ai = app_info_cache[actor]; |
1269 | + |
1270 | + if (ai == NULL) |
1271 | + { |
1272 | + // check also the failed cache |
1273 | + if (failed_lookups.count (actor) != 0) return false; |
1274 | + |
1275 | + // and now try to load from the disk |
1276 | + if (g_path_is_absolute (actor.c_str ())) |
1277 | + { |
1278 | + dai = g_desktop_app_info_new_from_filename (actor.c_str ()); |
1279 | + } |
1280 | + else if (g_str_has_prefix (actor.c_str (), "application://")) |
1281 | + { |
1282 | + dai = g_desktop_app_info_new (actor.substr (14).c_str ()); |
1283 | + } |
1284 | + |
1285 | + if (dai != NULL) |
1286 | + { |
1287 | + ai = G_APP_INFO (dai); |
1288 | + app_info_cache[actor] = ai; |
1289 | + } |
1290 | + else |
1291 | + { |
1292 | + // cache failed lookup |
1293 | + failed_lookups.insert (actor); |
1294 | + if (clear_failed_id == 0) |
1295 | + { |
1296 | + // but clear the failed cache in 30 seconds |
1297 | + clear_failed_id = g_timeout_add_seconds (30, |
1298 | + (GSourceFunc) &Indexer::ClearFailedLookupsCb, this); |
1299 | + } |
1300 | + } |
1301 | + } |
1302 | + else |
1303 | + { |
1304 | + dai = G_DESKTOP_APP_INFO (ai); |
1305 | + } |
1306 | + |
1307 | + if (dai == NULL) |
1308 | + { |
1309 | + g_warning ("Unable to get info on %s", actor.c_str ()); |
1310 | + return false; |
1311 | + } |
1312 | + |
1313 | + const gchar *val; |
1314 | + unsigned name_weight = is_subject ? 5 : 2; |
1315 | + unsigned comment_weight = 2; |
1316 | + |
1317 | + // FIXME: ascii folding somewhere |
1318 | + |
1319 | + val = g_app_info_get_display_name (ai); |
1320 | + if (val && val[0] != '\0') |
1321 | + { |
1322 | + std::string display_name (val); |
1323 | + tokenizer->index_text (display_name, name_weight); |
1324 | + tokenizer->index_text (display_name, name_weight, "A"); |
1325 | + } |
1326 | + |
1327 | + val = g_desktop_app_info_get_generic_name (dai); |
1328 | + if (val && val[0] != '\0') |
1329 | + { |
1330 | + std::string generic_name (val); |
1331 | + tokenizer->index_text (generic_name, name_weight); |
1332 | + tokenizer->index_text (generic_name, name_weight, "A"); |
1333 | + } |
1334 | + |
1335 | + if (!is_subject) return true; |
1336 | + // the rest of the code only applies to events with application subject uris: |
1337 | + // index the comment field, add category terms, index keywords |
1338 | + |
1339 | + val = g_app_info_get_description (ai); |
1340 | + if (val && val[0] != '\0') |
1341 | + { |
1342 | + std::string comment (val); |
1343 | + tokenizer->index_text (comment, comment_weight); |
1344 | + tokenizer->index_text (comment, comment_weight, "A"); |
1345 | + } |
1346 | + |
1347 | + val = g_desktop_app_info_get_categories (dai); |
1348 | + if (val && val[0] != '\0') |
1349 | + { |
1350 | + gchar **categories = g_strsplit (val, ";", 0); |
1351 | + Xapian::Document doc(tokenizer->get_document ()); |
1352 | + for (gchar **iter = categories; *iter != NULL; ++iter) |
1353 | + { |
1354 | + // FIXME: what if this isn't ascii? but it should be, that's what |
1355 | + // the fdo menu spec says |
1356 | + gchar *category = g_ascii_strdown (*iter, -1); |
1357 | + doc.add_boolean_term (FILTER_PREFIX_XDG_CATEGORY + category); |
1358 | + g_free (category); |
1359 | + } |
1360 | + g_strfreev (categories); |
1361 | + } |
1362 | + |
1363 | + return true; |
1364 | +} |
1365 | + |
1366 | +GPtrArray* Indexer::Search (const gchar *search_string, |
1367 | + ZeitgeistTimeRange *time_range, |
1368 | + GPtrArray *templates, |
1369 | + guint offset, |
1370 | + guint count, |
1371 | + ZeitgeistResultType result_type, |
1372 | + guint *matches, |
1373 | + GError **error) |
1374 | +{ |
1375 | + GPtrArray *results = NULL; |
1376 | + try |
1377 | + { |
1378 | + std::string query_string(search_string); |
1379 | + |
1380 | + if (templates && templates->len > 0) |
1381 | + { |
1382 | + std::string filters (CompileEventFilterQuery (templates)); |
1383 | + query_string = "(" + query_string + ") AND (" + filters + ")"; |
1384 | + } |
1385 | + |
1386 | + if (time_range) |
1387 | + { |
1388 | + gint64 start_time = zeitgeist_time_range_get_start (time_range); |
1389 | + gint64 end_time = zeitgeist_time_range_get_end (time_range); |
1390 | + |
1391 | + if (start_time > 0 || end_time < G_MAXINT64) |
1392 | + { |
1393 | + std::string time_filter (CompileTimeRangeFilterQuery (start_time, end_time)); |
1394 | + query_string = "(" + query_string + ") AND (" + time_filter + ")"; |
1395 | + } |
1396 | + } |
1397 | + |
1398 | + // FIXME: which result types coalesce? |
1399 | + guint maxhits = count * 3; |
1400 | + |
1401 | + if (result_type == 100) |
1402 | + { |
1403 | + enquire->set_sort_by_relevance (); |
1404 | + } |
1405 | + else |
1406 | + { |
1407 | + enquire->set_sort_by_value (VALUE_TIMESTAMP, true); |
1408 | + } |
1409 | + |
1410 | + g_debug ("query: %s", query_string.c_str ()); |
1411 | + Xapian::Query q(query_parser->parse_query (query_string, QUERY_PARSER_FLAGS)); |
1412 | + enquire->set_query (q); |
1413 | + Xapian::MSet hits (enquire->get_mset (offset, maxhits)); |
1414 | + Xapian::doccount hitcount = hits.get_matches_estimated (); |
1415 | + |
1416 | + if (result_type == 100) |
1417 | + { |
1418 | + std::vector<unsigned> event_ids; |
1419 | + for (Xapian::MSetIterator iter = hits.begin (); iter != hits.end (); ++iter) |
1420 | + { |
1421 | + Xapian::Document doc(iter.get_document ()); |
1422 | + double unserialized = |
1423 | + Xapian::sortable_unserialise(doc.get_value (VALUE_EVENT_ID)); |
1424 | + event_ids.push_back (static_cast<unsigned>(unserialized)); |
1425 | + } |
1426 | + |
1427 | + results = zeitgeist_db_reader_get_events (zg_reader, |
1428 | + &event_ids[0], |
1429 | + event_ids.size (), |
1430 | + NULL, |
1431 | + error); |
1432 | + } |
1433 | + else |
1434 | + { |
1435 | + GPtrArray *event_templates; |
1436 | + event_templates = g_ptr_array_new_with_free_func (g_object_unref); |
1437 | + for (Xapian::MSetIterator iter = hits.begin (); iter != hits.end (); ++iter) |
1438 | + { |
1439 | + Xapian::Document doc(iter.get_document ()); |
1440 | + double unserialized = |
1441 | + Xapian::sortable_unserialise(doc.get_value (VALUE_EVENT_ID)); |
1442 | + // this doesn't need ref sinking, does it? |
1443 | + ZeitgeistEvent *event = zeitgeist_event_new (); |
1444 | + zeitgeist_event_set_id (event, static_cast<unsigned>(unserialized)); |
1445 | + g_ptr_array_add (event_templates, event); |
1446 | + } |
1447 | + |
1448 | + if (event_templates->len > 0) |
1449 | + { |
1450 | + ZeitgeistTimeRange *time_range = zeitgeist_time_range_new_anytime (); |
1451 | + results = zeitgeist_db_reader_find_events (zg_reader, |
1452 | + time_range, |
1453 | + event_templates, |
1454 | + ZEITGEIST_STORAGE_STATE_ANY, |
1455 | + 0, |
1456 | + result_type, |
1457 | + NULL, |
1458 | + error); |
1459 | + |
1460 | + g_object_unref (time_range); |
1461 | + } |
1462 | + else |
1463 | + { |
1464 | + results = g_ptr_array_new (); |
1465 | + } |
1466 | + |
1467 | + g_ptr_array_unref (event_templates); |
1468 | + } |
1469 | + |
1470 | + if (matches) |
1471 | + { |
1472 | + *matches = hitcount; |
1473 | + } |
1474 | + } |
1475 | + catch (Xapian::Error const& e) |
1476 | + { |
1477 | + g_warning ("Failed to search index: %s", e.get_msg ().c_str ()); |
1478 | + g_set_error_literal (error, |
1479 | + ZEITGEIST_ENGINE_ERROR, |
1480 | + ZEITGEIST_ENGINE_ERROR_DATABASE_ERROR, |
1481 | + e.get_msg ().c_str ()); |
1482 | + } |
1483 | + |
1484 | + return results; |
1485 | +} |
1486 | + |
1487 | +void Indexer::IndexEvent (ZeitgeistEvent *event) |
1488 | +{ |
1489 | + try |
1490 | + { |
1491 | + // FIXME: we need to special case MOVE_EVENTs |
1492 | + const gchar *val; |
1493 | + guint event_id = zeitgeist_event_get_id (event); |
1494 | + g_return_if_fail (event_id > 0); |
1495 | + |
1496 | + g_debug ("Indexing event with ID: %u", event_id); |
1497 | + |
1498 | + Xapian::Document doc; |
1499 | + doc.add_value (VALUE_EVENT_ID, |
1500 | + Xapian::sortable_serialise (static_cast<double>(event_id))); |
1501 | + doc.add_value (VALUE_TIMESTAMP, |
1502 | + Xapian::sortable_serialise (static_cast<double>(zeitgeist_event_get_timestamp (event)))); |
1503 | + |
1504 | + tokenizer->set_document (doc); |
1505 | + |
1506 | + val = zeitgeist_event_get_actor (event); |
1507 | + if (val && val[0] != '\0') |
1508 | + { |
1509 | + // it's nice that searching for "gedit" will find all files you worked |
1510 | + // with in gedit, but the relevancy has to be low |
1511 | + IndexActor (val, false); |
1512 | + } |
1513 | + |
1514 | + GPtrArray *subjects = zeitgeist_event_get_subjects (event); |
1515 | + for (unsigned i = 0; i < subjects->len; i++) |
1516 | + { |
1517 | + ZeitgeistSubject *subject; |
1518 | + subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, i); |
1519 | + |
1520 | + val = zeitgeist_subject_get_uri (subject); |
1521 | + if (val == NULL || val[0] == '\0') continue; |
1522 | + |
1523 | + std::string uri(val); |
1524 | + |
1525 | + if (uri.length () > 512) |
1526 | + { |
1527 | + g_warning ("URI too long (%" G_GSIZE_FORMAT "). Discarding:\n%s", |
1528 | + uri.length (), uri.substr (0, 32).c_str ()); |
1529 | + return; // ignore this event completely... |
1530 | + } |
1531 | + |
1532 | + val = zeitgeist_subject_get_text (subject); |
1533 | + if (val && val[0] != '\0') |
1534 | + { |
1535 | + IndexText (val); |
1536 | + } |
1537 | + |
1538 | + val = zeitgeist_subject_get_origin (subject); |
1539 | + std::string origin (val != NULL ? val : ""); |
1540 | + |
1541 | + if (uri.compare (0, 14, "application://") == 0) |
1542 | + { |
1543 | + if (!IndexActor (uri, true)) |
1544 | + IndexUri (uri, origin); |
1545 | + } |
1546 | + else |
1547 | + { |
1548 | + IndexUri (uri, origin); |
1549 | + } |
1550 | + } |
1551 | + |
1552 | + AddDocFilters (event, doc); |
1553 | + |
1554 | + this->db->add_document (doc); |
1555 | + } |
1556 | + catch (Xapian::Error const& e) |
1557 | + { |
1558 | + g_warning ("Failed to index event: %s", e.get_msg ().c_str ()); |
1559 | + } |
1560 | +} |
1561 | + |
1562 | +void Indexer::DeleteEvent (guint32 event_id) |
1563 | +{ |
1564 | + g_debug ("Deleting event with ID: %u", event_id); |
1565 | + |
1566 | + try |
1567 | + { |
1568 | + std::string id(Xapian::sortable_serialise (static_cast<double>(event_id))); |
1569 | + Xapian::Query query (Xapian::Query::OP_VALUE_RANGE, VALUE_EVENT_ID, id, id); |
1570 | + |
1571 | + enquire->set_query(query); |
1572 | + Xapian::MSet mset = enquire->get_mset(0, 10); |
1573 | + |
1574 | + Xapian::doccount total = mset.get_matches_estimated(); |
1575 | + if (total > 1) |
1576 | + { |
1577 | + g_warning ("More than one event found with id '%s'", id.c_str ());  |
1578 | + } |
1579 | + else if (total == 0) |
1580 | + { |
1581 | + g_warning ("No event for id '%s'", id.c_str ()); |
1582 | + return; |
1583 | + } |
1584 | + |
1585 | + Xapian::MSetIterator i, end; |
1586 | + for (i= mset.begin(), end = mset.end(); i != end; i++) |
1587 | + { |
1588 | + db->delete_document (*i); |
1589 | + } |
1590 | + } |
1591 | + catch (Xapian::Error const& e) |
1592 | + { |
1593 | + g_warning ("Failed to delete event '%u': %s", |
1594 | + event_id, e.get_msg().c_str ()); |
1595 | + } |
1596 | +} |
1597 | + |
1598 | +void Indexer::SetDbMetadata (std::string const& key, std::string const& value) |
1599 | +{ |
1600 | + try |
1601 | + { |
1602 | + db->set_metadata (key, value); |
1603 | + } |
1604 | + catch (Xapian::Error const& e) |
1605 | + { |
1606 | + g_warning ("Failed to set metadata: %s", e.get_msg ().c_str ()); |
1607 | + } |
1608 | +} |
1609 | + |
1610 | +gboolean Indexer::ClearFailedLookupsCb () |
1611 | +{ |
1612 | + failed_lookups.clear (); |
1613 | + |
1614 | + clear_failed_id = 0; |
1615 | + return FALSE; |
1616 | +} |
1617 | + |
1618 | +} /* namespace */ |
1619 | |
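The filter query that `Indexer::CompileEventFilterQuery` and `Indexer::Search` build in indexer.cpp is plain string composition, so its shape can be sketched outside Zeitgeist. The following is only an illustration under stated assumptions: `JoinTemplateTerms`, `JoinTemplates` and `ApplyFilters` are hypothetical helper names that do not exist in this branch, and the term strings used with them are made-up inputs rather than real symbol URIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not in the branch): AND together the terms compiled
// from a single event template, producing "(t1) AND (t2) AND ...".
// The real code skips templates with no terms before reaching this step.
std::string JoinTemplateTerms (std::vector<std::string> const& terms)
{
  std::string out ("(");
  for (std::size_t i = 0; i < terms.size (); i++)
  {
    out += terms[i];
    if (i + 1 < terms.size ()) out += ") AND (";
  }
  return out + ")";
}

// Hypothetical helper: OR together the per-template queries, so an event
// matches if it satisfies any one of the templates.
std::string JoinTemplates (std::vector<std::string> const& templates)
{
  std::string out;
  for (std::size_t i = 0; i < templates.size (); i++)
  {
    out += templates[i];
    if (i + 1 < templates.size ()) out += " OR ";
  }
  return out;
}

// Hypothetical helper: wrap the user's search string and the compiled
// filters the same way Search does before handing them to the parser.
std::string ApplyFilters (std::string const& search_string,
                          std::string const& filters)
{
  return "(" + search_string + ") AND (" + filters + ")";
}
```

Composing the three helpers in that order gives one string with the same shape as the one the branch passes to `parse_query`; the parenthesisation is what keeps the ` OR ` between templates from leaking into the user's own search terms.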
1620 | === added file 'extensions/fts++/indexer.h' |
1621 | --- extensions/fts++/indexer.h 1970-01-01 00:00:00 +0000 |
1622 | +++ extensions/fts++/indexer.h 2012-02-09 22:47:22 +0000 |
1623 | @@ -0,0 +1,115 @@ |
1624 | +/* |
1625 | + * Copyright (C) 2012 Canonical Ltd |
1626 | + * |
1627 | + * This program is free software; you can redistribute it and/or |
1628 | + * modify it under the terms of the GNU General Public License |
1629 | + * as published by the Free Software Foundation; either version 2 |
1630 | + * of the License, or (at your option) any later version. |
1631 | + * |
1632 | + * This program is distributed in the hope that it will be useful, |
1633 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
1634 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1635 | + * GNU General Public License for more details. |
1636 | + * |
1637 | + * You should have received a copy of the GNU General Public License |
1638 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
1639 | + * |
1640 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
1641 | + * |
1642 | + */ |
1643 | + |
1644 | +#ifndef _ZGFTS_INDEXER_H_ |
1645 | +#define _ZGFTS_INDEXER_H_ |
1646 | + |
1647 | +#include <glib-object.h> |
1648 | +#include <gio/gio.h> |
1649 | +#include <xapian.h> |
1650 | + |
1651 | +#include "zeitgeist-internal.h" |
1652 | + |
1653 | +namespace ZeitgeistFTS { |
1654 | + |
1655 | +const std::string INDEX_VERSION = "1"; |
1656 | + |
1657 | +class Indexer |
1658 | +{ |
1659 | +public: |
1660 | + typedef std::map<std::string, GAppInfo*> AppInfoMap; |
1661 | + typedef std::set<std::string> ApplicationSet; |
1662 | + |
1663 | + Indexer (ZeitgeistDbReader *reader) |
1664 | + : zg_reader (reader) |
1665 | + , db (NULL) |
1666 | + , query_parser (NULL) |
1667 | + , enquire (NULL) |
1668 | + , tokenizer (NULL) |
1669 | + , clear_failed_id (0) |
1670 | + { |
1671 | + const gchar *home_dir = g_get_home_dir (); |
1672 | + home_dir_path = home_dir != NULL ? home_dir : "/home"; |
1673 | + } |
1674 | + |
1675 | + ~Indexer () |
1676 | + { |
1677 | + if (tokenizer) delete tokenizer; |
1678 | + if (enquire) delete enquire; |
1679 | + if (query_parser) delete query_parser; |
1680 | + if (db) delete db; |
1681 | + |
1682 | + for (AppInfoMap::iterator it = app_info_cache.begin (); |
1683 | + it != app_info_cache.end (); ++it) |
1684 | + { |
1685 | + g_object_unref (it->second); |
1686 | + } |
1687 | + |
1688 | + if (clear_failed_id != 0) |
1689 | + { |
1690 | + g_source_remove (clear_failed_id); |
1691 | + } |
1692 | + } |
1693 | + |
1694 | + void Initialize (GError **error); |
1695 | + bool CheckIndex (); |
1696 | + void DropIndex (); |
1697 | + void Commit (); |
1698 | + |
1699 | + void IndexEvent (ZeitgeistEvent *event); |
1700 | + void DeleteEvent (guint32 event_id); |
1701 | + void SetDbMetadata (std::string const& key, std::string const& value); |
1702 | + |
1703 | + GPtrArray* Search (const gchar *search_string, |
1704 | + ZeitgeistTimeRange *time_range, |
1705 | + GPtrArray *templates, |
1706 | + guint offset, |
1707 | + guint count, |
1708 | + ZeitgeistResultType result_type, |
1709 | + guint *matches, |
1710 | + GError **error); |
1711 | + |
1712 | +private: |
1713 | + std::string ExpandType (std::string const& prefix, const gchar* unparsed_uri); |
1714 | + std::string CompileEventFilterQuery (GPtrArray *templates); |
1715 | + std::string CompileTimeRangeFilterQuery (gint64 start, gint64 end); |
1716 | + |
1717 | + void AddDocFilters (ZeitgeistEvent *event, Xapian::Document &doc); |
1718 | + void IndexText (std::string const& text); |
1719 | + void IndexUri (std::string const& uri, std::string const& origin); |
1720 | + bool IndexActor (std::string const& actor, bool is_subject); |
1721 | + |
1722 | + gboolean ClearFailedLookupsCb (); |
1723 | + |
1724 | + ZeitgeistDbReader *zg_reader; |
1725 | + Xapian::WritableDatabase *db; |
1726 | + Xapian::QueryParser *query_parser; |
1727 | + Xapian::Enquire *enquire; |
1728 | + Xapian::TermGenerator *tokenizer; |
1729 | + AppInfoMap app_info_cache; |
1730 | + ApplicationSet failed_lookups; |
1731 | + |
1732 | + guint clear_failed_id; |
1733 | + std::string home_dir_path; |
1734 | +}; |
1735 | + |
1736 | +} |
1737 | + |
1738 | +#endif /* _ZGFTS_INDEXER_H_ */ |
1739 | |
1740 | === added symlink 'extensions/fts++/mimetype.vala' |
1741 | === target is u'../../src/mimetype.vala' |
1742 | === added symlink 'extensions/fts++/ontology-uris.vala' |
1743 | === target is u'../../src/ontology-uris.vala' |
1744 | === added symlink 'extensions/fts++/ontology.vala' |
1745 | === target is u'../../src/ontology.vala' |
1746 | === added file 'extensions/fts++/org.gnome.zeitgeist.fts.service.in' |
1747 | --- extensions/fts++/org.gnome.zeitgeist.fts.service.in 1970-01-01 00:00:00 +0000 |
1748 | +++ extensions/fts++/org.gnome.zeitgeist.fts.service.in 2012-02-09 22:47:22 +0000 |
1749 | @@ -0,0 +1,3 @@ |
1750 | +[D-BUS Service] |
1751 | +Name=org.gnome.zeitgeist.SimpleIndexer |
1752 | +Exec=@libexecdir@/zeitgeist-fts |
1753 | |
1754 | === added symlink 'extensions/fts++/remote.vala' |
1755 | === target is u'../../src/remote.vala' |
1756 | === added symlink 'extensions/fts++/sql-schema.vala' |
1757 | === target is u'../../src/sql-schema.vala' |
1758 | === added symlink 'extensions/fts++/sql.vala' |
1759 | === target is u'../../src/sql.vala' |
1760 | === added file 'extensions/fts++/stringutils.cpp' |
1761 | --- extensions/fts++/stringutils.cpp 1970-01-01 00:00:00 +0000 |
1762 | +++ extensions/fts++/stringutils.cpp 2012-02-09 22:47:22 +0000 |
1763 | @@ -0,0 +1,128 @@ |
1764 | +/* |
1765 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
1766 | + * |
1767 | + * This program is free software; you can redistribute it and/or |
1768 | + * modify it under the terms of the GNU General Public License |
1769 | + * as published by the Free Software Foundation; either version 2 |
1770 | + * of the License, or (at your option) any later version. |
1771 | + * |
1772 | + * This program is distributed in the hope that it will be useful, |
1773 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
1774 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1775 | + * GNU General Public License for more details. |
1776 | + * |
1777 | + * You should have received a copy of the GNU General Public License |
1778 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
1779 | + * |
1780 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
1781 | + * |
1782 | + */ |
1783 | +#include <string> |
1784 | + |
1785 | +#include "stringutils.h" |
1786 | + |
1787 | +using namespace std; |
1788 | + |
1789 | +namespace ZeitgeistFTS { |
1790 | + |
1791 | +namespace StringUtils { |
1792 | + |
1793 | +/** |
1794 | + * Truncate s to at most 'nbytes' bytes while making sure the returned |
1795 | + * string is still valid UTF-8. |
1796 | + * |
1797 | + * NOTE: It is assumed the input string is valid UTF-8. Untrusted text |
1798 | + * should be validated with g_utf8_validate(). |
1799 | + * |
1800 | + * This function is useful for working with Xapian terms because Xapian has |
1801 | + * a max term length of 245 (which is not very well documented, but see |
1802 | + * http://xapian.org/docs/omega/termprefixes.html). |
1803 | + */ |
1804 | +string Truncate (string const& s, unsigned int nbytes) |
1805 | +{ |
1806 | + const gchar *str = s.c_str(); |
1807 | + const gchar *iter = str; |
1808 | + |
1809 | + nbytes = MIN(nbytes, s.length()); |
1810 | + |
1811 | + while (iter - str < nbytes) |
1812 | + { |
1813 | + const gchar *tmp = g_utf8_next_char (iter); |
1814 | + if (tmp - str > nbytes) break; |
1815 | + iter = tmp; |
1816 | + } |
1817 | + |
1818 | + |
1819 | + return s.substr(0, iter - str); |
1820 | +} |
1821 | + |
1822 | +/** |
1823 | + * Converts a URI into an index- and query-friendly string. The problem |
1824 | + * is that Xapian doesn't handle CAPITAL letters or most non-alphanumeric |
1825 | + * symbols in a boolean term when it does prefix matching. The mangled |
1826 | + * URIs returned from this function are suitable for boolean prefix searches. |
1827 | + * |
1828 | + * IMPORTANT: This is a 1-way function! You cannot convert back. |
1829 | + */ |
1830 | +string MangleUri (string const& orig) |
1831 | +{ |
1832 | + string s(orig); |
1833 | + size_t pos = 0; |
1834 | + while ((pos = s.find_first_of (": /", pos)) != string::npos) |
1835 | + { |
1836 | + s.replace (pos, 1, 1, '_'); |
1837 | + pos++; |
1838 | + } |
1839 | + |
1840 | + return s; |
1841 | +} |
1842 | + |
1843 | +/** |
1844 | + * This method expects a valid URI and tries to split it into authority, |
1845 | + * path and query. |
1846 | + * |
1847 | + * Note that any and all parts may be left untouched. |
1848 | + */ |
1849 | +void SplitUri (string const& uri, string &authority, |
1850 | + string &path, string &query) |
1851 | +{ |
1852 | + size_t colon_pos = uri.find (':'); |
1853 | + if (colon_pos == string::npos) return; // not a URI? |
1854 | + bool has_double_slash = uri.length () > colon_pos + 2 && |
1855 | + uri.compare (colon_pos + 1, 2, "//") == 0; |
1856 | + |
1857 | + size_t start_pos = has_double_slash ? colon_pos + 3 : colon_pos + 1; |
1858 | + |
1859 | + size_t first_slash = uri.find ('/', start_pos); |
1860 | + size_t question_mark_pos = uri.find ('?', first_slash == string::npos ? |
1861 | + start_pos : first_slash + 1); |
1862 | + |
1863 | + authority = uri.substr (start_pos); |
1864 | + if (first_slash != string::npos) |
1865 | + { |
1866 | + authority.resize (first_slash - start_pos); |
1867 | + } |
1868 | + else if (question_mark_pos != string::npos) |
1869 | + { |
1870 | + authority.resize (question_mark_pos - start_pos); |
1871 | + } |
1872 | + |
1873 | + if (first_slash == string::npos) |
1874 | + { |
1875 | + first_slash = start_pos + authority.length (); |
1876 | + } |
1877 | + |
1878 | + if (question_mark_pos != string::npos) |
1879 | + { |
1880 | + path = uri.substr (first_slash, question_mark_pos - first_slash); |
1881 | + query = uri.substr (question_mark_pos + 1); |
1882 | + } |
1883 | + else |
1884 | + { |
1885 | + path = uri.substr (first_slash); |
1886 | + } |
1887 | +} |
1888 | + |
1889 | +} /* namespace StringUtils */ |
1890 | + |
1891 | +} /* namespace ZeitgeistFTS */ |
1892 | |
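The two simpler helpers in stringutils.cpp can be illustrated without GLib. The following is a minimal sketch, not the code from this branch: `truncate_utf8`, `mangle_uri` and `is_continuation` are hypothetical names, and the truncation walks backwards from the cut point instead of forwards with `g_utf8_next_char`. Like the patch's `Truncate`, it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the patch): true if c is a UTF-8
// continuation byte, i.e. has the bit pattern 10xxxxxx.
static bool is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

// Return at most nbytes bytes of s without splitting a multi-byte
// character. Assumes s is valid UTF-8.
std::string truncate_utf8(const std::string& s, std::size_t nbytes)
{
    if (s.size() <= nbytes) return s;

    std::size_t cut = nbytes;
    // s[cut] is the first byte that would be dropped. If it is a
    // continuation byte, the cut falls inside a character, so back up
    // until the cut sits on a character boundary.
    while (cut > 0 && is_continuation(static_cast<unsigned char>(s[cut])))
        --cut;

    return s.substr(0, cut);
}

// Replace ':', ' ' and '/' with '_', the same character set the patch's
// MangleUri passes to find_first_of. This is one-way by design.
std::string mangle_uri(std::string s)
{
    for (char& c : s)
    {
        if (c == ':' || c == ' ' || c == '/') c = '_';
    }
    return s;
}
```

With these definitions, `mangle_uri("http://example.com/a b")` yields `http___example.com_a_b`, and truncating a string that consists of `a` followed by one three-byte character to 2 bytes yields just `a`.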
1893 | === added file 'extensions/fts++/stringutils.h' |
1894 | --- extensions/fts++/stringutils.h 1970-01-01 00:00:00 +0000 |
1895 | +++ extensions/fts++/stringutils.h 2012-02-09 22:47:22 +0000 |
1896 | @@ -0,0 +1,42 @@ |
1897 | +/* |
1898 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
1899 | + * |
1900 | + * This program is free software; you can redistribute it and/or |
1901 | + * modify it under the terms of the GNU General Public License |
1902 | + * as published by the Free Software Foundation; either version 2 |
1903 | + * of the License, or (at your option) any later version. |
1904 | + * |
1905 | + * This program is distributed in the hope that it will be useful, |
1906 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
1907 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1908 | + * GNU General Public License for more details. |
1909 | + * |
1910 | + * You should have received a copy of the GNU General Public License |
1911 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
1912 | + * |
1913 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
1914 | + * |
1915 | + */ |
1916 | + |
1917 | +#include <string> |
1918 | +#include <glib.h> |
1919 | + |
1920 | +namespace ZeitgeistFTS { |
1921 | + |
1922 | +namespace StringUtils { |
1923 | + |
1924 | +const unsigned int MAX_TERM_LENGTH = 245; |
1925 | + |
1926 | +std::string Truncate (std::string const& s, |
1927 | + unsigned int nbytes = MAX_TERM_LENGTH); |
1928 | + |
1929 | +std::string MangleUri (std::string const& orig); |
1930 | + |
1931 | +void SplitUri (std::string const& uri, |
1932 | + std::string &authority, |
1933 | + std::string &path, |
1934 | + std::string &query); |
1935 | + |
1936 | +} /* namespace StringUtils */ |
1937 | + |
1938 | +} /* namespace ZeitgeistFTS */ |
1939 | |
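To make the intent of `SplitUri` concrete, here is a simplified standalone sketch of splitting a URI into authority, path and query. It is an assumption-laden illustration rather than the branch's implementation: `UriParts` and `split_uri` are hypothetical names, the parts are returned in a struct instead of through out-parameters, and a `?` that appears before the first `/` is treated as the start of the query, which differs from the patch in that edge case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; the patch uses three out-parameters instead.
struct UriParts
{
    std::string authority;
    std::string path;
    std::string query;
};

UriParts split_uri(const std::string& uri)
{
    UriParts parts;

    std::size_t colon = uri.find(':');
    if (colon == std::string::npos) return parts;  // no scheme: leave all parts empty

    // Skip the "//" after the scheme separator if it is present.
    std::size_t start = colon + 1;
    if (uri.compare(start, 2, "//") == 0) start += 2;

    // The authority runs until the first '/' or '?', or to the end.
    std::size_t auth_end = uri.find_first_of("/?", start);
    if (auth_end == std::string::npos) auth_end = uri.size();
    parts.authority = uri.substr(start, auth_end - start);

    // Everything up to '?' is the path; the rest is the query.
    std::size_t qmark = uri.find('?', auth_end);
    if (qmark == std::string::npos)
    {
        parts.path = uri.substr(auth_end);
    }
    else
    {
        parts.path = uri.substr(auth_end, qmark - auth_end);
        parts.query = uri.substr(qmark + 1);
    }
    return parts;
}
```

For `file:///home/username/Documents/x.pdf` this gives an empty authority and the path `/home/username/Documents/x.pdf`; for `http://example.com/image.jpg` the authority is `example.com` and the path is `/image.jpg`.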
1940 | === added symlink 'extensions/fts++/table-lookup.vala' |
1941 | === target is u'../../src/table-lookup.vala' |
1942 | === added file 'extensions/fts++/task.cpp' |
1943 | --- extensions/fts++/task.cpp 1970-01-01 00:00:00 +0000 |
1944 | +++ extensions/fts++/task.cpp 2012-02-09 22:47:22 +0000 |
1945 | @@ -0,0 +1,47 @@ |
1946 | +/* |
1947 | + * Copyright (C) 2012 Canonical Ltd |
1948 | + * |
1949 | + * This program is free software; you can redistribute it and/or |
1950 | + * modify it under the terms of the GNU General Public License |
1951 | + * as published by the Free Software Foundation; either version 2 |
1952 | + * of the License, or (at your option) any later version. |
1953 | + * |
1954 | + * This program is distributed in the hope that it will be useful, |
1955 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
1956 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
1957 | + * GNU General Public License for more details. |
1958 | + * |
1959 | + * You should have received a copy of the GNU General Public License |
1960 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
1961 | + * |
1962 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
1963 | + * |
1964 | + */ |
1965 | + |
1966 | +#include "task.h" |
1967 | + |
1968 | +namespace ZeitgeistFTS { |
1969 | + |
1970 | +void IndexEventsTask::Process (Indexer *indexer) |
1971 | +{ |
1972 | + unsigned end_index = MIN (start_index + event_count, events->len); |
1973 | + for (unsigned i = start_index; i < end_index; i++) |
1974 | + { |
1975 | + indexer->IndexEvent ((ZeitgeistEvent*) g_ptr_array_index (events, i)); |
1976 | + } |
1977 | +} |
1978 | + |
1979 | +void DeleteEventsTask::Process (Indexer *indexer) |
1980 | +{ |
1981 | + for (unsigned i = 0; i < event_ids.size (); i++) |
1982 | + { |
1983 | + indexer->DeleteEvent (event_ids[i]); |
1984 | + } |
1985 | +} |
1986 | + |
1987 | +void MetadataTask::Process (Indexer *indexer) |
1988 | +{ |
1989 | + indexer->SetDbMetadata (key_name, value); |
1990 | +} |
1991 | + |
1992 | +} |
1993 | |
1994 | === added file 'extensions/fts++/task.h' |
1995 | --- extensions/fts++/task.h 1970-01-01 00:00:00 +0000 |
1996 | +++ extensions/fts++/task.h 2012-02-09 22:47:22 +0000 |
1997 | @@ -0,0 +1,100 @@ |
1998 | +/* |
1999 | + * Copyright (C) 2012 Canonical Ltd |
2000 | + * |
2001 | + * This program is free software; you can redistribute it and/or |
2002 | + * modify it under the terms of the GNU General Public License |
2003 | + * as published by the Free Software Foundation; either version 2 |
2004 | + * of the License, or (at your option) any later version. |
2005 | + * |
2006 | + * This program is distributed in the hope that it will be useful, |
2007 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
2008 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2009 | + * GNU General Public License for more details. |
2010 | + * |
2011 | + * You should have received a copy of the GNU General Public License |
2012 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
2013 | + * |
2014 | + * Authored by Michal Hruby <michal.hruby@canonical.com> |
2015 | + * |
2016 | + */ |
2017 | + |
2018 | +#ifndef _ZGFTS_TASK_H_ |
2019 | +#define _ZGFTS_TASK_H_ |
2020 | + |
2021 | +#include <glib.h> |
2022 | + |
2023 | +#include "indexer.h" |
2024 | + |
2025 | +namespace ZeitgeistFTS { |
2026 | + |
2027 | +/** |
2028 | + * A task contains a chunk of work defined by the Controller. |
2029 | + * A task should not be clever in scheduling on its own, the |
2030 | + * Controller is responsible for breaking down tasks in suitable |
2031 | + * chunks. |
2032 | + */ |
2033 | +class Task |
2034 | +{ |
2035 | +public: |
2036 | + virtual ~Task () {} |
2037 | + virtual void Process (Indexer *indexer) = 0; |
2038 | +}; |
2039 | + |
2040 | +class IndexEventsTask : public Task |
2041 | +{ |
2042 | +public: |
2043 | + void Process (Indexer *indexer); |
2044 | + |
2045 | + IndexEventsTask (GPtrArray *event_arr) |
2046 | + : events (event_arr), start_index (0), event_count (event_arr->len) {} |
2047 | + |
2048 | + IndexEventsTask (GPtrArray *event_arr, unsigned index, unsigned count) |
2049 | + : events (event_arr), start_index (index), event_count (count) {} |
2050 | + |
2051 | + virtual ~IndexEventsTask () |
2052 | + { |
2053 | + g_ptr_array_unref (events); |
2054 | + } |
2055 | + |
2056 | +private: |
2057 | + GPtrArray *events; |
2058 | + unsigned start_index; |
2059 | + unsigned event_count; |
2060 | +}; |
2061 | + |
2062 | +class DeleteEventsTask : public Task |
2063 | +{ |
2064 | +public: |
2065 | + void Process (Indexer *indexer); |
2066 | + |
2067 | + DeleteEventsTask (unsigned *event_ids_arr, int event_ids_arr_size) |
2068 | + : event_ids (event_ids_arr, event_ids_arr + event_ids_arr_size) {} |
2069 | + |
2070 | + virtual ~DeleteEventsTask () |
2071 | + { |
2072 | + } |
2073 | + |
2074 | +private: |
2075 | + std::vector<unsigned> event_ids; |
2076 | +}; |
2077 | + |
2078 | +class MetadataTask : public Task |
2079 | +{ |
2080 | +public: |
2081 | + void Process (Indexer *indexer); |
2082 | + |
2083 | + MetadataTask (std::string const& name, std::string const& val) |
2084 | + : key_name (name), value (val) {} |
2085 | + |
2086 | + virtual ~MetadataTask () |
2087 | + {} |
2088 | + |
2089 | +private: |
2090 | + std::string key_name; |
2091 | + std::string value; |
2092 | +}; |
2093 | + |
2094 | +} |
2095 | + |
2096 | +#endif /* _ZGFTS_TASK_H_ */ |
2097 | + |
2098 | |
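The `Task` comment says the Controller is responsible for breaking work into suitable chunks, and `IndexEventsTask` accepts a `(index, count)` pair for that purpose. How the Controller actually chooses chunk sizes is not shown in this part of the diff, so the following is only a sketch of one way to compute such ranges; `make_chunks` is a hypothetical helper, not part of the branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split `total` items into (start, count) ranges of
// at most `chunk` items each. Each pair could be handed to something like
// IndexEventsTask (events, start, count).
std::vector<std::pair<std::size_t, std::size_t> >
make_chunks(std::size_t total, std::size_t chunk)
{
    std::vector<std::pair<std::size_t, std::size_t> > ranges;
    if (chunk == 0) return ranges;  // avoid an endless loop on a zero chunk size

    for (std::size_t start = 0; start < total; start += chunk)
    {
        // The last range may be shorter than `chunk`.
        ranges.push_back(std::make_pair(start, std::min(chunk, total - start)));
    }
    return ranges;
}
```

Because `IndexEventsTask::Process` already clamps `start_index + event_count` to `events->len`, a caller could also pass a fixed count for every chunk; computing the exact count up front just keeps the ranges self-describing.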
2099 | === added directory 'extensions/fts++/test' |
2100 | === added file 'extensions/fts++/test/Makefile.am' |
2101 | --- extensions/fts++/test/Makefile.am 1970-01-01 00:00:00 +0000 |
2102 | +++ extensions/fts++/test/Makefile.am 2012-02-09 22:47:22 +0000 |
2103 | @@ -0,0 +1,27 @@ |
2104 | +NULL = |
2105 | +check_PROGRAMS = test-fts |
2106 | +TESTS = test-fts |
2107 | + |
2108 | +AM_CPPFLAGS = \ |
2109 | + $(ZEITGEIST_CFLAGS) \ |
2110 | + -include $(CONFIG_HEADER) \ |
2111 | + -w \ |
2112 | + -I$(srcdir)/.. \ |
2113 | + $(NULL) |
2114 | + |
2115 | +test_fts_SOURCES = \ |
2116 | + test-stringutils.cpp \ |
2117 | + test-indexer.cpp \ |
2118 | + test-fts.c \ |
2119 | + $(srcdir)/../stringutils.cpp \ |
2120 | + $(srcdir)/../controller.cpp \ |
2121 | + $(srcdir)/../indexer.cpp \ |
2122 | + $(srcdir)/../task.cpp \ |
2123 | + $(srcdir)/../fts.cpp \ |
2124 | + $(NULL) |
2125 | + |
2126 | +test_fts_LDADD = \ |
2127 | + $(builddir)/../libzeitgeist-internal.la \ |
2128 | + -lxapian \ |
2129 | + $(NULL) |
2130 | + |
2131 | |
2132 | === added file 'extensions/fts++/test/test-fts.c' |
2133 | --- extensions/fts++/test/test-fts.c 1970-01-01 00:00:00 +0000 |
2134 | +++ extensions/fts++/test/test-fts.c 2012-02-09 22:47:22 +0000 |
2135 | @@ -0,0 +1,37 @@ |
2136 | +/* |
2137 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
2138 | + * |
2139 | + * This program is free software; you can redistribute it and/or |
2140 | + * modify it under the terms of the GNU General Public License |
2141 | + * as published by the Free Software Foundation; either version 2 |
2142 | + * of the License, or (at your option) any later version. |
2143 | + * |
2144 | + * This program is distributed in the hope that it will be useful, |
2145 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
2146 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2147 | + * GNU General Public License for more details. |
2148 | + * |
2149 | + * You should have received a copy of the GNU General Public License |
2150 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
2151 | + * |
2152 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
2153 | + * |
2154 | + */ |
2155 | + |
2156 | +#include <glib-object.h> |
2157 | + |
2158 | +void test_stringutils_create_suite (void); |
2159 | +void test_indexer_create_suite (void); |
2160 | + |
2161 | +gint |
2162 | +main (gint argc, gchar *argv[]) |
2163 | +{ |
2164 | + g_type_init (); |
2165 | + |
2166 | + g_test_init (&argc, &argv, NULL); |
2167 | + |
2168 | + test_stringutils_create_suite (); |
2169 | + test_indexer_create_suite (); |
2170 | + |
2171 | + return g_test_run (); |
2172 | +} |
2173 | |
2174 | === added file 'extensions/fts++/test/test-indexer.cpp' |
2175 | --- extensions/fts++/test/test-indexer.cpp 1970-01-01 00:00:00 +0000 |
2176 | +++ extensions/fts++/test/test-indexer.cpp 2012-02-09 22:47:22 +0000 |
2177 | @@ -0,0 +1,531 @@ |
2178 | +/* |
2179 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
2180 | + * |
2181 | + * This program is free software; you can redistribute it and/or |
2182 | + * modify it under the terms of the GNU General Public License |
2183 | + * as published by the Free Software Foundation; either version 2 |
2184 | + * of the License, or (at your option) any later version. |
2185 | + * |
2186 | + * This program is distributed in the hope that it will be useful, |
2187 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
2188 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2189 | + * GNU General Public License for more details. |
2190 | + * |
2191 | + * You should have received a copy of the GNU General Public License |
2192 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
2193 | + * |
2194 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
2195 | + * |
2196 | + */ |
2197 | + |
2198 | +#include <glib-object.h> |
2199 | + |
2200 | +#include "stringutils.h" |
2201 | +#include "fts.h" |
2202 | +#include <zeitgeist-internal.h> |
2203 | + |
2204 | +using namespace ZeitgeistFTS; |
2205 | + |
2206 | +typedef struct |
2207 | +{ |
2208 | + ZeitgeistDbReader *db; |
2209 | + ZeitgeistIndexer *indexer; |
2210 | +} Fixture; |
2211 | + |
2212 | +static void setup (Fixture *fix, gconstpointer data); |
2213 | +static void teardown (Fixture *fix, gconstpointer data); |
2214 | + |
2215 | +static void |
2216 | +setup (Fixture *fix, gconstpointer data) |
2217 | +{ |
2218 | + // use in-memory databases for both zg db and fts db |
2219 | + GError *error = NULL; |
2220 | + g_setenv ("ZEITGEIST_DATABASE_PATH", ":memory:", TRUE); |
2221 | + fix->db = ZEITGEIST_DB_READER (zeitgeist_engine_new (&error)); |
2222 | + |
2223 | + if (error) |
2224 | + { |
2225 | + g_warning ("%s", error->message); |
2226 | + return; |
2227 | + } |
2228 | + |
2229 | + fix->indexer = zeitgeist_indexer_new (fix->db, &error); |
2230 | + if (error) |
2231 | + { |
2232 | + g_warning ("%s", error->message); |
2233 | + return; |
2234 | + } |
2235 | +} |
2236 | + |
2237 | +static void |
2238 | +teardown (Fixture *fix, gconstpointer data) |
2239 | +{ |
2240 | + zeitgeist_indexer_free (fix->indexer); |
2241 | + g_object_unref (fix->db); |
2242 | +} |
2243 | + |
2244 | +static ZeitgeistEvent* create_test_event1 (void) |
2245 | +{ |
2246 | + ZeitgeistEvent *event = zeitgeist_event_new (); |
2247 | + ZeitgeistSubject *subject = zeitgeist_subject_new (); |
2248 | + |
2249 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_RASTER_IMAGE); |
2250 | + zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT); |
2251 | + zeitgeist_subject_set_uri (subject, "http://example.com/image.jpg"); |
2252 | + zeitgeist_subject_set_text (subject, "text"); |
2253 | + zeitgeist_subject_set_mimetype (subject, "image/png"); |
2254 | + |
2255 | + zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT); |
2256 | + zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY); |
2257 | + zeitgeist_event_set_actor (event, "application://firefox.desktop"); |
2258 | + zeitgeist_event_add_subject (event, subject); |
2259 | + |
2260 | + g_object_unref (subject); |
2261 | + return event; |
2262 | +} |
2263 | + |
2264 | +static ZeitgeistEvent* create_test_event2 (void) |
2265 | +{ |
2266 | + ZeitgeistEvent *event = zeitgeist_event_new (); |
2267 | + ZeitgeistSubject *subject = zeitgeist_subject_new (); |
2268 | + |
2269 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE); |
2270 | + zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT); |
2271 | + zeitgeist_subject_set_uri (subject, "http://example.com/I%20Love%20Wikis"); |
2272 | + zeitgeist_subject_set_text (subject, "Example.com Wiki Page. Kanji is awesome 漢字"); |
2273 | + zeitgeist_subject_set_mimetype (subject, "text/html"); |
2274 | + |
2275 | + zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT); |
2276 | + zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY); |
2277 | + zeitgeist_event_set_actor (event, "application://firefox.desktop"); |
2278 | + zeitgeist_event_add_subject (event, subject); |
2279 | + |
2280 | + g_object_unref (subject); |
2281 | + return event; |
2282 | +} |
2283 | + |
2284 | +static ZeitgeistEvent* create_test_event3 (void) |
2285 | +{ |
2286 | + ZeitgeistEvent *event = zeitgeist_event_new (); |
2287 | + ZeitgeistSubject *subject = zeitgeist_subject_new (); |
2288 | + |
2289 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE); |
2290 | + zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT); |
2291 | + // Greek IDN - stands for http://παράδειγμα.δοκιμή |
2292 | + zeitgeist_subject_set_uri (subject, "http://xn--hxajbheg2az3al.xn--jxalpdlp/"); |
2293 | + zeitgeist_subject_set_text (subject, "IDNwiki"); |
2294 | + zeitgeist_subject_set_mimetype (subject, "text/html"); |
2295 | + |
2296 | + zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT); |
2297 | + zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY); |
2298 | + zeitgeist_event_set_actor (event, "application://firefox.desktop"); |
2299 | + zeitgeist_event_add_subject (event, subject); |
2300 | + |
2301 | + g_object_unref (subject); |
2302 | + return event; |
2303 | +} |
2304 | + |
2305 | +static ZeitgeistEvent* create_test_event4 (void) |
2306 | +{ |
2307 | + ZeitgeistEvent *event = zeitgeist_event_new (); |
2308 | + ZeitgeistSubject *subject = zeitgeist_subject_new (); |
2309 | + |
2310 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_PRESENTATION); |
2311 | + zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_FILE_DATA_OBJECT); |
2312 | + zeitgeist_subject_set_uri (subject, "file:///home/username/Documents/my_fabulous_presentation.pdf"); |
2313 | + zeitgeist_subject_set_text (subject, NULL); |
2314 | + zeitgeist_subject_set_mimetype (subject, "application/pdf"); |
2315 | + |
2316 | + zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_MODIFY_EVENT); |
2317 | + zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY); |
2318 | + zeitgeist_event_set_actor (event, "application://libreoffice-impress.desktop"); |
2319 | + zeitgeist_event_add_subject (event, subject); |
2320 | + |
2321 | + g_object_unref (subject); |
2322 | + return event; |
2323 | +} |
2324 | + |
2325 | +// Steals the event; ref it if you want to keep it |
2326 | +static guint |
2327 | +index_event (Fixture *fix, ZeitgeistEvent *event) |
2328 | +{ |
2329 | + guint event_id = 0; |
2330 | + |
2331 | + // add event to DBs |
2332 | + event_id = zeitgeist_engine_insert_event (ZEITGEIST_ENGINE (fix->db), |
2333 | + event, NULL, NULL); |
2334 | + |
2335 | + GPtrArray *events = g_ptr_array_new_with_free_func (g_object_unref); |
2336 | + g_ptr_array_add (events, event); // steal event ref |
2337 | + zeitgeist_indexer_index_events (fix->indexer, events); |
2338 | + g_ptr_array_unref (events); |
2339 | + |
2340 | + while (zeitgeist_indexer_has_pending_tasks (fix->indexer)) |
2341 | + { |
2342 | + zeitgeist_indexer_process_task (fix->indexer); |
2343 | + } |
2344 | + |
2345 | + return event_id; |
2346 | +} |
2347 | + |
2348 | +static void |
2349 | +test_simple_query (Fixture *fix, gconstpointer data) |
2350 | +{ |
2351 | + guint matches; |
2352 | + guint event_id; |
2353 | + ZeitgeistEvent* event; |
2354 | + |
2355 | + // add test events to DBs |
2356 | + event_id = index_event (fix, create_test_event1 ()); |
2357 | + index_event (fix, create_test_event2 ()); |
2358 | + index_event (fix, create_test_event3 ()); |
2359 | + index_event (fix, create_test_event4 ()); |
2360 | + |
2361 | + GPtrArray *results = |
2362 | + zeitgeist_indexer_search (fix->indexer, |
2363 | + "text", |
2364 | + zeitgeist_time_range_new_anytime (), |
2365 | + g_ptr_array_new (), |
2366 | + 0, |
2367 | + 10, |
2368 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2369 | + &matches, |
2370 | + NULL); |
2371 | + |
2372 | + g_assert_cmpuint (matches, >, 0); |
2373 | + g_assert_cmpuint (results->len, ==, 1); |
2374 | + |
2375 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2376 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2377 | + |
2378 | + ZeitgeistSubject *subject = (ZeitgeistSubject*) |
2379 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2380 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text"); |
2381 | +} |
2382 | + |
2383 | +static void |
2384 | +test_simple_with_filter (Fixture *fix, gconstpointer data) |
2385 | +{ |
2386 | + guint matches; |
2387 | + guint event_id; |
2388 | + ZeitgeistEvent* event; |
2389 | + |
2390 | + // add test events to DBs |
2391 | + index_event (fix, create_test_event1 ()); |
2392 | + index_event (fix, create_test_event2 ()); |
2393 | + |
2394 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2395 | + event = zeitgeist_event_new (); |
2396 | + zeitgeist_event_set_interpretation (event, ZEITGEIST_NFO_DOCUMENT); |
2397 | + g_ptr_array_add (filters, event); // steals ref |
2398 | + |
2399 | + GPtrArray *results = |
2400 | + zeitgeist_indexer_search (fix->indexer, |
2401 | + "text", |
2402 | + zeitgeist_time_range_new_anytime (), |
2403 | + filters, |
2404 | + 0, |
2405 | + 10, |
2406 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2407 | + &matches, |
2408 | + NULL); |
2409 | + |
2410 | + g_assert_cmpuint (results->len, ==, 0); |
2411 | + g_assert_cmpuint (matches, ==, 0); |
2412 | +} |
2413 | + |
2414 | +static void |
2415 | +test_simple_with_valid_filter (Fixture *fix, gconstpointer data) |
2416 | +{ |
2417 | + guint matches; |
2418 | + guint event_id; |
2419 | + ZeitgeistEvent* event; |
2420 | + ZeitgeistSubject *subject; |
2421 | + |
2422 | + // add test events to DBs |
2423 | + event_id = index_event (fix, create_test_event1 ()); |
2424 | + index_event (fix, create_test_event2 ()); |
2425 | + |
2426 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2427 | + event = zeitgeist_event_new (); |
2428 | + subject = zeitgeist_subject_new (); |
2429 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_IMAGE); |
2430 | + zeitgeist_event_add_subject (event, subject); |
2431 | + g_ptr_array_add (filters, event); // steals ref |
2432 | + |
2433 | + GPtrArray *results = |
2434 | + zeitgeist_indexer_search (fix->indexer, |
2435 | + "text", |
2436 | + zeitgeist_time_range_new_anytime (), |
2437 | + filters, |
2438 | + 0, |
2439 | + 10, |
2440 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2441 | + &matches, |
2442 | + NULL); |
2443 | + |
2444 | + g_assert_cmpuint (matches, >, 0); |
2445 | + g_assert_cmpuint (results->len, ==, 1); |
2446 | + |
2447 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2448 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2449 | + |
2450 | + subject = (ZeitgeistSubject*) |
2451 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2452 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text"); |
2453 | +} |
2454 | + |
2455 | +static void |
2456 | +test_simple_negation (Fixture *fix, gconstpointer data) |
2457 | +{ |
2458 | + guint matches; |
2459 | + guint event_id; |
2460 | + ZeitgeistEvent* event; |
2461 | + ZeitgeistSubject *subject; |
2462 | + |
2463 | + // add test events to DBs |
2464 | + event_id = index_event (fix, create_test_event1 ()); |
2465 | + index_event (fix, create_test_event2 ()); |
2466 | + |
2467 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2468 | + event = zeitgeist_event_new (); |
2469 | + subject = zeitgeist_subject_new (); |
2470 | + zeitgeist_subject_set_interpretation (subject, "!" ZEITGEIST_NFO_IMAGE); |
2471 | + zeitgeist_event_add_subject (event, subject); |
2472 | + g_ptr_array_add (filters, event); // steals ref |
2473 | + |
2474 | + GPtrArray *results = |
2475 | + zeitgeist_indexer_search (fix->indexer, |
2476 | + "text", |
2477 | + zeitgeist_time_range_new_anytime (), |
2478 | + filters, |
2479 | + 0, |
2480 | + 10, |
2481 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2482 | + &matches, |
2483 | + NULL); |
2484 | + |
2485 | + g_assert_cmpuint (matches, ==, 0); |
2486 | + g_assert_cmpuint (results->len, ==, 0); |
2487 | +} |
2488 | + |
2489 | +static void |
2490 | +test_simple_noexpand (Fixture *fix, gconstpointer data) |
2491 | +{ |
2492 | + guint matches; |
2493 | + guint event_id; |
2494 | + ZeitgeistEvent* event; |
2495 | + ZeitgeistSubject *subject; |
2496 | + |
2497 | + // add test events to DBs |
2498 | + event_id = index_event (fix, create_test_event1 ()); |
2499 | + index_event (fix, create_test_event2 ()); |
2500 | + |
2501 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2502 | + event = zeitgeist_event_new (); |
2503 | + subject = zeitgeist_subject_new (); |
2504 | + zeitgeist_subject_set_interpretation (subject, "+" ZEITGEIST_NFO_IMAGE); |
2505 | + zeitgeist_event_add_subject (event, subject); |
2506 | + g_ptr_array_add (filters, event); // steals ref |
2507 | + |
2508 | + GPtrArray *results = |
2509 | + zeitgeist_indexer_search (fix->indexer, |
2510 | + "text", |
2511 | + zeitgeist_time_range_new_anytime (), |
2512 | + filters, |
2513 | + 0, |
2514 | + 10, |
2515 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2516 | + &matches, |
2517 | + NULL); |
2518 | + |
2519 | + g_assert_cmpuint (matches, ==, 0); |
2520 | + g_assert_cmpuint (results->len, ==, 0); |
2521 | +} |
2522 | + |
2523 | +static void |
2524 | +test_simple_noexpand_valid (Fixture *fix, gconstpointer data) |
2525 | +{ |
2526 | + guint matches; |
2527 | + guint event_id; |
2528 | + ZeitgeistEvent* event; |
2529 | + ZeitgeistSubject *subject; |
2530 | + |
2531 | + // add test events to DBs |
2532 | + event_id = index_event (fix, create_test_event1 ()); |
2533 | + index_event (fix, create_test_event2 ()); |
2534 | + |
2535 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2536 | + event = zeitgeist_event_new (); |
2537 | + subject = zeitgeist_subject_new (); |
2538 | + zeitgeist_subject_set_interpretation (subject, "+" ZEITGEIST_NFO_RASTER_IMAGE); |
2539 | + zeitgeist_event_add_subject (event, subject); |
2540 | + g_ptr_array_add (filters, event); // steals ref |
2541 | + |
2542 | + GPtrArray *results = |
2543 | + zeitgeist_indexer_search (fix->indexer, |
2544 | + "text", |
2545 | + zeitgeist_time_range_new_anytime (), |
2546 | + filters, |
2547 | + 0, |
2548 | + 10, |
2549 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2550 | + &matches, |
2551 | + NULL); |
2552 | + |
2553 | + g_assert_cmpuint (matches, >, 0); |
2554 | + g_assert_cmpuint (results->len, ==, 1); |
2555 | + |
2556 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2557 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2558 | + |
2559 | + subject = (ZeitgeistSubject*) |
2560 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2561 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text"); |
2562 | +} |
2563 | + |
2564 | +static void |
2565 | +test_simple_url_unescape (Fixture *fix, gconstpointer data) |
2566 | +{ |
2567 | + guint matches; |
2568 | + guint event_id; |
2569 | + ZeitgeistEvent* event; |
2570 | + ZeitgeistSubject *subject; |
2571 | + |
2572 | + // add test events to DBs |
2573 | + index_event (fix, create_test_event1 ()); |
2574 | + event_id = index_event (fix, create_test_event2 ()); |
2575 | + |
2576 | + GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref); |
2577 | + event = zeitgeist_event_new (); |
2578 | + subject = zeitgeist_subject_new (); |
2579 | + zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE); |
2580 | + zeitgeist_event_add_subject (event, subject); |
2581 | + g_ptr_array_add (filters, event); // steals ref |
2582 | + |
2583 | + GPtrArray *results = |
2584 | + zeitgeist_indexer_search (fix->indexer, |
2585 | + "love", |
2586 | + zeitgeist_time_range_new_anytime (), |
2587 | + filters, |
2588 | + 0, |
2589 | + 10, |
2590 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2591 | + &matches, |
2592 | + NULL); |
2593 | + |
2594 | + g_assert_cmpuint (matches, >, 0); |
2595 | + g_assert_cmpuint (results->len, ==, 1); |
2596 | + |
2597 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2598 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2599 | + |
2600 | + subject = (ZeitgeistSubject*) |
2601 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2602 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "Example.com Wiki Page. Kanji is awesome 漢字"); |
2603 | +} |
2604 | + |
2605 | +static void |
2606 | +test_simple_cjk (Fixture *fix, gconstpointer data) |
2607 | +{ |
2608 | + guint matches; |
2609 | + guint event_id; |
2610 | + ZeitgeistEvent* event; |
2611 | + ZeitgeistSubject *subject; |
2612 | + |
2613 | + // add test events to DBs |
2614 | + index_event (fix, create_test_event1 ()); |
2615 | + event_id = index_event (fix, create_test_event2 ()); |
2616 | + |
2617 | + GPtrArray *results = |
2618 | + zeitgeist_indexer_search (fix->indexer, |
2619 | + "漢*", |
2620 | + zeitgeist_time_range_new_anytime (), |
2621 | + g_ptr_array_new (), |
2622 | + 0, |
2623 | + 10, |
2624 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2625 | + &matches, |
2626 | + NULL); |
2627 | + |
2628 | + g_assert_cmpuint (matches, >, 0); |
2629 | + g_assert_cmpuint (results->len, ==, 1); |
2630 | + |
2631 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2632 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2633 | + |
2634 | + subject = (ZeitgeistSubject*) |
2635 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2636 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "Example.com Wiki Page. Kanji is awesome 漢字"); |
2637 | +} |
2638 | + |
2639 | +static void |
2640 | +test_simple_idn_support (Fixture *fix, gconstpointer data) |
2641 | +{ |
2642 | + guint matches; |
2643 | + guint event_id; |
2644 | + ZeitgeistEvent* event; |
2645 | + ZeitgeistSubject *subject; |
2646 | + |
2647 | + // add test events to DBs |
2648 | + index_event (fix, create_test_event1 ()); |
2649 | + index_event (fix, create_test_event2 ()); |
2650 | + event_id = index_event (fix, create_test_event3 ()); |
2651 | + |
2652 | + GPtrArray *results = |
2653 | + zeitgeist_indexer_search (fix->indexer, |
2654 | + "παράδειγμα", |
2655 | + zeitgeist_time_range_new_anytime (), |
2656 | + g_ptr_array_new (), |
2657 | + 0, |
2658 | + 10, |
2659 | + ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS, |
2660 | + &matches, |
2661 | + NULL); |
2662 | + |
2663 | + g_assert_cmpuint (matches, >, 0); |
2664 | + g_assert_cmpuint (results->len, ==, 1); |
2665 | + |
2666 | + event = (ZeitgeistEvent*) results->pdata[0]; |
2667 | + g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id); |
2668 | + |
2669 | + subject = (ZeitgeistSubject*) |
2670 | + g_ptr_array_index (zeitgeist_event_get_subjects (event), 0); |
2671 | + g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "IDNwiki"); |
2672 | +} |
2673 | + |
2674 | +G_BEGIN_DECLS |
2675 | + |
2676 | +static void discard_message (const gchar *domain, |
2677 | + GLogLevelFlags level, |
2678 | + const gchar *msg, |
2679 | + gpointer userdata) |
2680 | +{ |
2681 | +} |
2682 | + |
2683 | +void test_indexer_create_suite (void) |
2684 | +{ |
2685 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleQuery", Fixture, 0, |
2686 | + setup, test_simple_query, teardown); |
2687 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleWithFilter", Fixture, 0, |
2688 | + setup, test_simple_with_filter, teardown); |
2689 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleWithValidFilter", Fixture, 0, |
2690 | + setup, test_simple_with_valid_filter, teardown); |
2691 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNegation", Fixture, 0, |
2692 | + setup, test_simple_negation, teardown); |
2693 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNoexpand", Fixture, 0, |
2694 | + setup, test_simple_noexpand, teardown); |
2695 | + g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNoexpandValid", Fixture, 0, |
2696 | + setup, test_simple_noexpand_valid, teardown); |
2697 | + g_test_add ("/Zeitgeist/FTS/Indexer/URLUnescape", Fixture, 0, |
2698 | + setup, test_simple_url_unescape, teardown); |
2699 | + g_test_add ("/Zeitgeist/FTS/Indexer/IDNSupport", Fixture, 0, |
2700 | + setup, test_simple_idn_support, teardown); |
2701 | + g_test_add ("/Zeitgeist/FTS/Indexer/CJK", Fixture, 0, |
2702 | + setup, test_simple_cjk, teardown); |
2703 | + |
2704 | + // get rid of the "rebuilding index..." messages |
2705 | + g_log_set_handler (NULL, G_LOG_LEVEL_MESSAGE, discard_message, NULL); |
2706 | +} |
2707 | + |
2708 | +G_END_DECLS |
2709 | |
2710 | === added file 'extensions/fts++/test/test-stringutils.cpp' |
2711 | --- extensions/fts++/test/test-stringutils.cpp 1970-01-01 00:00:00 +0000 |
2712 | +++ extensions/fts++/test/test-stringutils.cpp 2012-02-09 22:47:22 +0000 |
2713 | @@ -0,0 +1,178 @@ |
2714 | +/* |
2715 | + * Copyright (C) 2012 Mikkel Kamstrup Erlandsen |
2716 | + * |
2717 | + * This program is free software; you can redistribute it and/or |
2718 | + * modify it under the terms of the GNU General Public License |
2719 | + * as published by the Free Software Foundation; either version 2 |
2720 | + * of the License, or (at your option) any later version. |
2721 | + * |
2722 | + * This program is distributed in the hope that it will be useful, |
2723 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
2724 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2725 | + * GNU General Public License for more details. |
2726 | + * |
2727 | + * You should have received a copy of the GNU General Public License |
2728 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
2729 | + * |
2730 | + * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
2731 | + * |
2732 | + */ |
2733 | + |
2734 | +#include <glib-object.h> |
2735 | + |
2736 | +#include "stringutils.h" |
2737 | + |
2738 | +using namespace ZeitgeistFTS; |
2739 | + |
2740 | +typedef struct |
2741 | +{ |
2742 | + int i; |
2743 | +} Fixture; |
2744 | + |
2745 | +static void setup (Fixture *fix, gconstpointer data); |
2746 | +static void teardown (Fixture *fix, gconstpointer data); |
2747 | + |
2748 | +static void |
2749 | +setup (Fixture *fix, gconstpointer data) |
2750 | +{ |
2751 | + |
2752 | +} |
2753 | + |
2754 | +static void |
2755 | +teardown (Fixture *fix, gconstpointer data) |
2756 | +{ |
2757 | + |
2758 | +} |
2759 | + |
2760 | +static void |
2761 | +test_truncate (Fixture *fix, gconstpointer data) |
2762 | +{ |
2763 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("").c_str ()); |
2764 | + |
2765 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("a", 0).c_str ()); |
2766 | + g_assert_cmpstr ("a", ==, StringUtils::Truncate("a", 1).c_str ()); |
2767 | + g_assert_cmpstr ("a", ==, StringUtils::Truncate("a").c_str ()); |
2768 | + |
2769 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("aa", 0).c_str ()); |
2770 | + g_assert_cmpstr ("a", ==, StringUtils::Truncate("aa", 1).c_str ()); |
2771 | + g_assert_cmpstr ("aa", ==, StringUtils::Truncate("aa", 2).c_str ()); |
2772 | + g_assert_cmpstr ("aa", ==, StringUtils::Truncate("aa").c_str ()); |
2773 | + |
2774 | + |
2775 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("å", 0).c_str ()); |
2776 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("å", 1).c_str ()); |
2777 | + g_assert_cmpstr ("å", ==, StringUtils::Truncate("å").c_str ()); |
2778 | + |
2779 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("åå", 0).c_str ()); |
2780 | + g_assert_cmpstr ("", ==, StringUtils::Truncate("åå", 1).c_str ()); |
2781 | + g_assert_cmpstr ("å", ==, StringUtils::Truncate("åå", 2).c_str ()); |
2782 | + g_assert_cmpstr ("å", ==, StringUtils::Truncate("åå", 3).c_str ()); |
2783 | + g_assert_cmpstr ("åå", ==, StringUtils::Truncate("åå", 4).c_str ()); |
2784 | + g_assert_cmpstr ("åå", ==, StringUtils::Truncate("åå").c_str ()); |
2785 | +} |
2786 | + |
2787 | +static void |
2788 | +test_mangle (Fixture *fix, gconstpointer data) |
2789 | +{ |
2790 | + g_assert_cmpstr ("", ==, StringUtils::MangleUri("").c_str ()); |
2791 | + |
2792 | + g_assert_cmpstr ("file", ==, StringUtils::MangleUri("file").c_str ()); |
2793 | + g_assert_cmpstr ("file___", ==, StringUtils::MangleUri("file://").c_str ()); |
2794 | + g_assert_cmpstr ("http___www.zeitgeist-project.com", ==, |
2795 | + StringUtils::MangleUri("http://www.zeitgeist-project.com").c_str ()); |
2796 | + |
2797 | + g_assert_cmpstr ("scheme_no_spaces_in_uris", ==, |
2798 | + StringUtils::MangleUri("scheme:no spaces in uris").c_str ()); |
2799 | +} |
2800 | + |
2801 | +static void |
2802 | +test_split (Fixture *fix, gconstpointer data) |
2803 | +{ |
2804 | + std::string authority, path, query; |
2805 | + |
2806 | + authority = path = query = ""; |
2807 | + StringUtils::SplitUri ("", authority, path, query); // doesn't crash |
2808 | + |
2809 | + g_assert_cmpstr ("", ==, authority.c_str ()); |
2810 | + g_assert_cmpstr ("", ==, path.c_str ()); |
2811 | + g_assert_cmpstr ("", ==, query.c_str ()); |
2812 | + |
2813 | + authority = path = query = ""; |
2814 | + StringUtils::SplitUri ("scheme:", authority, path, query); // doesn't crash |
2815 | + |
2816 | + g_assert_cmpstr ("", ==, authority.c_str ()); |
2817 | + g_assert_cmpstr ("", ==, path.c_str ()); |
2818 | + g_assert_cmpstr ("", ==, query.c_str ()); |
2819 | + |
2820 | + authority = path = query = ""; |
2821 | + StringUtils::SplitUri ("ldap://ldap1.example.net:6666/o=University%20" |
2822 | + "of%20Michigan,c=US??sub?(cn=Babs%20Jensen)", |
2823 | + authority, path, query); |
2824 | + |
2825 | + g_assert_cmpstr ("ldap1.example.net:6666", ==, authority.c_str ()); |
2826 | + g_assert_cmpstr ("/o=University%20of%20Michigan,c=US", ==, path.c_str ()); |
2827 | + g_assert_cmpstr ("?sub?(cn=Babs%20Jensen)", ==, query.c_str ()); |
2828 | + |
2829 | + |
2830 | + authority = path = query = ""; |
2831 | + StringUtils::SplitUri ("mailto:jsmith@example.com", |
2832 | + authority, path, query); |
2833 | + |
2834 | + g_assert_cmpstr ("jsmith@example.com", ==, authority.c_str ()); |
2835 | + g_assert_cmpstr ("", ==, path.c_str ()); |
2836 | + g_assert_cmpstr ("", ==, query.c_str ()); |
2837 | + |
2838 | + authority = path = query = ""; |
2839 | + StringUtils::SplitUri ("mailto:jsmith@example.com?subject=A%20Test&body=" |
2840 | + "My%20idea%20is%3A%20%0A", authority, path, query); |
2841 | + |
2842 | + g_assert_cmpstr ("jsmith@example.com", ==, authority.c_str ()); |
2843 | + g_assert_cmpstr ("", ==, path.c_str ()); |
2844 | + g_assert_cmpstr ("subject=A%20Test&body=My%20idea%20is%3A%20%0A", ==, query.c_str ()); |
2845 | + |
2846 | + authority = path = query = ""; |
2847 | + StringUtils::SplitUri ("sip:alice@atlanta.com?subject=project%20x", |
2848 | + authority, path, query); |
2849 | + |
2850 | + g_assert_cmpstr ("alice@atlanta.com", ==, authority.c_str ()); |
2851 | + g_assert_cmpstr ("", ==, path.c_str ()); |
2852 | + g_assert_cmpstr ("subject=project%20x", ==, query.c_str ()); |
2853 | + |
2854 | + authority = path = query = ""; |
2855 | + StringUtils::SplitUri ("file:///", |
2856 | + authority, path, query); |
2857 | + |
2858 | + g_assert_cmpstr ("", ==, authority.c_str ()); |
2859 | + g_assert_cmpstr ("/", ==, path.c_str ()); |
2860 | + g_assert_cmpstr ("", ==, query.c_str ()); |
2861 | + |
2862 | + authority = path = query = ""; |
2863 | + StringUtils::SplitUri ("file:///home/username/file.ext", |
2864 | + authority, path, query); |
2865 | + |
2866 | + g_assert_cmpstr ("", ==, authority.c_str ()); |
2867 | + g_assert_cmpstr ("/home/username/file.ext", ==, path.c_str ()); |
2868 | + g_assert_cmpstr ("", ==, query.c_str ()); |
2869 | + |
2870 | + authority = path = query = ""; |
2871 | + StringUtils::SplitUri ("dns://192.168.1.1/ftp.example.org?type=A", |
2872 | + authority, path, query); |
2873 | + |
2874 | + g_assert_cmpstr ("192.168.1.1", ==, authority.c_str ()); |
2875 | + g_assert_cmpstr ("/ftp.example.org", ==, path.c_str ()); |
2876 | + g_assert_cmpstr ("type=A", ==, query.c_str ()); |
2877 | +} |
2878 | + |
2879 | +G_BEGIN_DECLS |
2880 | + |
2881 | +void test_stringutils_create_suite (void) |
2882 | +{ |
2883 | + g_test_add ("/Zeitgeist/FTS/StringUtils/Truncate", Fixture, 0, |
2884 | + setup, test_truncate, teardown); |
2885 | + g_test_add ("/Zeitgeist/FTS/StringUtils/MangleUri", Fixture, 0, |
2886 | + setup, test_mangle, teardown); |
2887 | + g_test_add ("/Zeitgeist/FTS/StringUtils/SplitUri", Fixture, 0, |
2888 | + setup, test_split, teardown); |
2889 | +} |
2890 | + |
2891 | +G_END_DECLS |
2892 | |
2893 | === added symlink 'extensions/fts++/utils.vala' |
2894 | === target is u'../../src/utils.vala' |
2895 | === added symlink 'extensions/fts++/where-clause.vala' |
2896 | === target is u'../../src/where-clause.vala' |
2897 | === added file 'extensions/fts++/zeitgeist-fts.vala' |
2898 | --- extensions/fts++/zeitgeist-fts.vala 1970-01-01 00:00:00 +0000 |
2899 | +++ extensions/fts++/zeitgeist-fts.vala 2012-02-09 22:47:22 +0000 |
2900 | @@ -0,0 +1,301 @@ |
2901 | +/* zeitgeist-fts.vala |
2902 | + * |
2903 | + * Copyright © 2012 Canonical Ltd. |
2904 | + * Copyright © 2012 Michal Hruby <michal.mhr@gmail.com> |
2905 | + * |
2906 | + * This program is free software; you can redistribute it and/or |
2907 | + * modify it under the terms of the GNU General Public License |
2908 | + * as published by the Free Software Foundation; either version 2 |
2909 | + * of the License, or (at your option) any later version. |
2910 | + * |
2911 | + * This program is distributed in the hope that it will be useful, |
2912 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
2913 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2914 | + * GNU General Public License for more details. |
2915 | + * |
2916 | + * You should have received a copy of the GNU General Public License |
2917 | + * along with this program. If not, see <http://www.gnu.org/licenses/>. |
2918 | + * |
2919 | + */ |
2920 | + |
2921 | +namespace Zeitgeist |
2922 | +{ |
2923 | + |
2924 | + [DBus (name = "org.freedesktop.DBus")] |
2925 | + public interface RemoteDBus : Object |
2926 | + { |
2927 | + public abstract bool name_has_owner (string name) throws IOError; |
2928 | + } |
2929 | + |
2930 | + public class FtsDaemon : Object, RemoteSimpleIndexer, RemoteMonitor |
2931 | + { |
2932 | + //const string DBUS_NAME = "org.gnome.zeitgeist.Fts"; |
2933 | + const string DBUS_NAME = "org.gnome.zeitgeist.SimpleIndexer"; |
2934 | + const string ZEITGEIST_DBUS_NAME = "org.gnome.zeitgeist.Engine"; |
2935 | + private static bool show_version_info = false; |
2936 | + private static string log_level = ""; |
2937 | + |
2938 | + const OptionEntry[] options = |
2939 | + { |
2940 | + { |
2941 | + "version", 'v', 0, OptionArg.NONE, out show_version_info, |
2942 | + "Print program's version number and exit", null |
2943 | + }, |
2944 | + { |
2945 | + "log-level", 0, 0, OptionArg.STRING, out log_level, |
2946 | + "How much information should be printed; possible values: " + |
2947 | + "DEBUG, INFO, WARNING, ERROR, CRITICAL", "LEVEL" |
2948 | + }, |
2949 | + { |
2950 | + null |
2951 | + } |
2952 | + }; |
2953 | + |
2954 | + private static FtsDaemon? instance; |
2955 | + private static MainLoop mainloop; |
2956 | + private static bool name_acquired = false; |
2957 | + |
2958 | + private DbReader engine; |
2959 | + private Indexer indexer; |
2960 | + |
2961 | + private uint indexer_register_id; |
2962 | + private uint monitor_register_id; |
2963 | + private unowned DBusConnection connection; |
2964 | + |
2965 | + public FtsDaemon () throws EngineError |
2966 | + { |
2967 | + engine = new DbReader (); |
2968 | + indexer = new Indexer (engine); |
2969 | + } |
2970 | + |
2971 | + private void do_quit () |
2972 | + { |
2973 | + engine.close (); |
2974 | + mainloop.quit (); |
2975 | + } |
2976 | + |
2977 | + public void register_dbus_object (DBusConnection conn) throws IOError |
2978 | + { |
2979 | + connection = conn; |
2980 | + indexer_register_id = conn.register_object<RemoteSimpleIndexer> ( |
2981 | + "/org/gnome/zeitgeist/index/activity", this); |
2982 | + monitor_register_id = conn.register_object<RemoteMonitor> ( |
2983 | + "/org/gnome/zeitgeist/monitor/special", this); |
2984 | + } |
2985 | + |
2986 | + public void unregister_dbus_object () |
2987 | + { |
2988 | + if (indexer_register_id != 0) |
2989 | + { |
2990 | + connection.unregister_object (indexer_register_id); |
2991 | + indexer_register_id = 0; |
2992 | + } |
2993 | + |
2994 | + if (monitor_register_id != 0) |
2995 | + { |
2996 | + connection.unregister_object (monitor_register_id); |
2997 | + monitor_register_id = 0; |
2998 | + } |
2999 | + } |
3000 | + |
3001 | + public async void notify_insert (Variant time_range, Variant events) |
3002 | + throws IOError |
3003 | + { |
3004 | + debug ("got insertion notification"); |
3005 | + var events_arr = Events.from_variant (events); |
3006 | + indexer.index_events (events_arr); |
3007 | + } |
3008 | + |
3009 | + public async void notify_delete (Variant time_range, uint32[] event_ids) |
3010 | + throws IOError |
3011 | + { |
3012 | + debug ("got deletion notification"); |
3013 | + indexer.delete_events (event_ids); |
3014 | + } |
3015 | + |
3016 | + public async void search (string query_string, Variant time_range, |
3017 | + Variant filter_templates, |
3018 | + uint offset, uint count, uint result_type, |
3019 | + out Variant events, out uint matches) |
3020 | + throws Error |
3021 | + { |
3022 | + var tr = new TimeRange.from_variant (time_range); |
3023 | + var templates = Events.from_variant (filter_templates); |
3024 | + var results = instance.indexer.search (query_string, |
3025 | + tr, |
3026 | + templates, |
3027 | + offset, |
3028 | + count, |
3029 | + (ResultType) result_type, |
3030 | + out matches); |
3031 | + |
3032 | + events = Events.to_variant (results); |
3033 | + } |
3034 | + |
3035 | + private static void name_acquired_callback (DBusConnection conn) |
3036 | + { |
3037 | + name_acquired = true; |
3038 | + } |
3039 | + |
3040 | + private static void name_lost_callback (DBusConnection? conn) |
3041 | + { |
3042 | + if (conn == null) |
3043 | + { |
3044 | + // something happened to our bus connection |
3045 | + mainloop.quit (); |
3046 | + } |
3047 | + else if (instance != null && name_acquired) |
3048 | + { |
3049 | + // we owned the name and we lost it... what to do? |
3050 | + mainloop.quit (); |
3051 | + } |
3052 | + } |
3053 | + |
3054 | + static void run () |
3055 | + throws Error |
3056 | + { |
3057 | + DBusConnection connection = Bus.get_sync (BusType.SESSION); |
3058 | + var proxy = connection.get_proxy_sync<RemoteDBus> ( |
3059 | + "org.freedesktop.DBus", "/org/freedesktop/DBus", |
3060 | + DBusProxyFlags.DO_NOT_LOAD_PROPERTIES); |
3061 | + bool zeitgeist_up = proxy.name_has_owner (ZEITGEIST_DBUS_NAME); |
3062 | + // FIXME: throw an error that zeitgeist isn't up? or just start it? |
3063 | + bool name_owned = proxy.name_has_owner (DBUS_NAME); |
3064 | + if (name_owned) |
3065 | + { |
3066 | + throw new EngineError.EXISTING_INSTANCE ( |
3067 | + "The FTS daemon is running already."); |
3068 | + } |
3069 | + |
3070 | + /* setup Engine instance and register objects on dbus */ |
3071 | + try |
3072 | + { |
3073 | + instance = new FtsDaemon (); |
3074 | + instance.register_dbus_object (connection); |
3075 | + } |
3076 | + catch (Error err) |
3077 | + { |
3078 | + if (err is EngineError.DATABASE_CANTOPEN) |
3079 | + { |
3080 | + warning ("Could not access the database file.\n" + |
3081 | + "Please check the permissions of file %s.", |
3082 | + Utils.get_database_file_path ()); |
3083 | + } |
3084 | + else if (err is EngineError.DATABASE_BUSY) |
3085 | + { |
3086 | + warning ("It looks like another Zeitgeist instance " + |
3087 | + "is already running (the database is locked)."); |
3088 | + } |
3089 | + throw err; |
3090 | + } |
3091 | + |
3092 | + uint owner_id = Bus.own_name_on_connection (connection, |
3093 | + DBUS_NAME, |
3094 | + BusNameOwnerFlags.NONE, |
3095 | + name_acquired_callback, |
3096 | + name_lost_callback); |
3097 | + |
3098 | + mainloop = new MainLoop (); |
3099 | + mainloop.run (); |
3100 | + |
3101 | + if (instance != null) |
3102 | + { |
3103 | + Bus.unown_name (owner_id); |
3104 | + instance.unregister_dbus_object (); |
3105 | + instance = null; |
3106 | + |
3107 | + // make sure we send quit reply |
3108 | + try |
3109 | + { |
3110 | + connection.flush_sync (); |
3111 | + } |
3112 | + catch (Error e) |
3113 | + { |
3114 | + warning ("%s", e.message); |
3115 | + } |
3116 | + } |
3117 | + } |
3118 | + |
3119 | + static void safe_exit () |
3120 | + { |
3121 | + instance.do_quit (); |
3122 | + } |
3123 | + |
3124 | + static int main (string[] args) |
3125 | + { |
3126 | + // FIXME: the cat process xapian spawns won't like this and we |
3127 | + // can freeze if it dies |
3128 | + Posix.signal (Posix.SIGHUP, safe_exit); |
3129 | + Posix.signal (Posix.SIGINT, safe_exit); |
3130 | + Posix.signal (Posix.SIGTERM, safe_exit); |
3131 | + |
3132 | + var opt_context = new OptionContext (" - Zeitgeist FTS daemon"); |
3133 | + opt_context.add_main_entries (options, null); |
3134 | + |
3135 | + try |
3136 | + { |
3137 | + opt_context.parse (ref args); |
3138 | + |
3139 | + if (show_version_info) |
3140 | + { |
3141 | + stdout.printf (Config.VERSION + "\n"); |
3142 | + return 0; |
3143 | + } |
3144 | + |
3145 | + LogLevelFlags discarded = LogLevelFlags.LEVEL_DEBUG; |
3146 | + if (log_level != null) |
3147 | + { |
3148 | + var ld = LogLevelFlags.LEVEL_DEBUG; |
3149 | + var li = LogLevelFlags.LEVEL_INFO; |
3150 | + var lm = LogLevelFlags.LEVEL_MESSAGE; |
3151 | + var lw = LogLevelFlags.LEVEL_WARNING; |
3152 | + var lc = LogLevelFlags.LEVEL_CRITICAL; |
3153 | + switch (log_level.up ()) |
3154 | + { |
3155 | + case "DEBUG": |
3156 | + discarded = 0; |
3157 | + break; |
3158 | + case "INFO": |
3159 | + discarded = ld; |
3160 | + break; |
3161 | + case "WARNING": |
3162 | + discarded = ld | li | lm; |
3163 | + break; |
3164 | + case "CRITICAL": |
3165 | + discarded = ld | li | lm | lw; |
3166 | + break; |
3167 | + case "ERROR": |
3168 | + discarded = ld | li | lm | lw | lc; |
3169 | + break; |
3170 | + } |
3171 | + } |
3172 | + if (discarded != 0) |
3173 | + { |
3174 | + Log.set_handler ("", discarded, () => {}); |
3175 | + } |
3176 | + else |
3177 | + { |
3178 | + Environment.set_variable ("G_MESSAGES_DEBUG", "all", true); |
3179 | + } |
3180 | + |
3181 | + run (); |
3182 | + } |
3183 | + catch (Error err) |
3184 | + { |
3185 | + if (err is EngineError.DATABASE_CANTOPEN) |
3186 | + return 21; |
3187 | + if (err is EngineError.DATABASE_BUSY) |
3188 | + return 22; |
3189 | + |
3190 | + warning ("%s", err.message); |
3191 | + return 1; |
3192 | + } |
3193 | + |
3194 | + return 0; |
3195 | + } |
3196 | + |
3197 | + } |
3198 | + |
3199 | +} |
3200 | + |
3201 | +// vim:expandtab:ts=4:sw=4 |
3202 | |
3203 | === removed directory 'extensions/fts-python' |
3204 | === removed file 'extensions/fts-python/Makefile.am' |
3205 | --- extensions/fts-python/Makefile.am 2011-11-01 20:26:36 +0000 |
3206 | +++ extensions/fts-python/Makefile.am 1970-01-01 00:00:00 +0000 |
3207 | @@ -1,23 +0,0 @@ |
3208 | -NULL = |
3209 | - |
3210 | -ftsdir = $(pkgdatadir)/fts-python |
3211 | -dist_fts_SCRIPTS = \ |
3212 | - fts.py \ |
3213 | - $(NULL) |
3214 | - |
3215 | -dist_fts_DATA = \ |
3216 | - datamodel.py \ |
3217 | - constants.py \ |
3218 | - lrucache.py \ |
3219 | - sql.py \ |
3220 | - $(NULL) |
3221 | - |
3222 | -servicedir = $(DBUS_SERVICES_DIR) |
3223 | -service_DATA = org.gnome.zeitgeist.fts.service |
3224 | - |
3225 | -org.gnome.zeitgeist.fts.service: org.gnome.zeitgeist.fts.service.in |
3226 | - $(AM_V_GEN)sed -e s!\@pkgdatadir\@!$(pkgdatadir)! < $< > $@ |
3227 | -org.gnome.zeitgeist.fts.service: Makefile |
3228 | - |
3229 | -EXTRA_DIST = org.gnome.zeitgeist.fts.service.in |
3230 | -CLEANFILES = org.gnome.zeitgeist.fts.service |
3231 | |
3232 | === removed file 'extensions/fts-python/constants.py' |
3233 | --- extensions/fts-python/constants.py 2011-10-31 15:28:09 +0000 |
3234 | +++ extensions/fts-python/constants.py 1970-01-01 00:00:00 +0000 |
3235 | @@ -1,71 +0,0 @@ |
3236 | -# -.- coding: utf-8 -.- |
3237 | - |
3238 | -# Zeitgeist |
3239 | -# |
3240 | -# Copyright © 2009 Markus Korn <thekorn@gmx.de> |
3241 | -# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com> |
3242 | -# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
3243 | -# |
3244 | -# This program is free software: you can redistribute it and/or modify |
3245 | -# it under the terms of the GNU Lesser General Public License as published by |
3246 | -# the Free Software Foundation, either version 2.1 of the License, or |
3247 | -# (at your option) any later version. |
3248 | -# |
3249 | -# This program is distributed in the hope that it will be useful, |
3250 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
3251 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
3252 | -# GNU Lesser General Public License for more details. |
3253 | -# |
3254 | -# You should have received a copy of the GNU Lesser General Public License |
3255 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
3256 | - |
3257 | -import os |
3258 | -import logging |
3259 | -from xdg import BaseDirectory |
3260 | - |
3261 | -from zeitgeist.client import ZeitgeistDBusInterface |
3262 | - |
3263 | -__all__ = [ |
3264 | - "log", |
3265 | - "get_engine", |
3266 | - "constants" |
3267 | -] |
3268 | - |
3269 | -log = logging.getLogger("zeitgeist.engine") |
3270 | - |
3271 | -_engine = None |
3272 | -def get_engine(): |
3273 | - """ Get the running engine instance or create a new one. """ |
3274 | - global _engine |
3275 | - if _engine is None or _engine.is_closed(): |
3276 | - import main # _zeitgeist.engine.main |
3277 | - _engine = main.ZeitgeistEngine() |
3278 | - return _engine |
3279 | - |
3280 | -class _Constants: |
3281 | - # Directories |
3282 | - DATA_PATH = os.environ.get("ZEITGEIST_DATA_PATH", |
3283 | - BaseDirectory.save_data_path("zeitgeist")) |
3284 | - DATABASE_FILE = os.environ.get("ZEITGEIST_DATABASE_PATH", |
3285 | - os.path.join(DATA_PATH, "activity.sqlite")) |
3286 | - DATABASE_FILE_BACKUP = os.environ.get("ZEITGEIST_DATABASE_BACKUP_PATH", |
3287 | - os.path.join(DATA_PATH, "activity.sqlite.bck")) |
3288 | - DEFAULT_LOG_PATH = os.path.join(BaseDirectory.xdg_cache_home, |
3289 | - "zeitgeist", "daemon.log") |
3290 | - |
3291 | - # D-Bus |
3292 | - DBUS_INTERFACE = ZeitgeistDBusInterface.INTERFACE_NAME |
3293 | - SIG_EVENT = "asaasay" |
3294 | - |
3295 | - # Required version of DB schema |
3296 | - CORE_SCHEMA="core" |
3297 | - CORE_SCHEMA_VERSION = 4 |
3298 | - |
3299 | - USER_EXTENSION_PATH = os.path.join(DATA_PATH, "extensions") |
3300 | - |
3301 | - # configure runtime cache for events |
3302 | - # default size is 2000 |
3303 | - CACHE_SIZE = int(os.environ.get("ZEITGEIST_CACHE_SIZE", 2000)) |
3304 | - log.debug("Cache size = %i" %CACHE_SIZE) |
3305 | - |
3306 | -constants = _Constants() |
3307 | |
3308 | === removed file 'extensions/fts-python/datamodel.py' |
3309 | --- extensions/fts-python/datamodel.py 2011-10-10 14:07:42 +0000 |
3310 | +++ extensions/fts-python/datamodel.py 1970-01-01 00:00:00 +0000 |
3311 | @@ -1,83 +0,0 @@ |
3312 | -# -.- coding: utf-8 -.- |
3313 | - |
3314 | -# Zeitgeist |
3315 | -# |
3316 | -# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
3317 | -# Copyright © 2009 Markus Korn <thekorn@gmx.de> |
3318 | -# Copyright © 2009 Seif Lotfy <seif@lotfy.com> |
3319 | -# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com> |
3320 | -# |
3321 | -# This program is free software: you can redistribute it and/or modify |
3322 | -# it under the terms of the GNU Lesser General Public License as published by |
3323 | -# the Free Software Foundation, either version 2.1 of the License, or |
3324 | -# (at your option) any later version. |
3325 | -# |
3326 | -# This program is distributed in the hope that it will be useful, |
3327 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
3328 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
3329 | -# GNU Lesser General Public License for more details. |
3330 | -# |
3331 | -# You should have received a copy of the GNU Lesser General Public License |
3332 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
3333 | - |
3334 | -from zeitgeist.datamodel import Event as OrigEvent, Subject as OrigSubject, \ |
3335 | - DataSource as OrigDataSource |
3336 | - |
3337 | -class Event(OrigEvent): |
3338 | - |
3339 | - @staticmethod |
3340 | - def _to_unicode(obj): |
3341 | - """ |
3342 | - Return an unicode representation of the given object. |
3343 | - If obj is None, return an empty string. |
3344 | - """ |
3345 | - return unicode(obj) if obj is not None else u"" |
3346 | - |
3347 | - @staticmethod |
3348 | - def _make_dbus_sendable(obj): |
3349 | - """ |
3350 | - Ensure that all fields in the event struct are non-None |
3351 | - """ |
3352 | - for n, value in enumerate(obj[0]): |
3353 | - obj[0][n] = obj._to_unicode(value) |
3354 | - for subject in obj[1]: |
3355 | - for n, value in enumerate(subject): |
3356 | - subject[n] = obj._to_unicode(value) |
3357 | - # The payload require special handling, since it is binary data |
3358 | - # If there is indeed data here, we must not unicode encode it! |
3359 | - if obj[2] is None: |
3360 | - obj[2] = u"" |
3361 | - elif isinstance(obj[2], unicode): |
3362 | - obj[2] = str(obj[2]) |
3363 | - return obj |
3364 | - |
3365 | - @staticmethod |
3366 | - def get_plain(ev): |
3367 | - """ |
3368 | - Ensure that an Event instance is a Plain Old Python Object (popo), |
3369 | - without DBus wrappings etc. |
3370 | - """ |
3371 | - popo = [] |
3372 | - popo.append(map(unicode, ev[0])) |
3373 | - popo.append([map(unicode, subj) for subj in ev[1]]) |
3374 | - # We need the check here so that if D-Bus gives us an empty |
3375 | - # byte array we don't serialize the text "dbus.Array(...)". |
3376 | - popo.append(str(ev[2]) if ev[2] else u'') |
3377 | - return popo |
3378 | - |
3379 | -class Subject(OrigSubject): |
3380 | - pass |
3381 | - |
3382 | -class DataSource(OrigDataSource): |
3383 | - |
3384 | - @staticmethod |
3385 | - def get_plain(datasource): |
3386 | - for plaintype, props in { |
3387 | - unicode: (DataSource.Name, DataSource.Description), |
3388 | - lambda x: map(Event.get_plain, x): (DataSource.EventTemplates,), |
3389 | - bool: (DataSource.Running, DataSource.Enabled), |
3390 | - int: (DataSource.LastSeen,), |
3391 | - }.iteritems(): |
3392 | - for prop in props: |
3393 | - datasource[prop] = plaintype(datasource[prop]) |
3394 | - return tuple(datasource) |
3395 | |
3396 | === removed file 'extensions/fts-python/fts.py' |
3397 | --- extensions/fts-python/fts.py 2012-01-06 10:11:45 +0000 |
3398 | +++ extensions/fts-python/fts.py 1970-01-01 00:00:00 +0000 |
3399 | @@ -1,1273 +0,0 @@ |
3400 | -#!/usr/bin/env python |
3401 | -# -.- coding: utf-8 -.- |
3402 | - |
3403 | -# Zeitgeist |
3404 | -# |
3405 | -# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
3406 | -# Copyright © 2010 Canonical Ltd |
3407 | -# |
3408 | -# This program is free software: you can redistribute it and/or modify |
3409 | -# it under the terms of the GNU Lesser General Public License as published by |
3410 | -# the Free Software Foundation, either version 3 of the License, or |
3411 | -# (at your option) any later version. |
3412 | -# |
3413 | -# This program is distributed in the hope that it will be useful, |
3414 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
3415 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
3416 | -# GNU Lesser General Public License for more details. |
3417 | -# |
3418 | -# You should have received a copy of the GNU Lesser General Public License |
3419 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
3420 | -# |
3421 | - |
3422 | -# |
3423 | -# TODO |
3424 | -# |
3425 | -# - Delete events hook |
3426 | -# - ? Filter on StorageState |
3427 | -# - Throttle IO and CPU where possible |
3428 | - |
3429 | -import os, sys |
3430 | -import time |
3431 | -import pickle |
3432 | -import dbus |
3433 | -import sqlite3 |
3434 | -import dbus.service |
3435 | -from xdg import BaseDirectory |
3436 | -from xdg.DesktopEntry import DesktopEntry, xdg_data_dirs |
3437 | -import logging |
3438 | -import subprocess |
3439 | -from xml.dom import minidom |
3440 | -import xapian |
3441 | -import os |
3442 | -from Queue import Queue, Empty |
3443 | -import threading |
3444 | -from urllib import quote as url_escape, unquote as url_unescape |
3445 | -import gobject, gio |
3446 | -from cStringIO import StringIO |
3447 | - |
3448 | -from collections import defaultdict |
3449 | -from array import array |
3450 | -from zeitgeist.datamodel import Event as OrigEvent, StorageState, TimeRange, \ |
3451 | - ResultType, get_timestamp_for_now, Interpretation, Symbol, NEGATION_OPERATOR, WILDCARD, NULL_EVENT |
3452 | -from datamodel import Event, Subject |
3453 | -from constants import constants |
3454 | -from zeitgeist.client import ZeitgeistClient, ZeitgeistDBusInterface |
3455 | -from sql import get_default_cursor, unset_cursor, TableLookup, WhereClause |
3456 | -from lrucache import LRUCache |
3457 | - |
3458 | -ZG_CLIENT = ZeitgeistClient() |
3459 | - |
3460 | -logging.basicConfig(level=logging.DEBUG) |
3461 | -log = logging.getLogger("zeitgeist.fts") |
3462 | - |
3463 | -INDEX_FILE = os.path.join(constants.DATA_PATH, "bb.fts.index") |
3464 | -INDEX_VERSION = "1" |
3465 | -INDEX_LOCK = threading.Lock() |
3466 | -FTS_DBUS_BUS_NAME = "org.gnome.zeitgeist.SimpleIndexer" |
3467 | -FTS_DBUS_OBJECT_PATH = "/org/gnome/zeitgeist/index/activity" |
3468 | -FTS_DBUS_INTERFACE = "org.gnome.zeitgeist.Index" |
3469 | - |
3470 | -FILTER_PREFIX_EVENT_INTERPRETATION = "ZGEI" |
3471 | -FILTER_PREFIX_EVENT_MANIFESTATION = "ZGEM" |
3472 | -FILTER_PREFIX_ACTOR = "ZGA" |
3473 | -FILTER_PREFIX_SUBJECT_URI = "ZGSU" |
3474 | -FILTER_PREFIX_SUBJECT_INTERPRETATION = "ZGSI" |
3475 | -FILTER_PREFIX_SUBJECT_MANIFESTATION = "ZGSM" |
3476 | -FILTER_PREFIX_SUBJECT_ORIGIN = "ZGSO" |
3477 | -FILTER_PREFIX_SUBJECT_MIMETYPE = "ZGST" |
3478 | -FILTER_PREFIX_SUBJECT_STORAGE = "ZGSS" |
3479 | -FILTER_PREFIX_XDG_CATEGORY = "AC" |
3480 | - |
3481 | -VALUE_EVENT_ID = 0 |
3482 | -VALUE_TIMESTAMP = 1 |
3483 | - |
3484 | -MAX_CACHE_BATCH_SIZE = constants.CACHE_SIZE/2 |
3485 | - |
3486 | -# When sorting by of the COALESCING_RESULT_TYPES result types, |
3487 | -# we need to fetch some extra events from the Xapian index because |
3488 | -# the final result set will be coalesced on some property of the event |
3489 | -COALESCING_RESULT_TYPES = [ \ |
3490 | - ResultType.MostRecentSubjects, |
3491 | - ResultType.LeastRecentSubjects, |
3492 | - ResultType.MostPopularSubjects, |
3493 | - ResultType.LeastPopularSubjects, |
3494 | - ResultType.MostRecentActor, |
3495 | - ResultType.LeastRecentActor, |
3496 | - ResultType.MostPopularActor, |
3497 | - ResultType.LeastPopularActor, |
3498 | -] |
3499 | - |
3500 | -MAX_TERM_LENGTH = 245 |
3501 | - |
3502 | - |
3503 | -class NegationNotSupported(ValueError): |
3504 | - pass |
3505 | - |
3506 | -class WildcardNotSupported(ValueError): |
3507 | - pass |
3508 | - |
3509 | -def parse_negation(kind, field, value, parse_negation=True): |
3510 | - """checks if value starts with the negation operator, |
3511 | - if value starts with the negation operator but the field does |
3512 | - not support negation a ValueError is raised. |
3513 | - This function returns a (value_without_negation, negation)-tuple |
3514 | - """ |
3515 | - negation = False |
3516 | - if parse_negation and value.startswith(NEGATION_OPERATOR): |
3517 | - negation = True |
3518 | - value = value[len(NEGATION_OPERATOR):] |
3519 | - if negation and field not in kind.SUPPORTS_NEGATION: |
3520 | - raise NegationNotSupported("This field does not support negation") |
3521 | - return value, negation |
3522 | - |
3523 | -def parse_wildcard(kind, field, value): |
3524 | - """checks if value ends with the a wildcard, |
3525 | - if value ends with a wildcard but the field does not support wildcards |
3526 | - a ValueError is raised. |
3527 | - This function returns a (value_without_wildcard, wildcard)-tuple |
3528 | - """ |
3529 | - wildcard = False |
3530 | - if value.endswith(WILDCARD): |
3531 | - wildcard = True |
3532 | - value = value[:-len(WILDCARD)] |
3533 | - if wildcard and field not in kind.SUPPORTS_WILDCARDS: |
3534 | - raise WildcardNotSupported("This field does not support wildcards") |
3535 | - return value, wildcard |
3536 | - |
3537 | -def parse_operators(kind, field, value): |
3538 | - """runs both (parse_negation and parse_wildcard) parser functions |
3539 | - on query values, and handles the special case of Subject.Text correctly. |
3540 | - returns a (value_without_negation_and_wildcard, negation, wildcard)-tuple |
3541 | - """ |
3542 | - try: |
3543 | - value, negation = parse_negation(kind, field, value) |
3544 | - except ValueError: |
3545 | - if kind is Subject and field == Subject.Text: |
3546 | - # we do not support negation of the text field, |
3547 | - # the text field starts with the NEGATION_OPERATOR |
3548 | - # so we handle this string as the content instead |
3549 | - # of an operator |
3550 | - negation = False |
3551 | - else: |
3552 | - raise |
3553 | - value, wildcard = parse_wildcard(kind, field, value) |
3554 | - return value, negation, wildcard |
3555 | - |
3556 | - |
3557 | -def synchronized(lock): |
3558 | - """ Synchronization decorator. """ |
3559 | - def wrap(f): |
3560 | - def newFunction(*args, **kw): |
3561 | - lock.acquire() |
3562 | - try: |
3563 | - return f(*args, **kw) |
3564 | - finally: |
3565 | - lock.release() |
3566 | - return newFunction |
3567 | - return wrap |
3568 | - |
3569 | -class Deletion: |
3570 | - """ |
3571 | - A marker class that marks an event id for deletion |
3572 | - """ |
3573 | - def __init__ (self, event_id): |
3574 | - self.event_id = event_id |
3575 | - |
3576 | -class Reindex: |
3577 | - """ |
3578 | - Marker class that tells the worker thread to rebuild the entire index. |
3579 | - On construction time all events are pulled out of the zg_engine |
3580 | - argument and stored for later processing in the worker thread. |
3581 | - This avoid concurrent access to the ZG sqlite db from the worker thread. |
3582 | - """ |
3583 | - def __init__ (self, zg_engine): |
3584 | - all_events = zg_engine._find_events(1, TimeRange.always(), |
3585 | - [], StorageState.Any, |
3586 | - sys.maxint, |
3587 | - ResultType.MostRecentEvents) |
3588 | - self.all_events = all_events |
3589 | - |
3590 | -class SearchEngineExtension (dbus.service.Object): |
3591 | - """ |
3592 | - Full text indexing and searching extension for Zeitgeist |
3593 | - """ |
3594 | - PUBLIC_METHODS = [] |
3595 | - |
3596 | - def __init__ (self): |
3597 | - bus_name = dbus.service.BusName(FTS_DBUS_BUS_NAME, bus=dbus.SessionBus()) |
3598 | - dbus.service.Object.__init__(self, bus_name, FTS_DBUS_OBJECT_PATH) |
3599 | - self._indexer = Indexer() |
3600 | - |
3601 | - ZG_CLIENT.install_monitor((0, 2**63 - 1), [], |
3602 | - self.pre_insert_event, self.post_delete_events) |
3603 | - |
3604 | - def pre_insert_event(self, timerange, events): |
3605 | - for event in events: |
3606 | - self._indexer.index_event (event) |
3607 | - |
3608 | - def post_delete_events (self, ids): |
3609 | - for _id in ids: |
3610 | - self._indexer.delete_event (_id) |
3611 | - |
3612 | - @dbus.service.method(FTS_DBUS_INTERFACE, |
3613 | - in_signature="s(xx)a("+constants.SIG_EVENT+")uuu", |
3614 | - out_signature="a("+constants.SIG_EVENT+")u") |
3615 | - def Search(self, query_string, time_range, filter_templates, offset, count, result_type): |
3616 | - """ |
3617 | - DBus method to perform a full text search against the contents of the |
3618 | - Zeitgeist log. Returns an array of events. |
3619 | - """ |
3620 | - time_range = TimeRange(time_range[0], time_range[1]) |
3621 | - filter_templates = map(Event, filter_templates) |
3622 | - events, hit_count = self._indexer.search(query_string, time_range, |
3623 | - filter_templates, |
3624 | - offset, count, result_type) |
3625 | - return self._make_events_sendable (events), hit_count |
3626 | - |
3627 | - @dbus.service.method(FTS_DBUS_INTERFACE, |
3628 | - in_signature="", |
3629 | - out_signature="") |
3630 | - def ForceReindex(self): |
3631 | - """ |
3632 | - DBus method to force a reindex of the entire Zeitgeist log. |
3633 | - This method is only intended for debugging purposes and is not |
3634 | - considered blessed public API. |
3635 | - """ |
3636 | - log.debug ("Received ForceReindex request over DBus.") |
3637 | - self._indexer._queue.put (Reindex (self._indexer)) |
3638 | - |
3639 | - def _make_events_sendable(self, events): |
3640 | - return [NULL_EVENT if event is None else Event._make_dbus_sendable(event) for event in events] |
3641 | - |
3642 | -def mangle_uri (uri): |
3643 | - """ |
3644 | - Converts a URI into an index- and query friendly string. The problem |
3645 | - is that Xapian doesn't handle CAPITAL letters or most non-alphanumeric |
3646 | - symbols in a boolean term when it does prefix matching. The mangled |
3647 | - URIs returned from this function are suitable for boolean prefix searches. |
3648 | - |
3649 | - IMPORTANT: This is a 1-way function! You can not convert back. |
3650 | - """ |
3651 | - result = "" |
3652 | - for c in uri.lower(): |
3653 | - if c in (": /"): |
3654 | - result += "_" |
3655 | - else: |
3656 | - result += c |
3657 | - return result |
3658 | - |
3659 | -def cap_string (s, nbytes=MAX_TERM_LENGTH): |
3660 | - """ |
3661 | - If s has more than nbytes bytes (not characters) then cap it off |
3662 | - after nbytes bytes in a way still producing a valid utf-8 string. |
3663 | - |
3664 | - Assumes that s is a utf-8 string. |
3665 | - |
3666 | - This function useful for working with Xapian terms because Xapian has |
3667 | - a max term length of 245 (which is not very well documented, but see |
3668 | - http://xapian.org/docs/omega/termprefixes.html). |
3669 | - """ |
3670 | - # Check if we can fast-path this string |
3671 | - if (len(s.encode("utf-8")) <= nbytes): |
3672 | - return s |
3673 | - |
3674 | - # We use a StringIO here to avoid mem thrashing via naiive |
3675 | - # string concatenation. See fx. http://www.skymind.com/~ocrow/python_string/ |
3676 | - buf = StringIO() |
3677 | - for char in s : |
3678 | - if buf.tell() >= nbytes - 1 : |
3679 | - return buf.getvalue() |
3680 | - buf.write(char.encode("utf-8")) |
3681 | - |
3682 | - return unicode(buf.getvalue().decode("utf-8")) |
3683 | - |
3684 | - |
3685 | -def expand_type (type_prefix, uri): |
3686 | - """ |
3687 | - Return a string with a Xapian query matching all child types of 'uri' |
3688 | - inside the Xapian prefix 'type_prefix'. |
3689 | - """ |
3690 | - is_negation = uri.startswith(NEGATION_OPERATOR) |
3691 | - uri = uri[1:] if is_negation else uri |
3692 | - children = Symbol.find_child_uris_extended(uri) |
3693 | - children = [ "%s:%s" % (type_prefix, child) for child in children ] |
3694 | - |
3695 | - result = " OR ".join(children) |
3696 | - return result if not is_negation else "NOT (%s)" % result |
3697 | - |
3698 | -class Indexer: |
3699 | - """ |
3700 | - Abstraction of the FT indexer and search engine |
3701 | - """ |
3702 | - |
3703 | - QUERY_PARSER_FLAGS = xapian.QueryParser.FLAG_PHRASE | \ |
3704 | - xapian.QueryParser.FLAG_BOOLEAN | \ |
3705 | - xapian.QueryParser.FLAG_PURE_NOT | \ |
3706 | - xapian.QueryParser.FLAG_LOVEHATE | \ |
3707 | - xapian.QueryParser.FLAG_WILDCARD |
3708 | - |
3709 | - def __init__ (self): |
3710 | - |
3711 | - self._cursor = cursor = get_default_cursor() |
3712 | - os.environ["XAPIAN_CJK_NGRAM"] = "1" |
3713 | - self._interpretation = TableLookup(cursor, "interpretation") |
3714 | - self._manifestation = TableLookup(cursor, "manifestation") |
3715 | - self._mimetype = TableLookup(cursor, "mimetype") |
3716 | - self._actor = TableLookup(cursor, "actor") |
3717 | - self._event_cache = LRUCache(constants.CACHE_SIZE) |
3718 | - |
3719 | - log.debug("Opening full text index: %s" % INDEX_FILE) |
3720 | - try: |
3721 | - self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OPEN) |
3722 | - except xapian.DatabaseError, e: |
3723 | - log.warn("Full text index corrupted: '%s'. Rebuilding index." % e) |
3724 | - self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OVERWRITE) |
3725 | - self._tokenizer = indexer = xapian.TermGenerator() |
3726 | - self._query_parser = xapian.QueryParser() |
3727 | - self._query_parser.set_database (self._index) |
3728 | - self._query_parser.add_prefix("name", "N") |
3729 | - self._query_parser.add_prefix("title", "N") |
3730 | - self._query_parser.add_prefix("site", "S") |
3731 | - self._query_parser.add_prefix("app", "A") |
3732 | - self._query_parser.add_boolean_prefix("zgei", FILTER_PREFIX_EVENT_INTERPRETATION) |
3733 | - self._query_parser.add_boolean_prefix("zgem", FILTER_PREFIX_EVENT_MANIFESTATION) |
3734 | - self._query_parser.add_boolean_prefix("zga", FILTER_PREFIX_ACTOR) |
3735 | - self._query_parser.add_prefix("zgsu", FILTER_PREFIX_SUBJECT_URI) |
3736 | - self._query_parser.add_boolean_prefix("zgsi", FILTER_PREFIX_SUBJECT_INTERPRETATION) |
3737 | - self._query_parser.add_boolean_prefix("zgsm", FILTER_PREFIX_SUBJECT_MANIFESTATION) |
3738 | - self._query_parser.add_prefix("zgso", FILTER_PREFIX_SUBJECT_ORIGIN) |
3739 | - self._query_parser.add_boolean_prefix("zgst", FILTER_PREFIX_SUBJECT_MIMETYPE) |
3740 | - self._query_parser.add_boolean_prefix("zgss", FILTER_PREFIX_SUBJECT_STORAGE) |
3741 | - self._query_parser.add_prefix("category", FILTER_PREFIX_XDG_CATEGORY) |
3742 | - self._query_parser.add_valuerangeprocessor( |
3743 | - xapian.NumberValueRangeProcessor(VALUE_EVENT_ID, "id", True)) |
3744 | - self._query_parser.add_valuerangeprocessor( |
3745 | - xapian.NumberValueRangeProcessor(VALUE_TIMESTAMP, "ms", False)) |
3746 | - self._query_parser.set_default_op(xapian.Query.OP_AND) |
3747 | - self._enquire = xapian.Enquire(self._index) |
3748 | - |
3749 | - self._desktops = {} |
3750 | - |
3751 | - gobject.threads_init() |
3752 | - self._may_run = True |
3753 | - self._queue = Queue(0) |
3754 | - self._worker = threading.Thread(target=self._worker_thread, |
3755 | - name="IndexWorker") |
3756 | - self._worker.daemon = True |
3757 | - |
3758 | - # We need to defer the index checking until after ZG has completed |
3759 | - # full setup. Hence the idle handler. |
3760 | - # We also don't start the worker until after we've checked the index |
3761 | - gobject.idle_add (self._check_index_and_start_worker) |
3762 | - |
3763 | - @synchronized (INDEX_LOCK) |
3764 | - def _check_index_and_start_worker (self): |
3765 | - """ |
3766 | - Check whether we need a rebuild of the index. |
3767 | - Returns True if the index is good. False if a reindexing has |
3768 | - been commenced. |
3769 | - |
3770 | - This method should be called from the main thread and only once. |
3771 | - It starts the worker thread as a side effect. |
3772 | - |
3773 | - We are clearing the queue, because there may be a race when an |
3774 | - event insertion / deletion is already queued and our index |
3775 | - is corrupted. Creating a new queue instance should be safe, |
3776 | - because we're running in main thread as are the index_event |
3777 | - and delete_event methods, and the worker thread wasn't yet |
3778 | - started. |
3779 | - """ |
3780 | - if self._index.get_metadata("fts_index_version") != INDEX_VERSION: |
3781 | - log.info("Index must be upgraded. Doing full rebuild") |
3782 | - self._queue = Queue(0) |
3783 | - self._queue.put(Reindex(self)) |
3784 | - elif self._index.get_doccount() == 0: |
3785 | - # If the index is empty we trigger a rebuild |
3786 | - # We must delay reindexing until after the engine is done setting up |
3787 | - log.info("Empty index detected. Doing full rebuild") |
3788 | - self._queue = Queue(0) |
3789 | - self._queue.put(Reindex(self)) |
3790 | - |
3791 | - # Now that we've checked the index from the main thread we can start the worker |
3792 | - self._worker.start() |
3793 | - |
3794 | - def index_event (self, event): |
3795 | - """ |
3796 | - This method schedules and event for indexing. It returns immediate and |
3797 | - defers the actual work to a bottom half thread. This means that it |
3798 | - will not block the main loop of the Zeitgeist daemon while indexing |
3799 | - (which may be a heavy operation) |
3800 | - """ |
3801 | - self._queue.put (event) |
3802 | - return event |
3803 | - |
3804 | - def delete_event (self, event_id): |
3805 | - """ |
3806 | - Remove an event from the index given its event id |
3807 | - """ |
3808 | - self._queue.put (Deletion(event_id)) |
3809 | - return |
3810 | - |
3811 | - @synchronized (INDEX_LOCK) |
3812 | - def search (self, query_string, time_range=None, filters=None, offset=0, maxhits=10, result_type=100): |
3813 | - """ |
3814 | - Do a full text search over the indexed corpus. The `result_type` |
3815 | - parameter may be a zeitgeist.datamodel.ResultType or 100. In case it is |
3816 | - 100 the textual relevancy of the search engine will be used to sort the |
3817 | - results. Result type 100 is the fastest (and default) mode. |
3818 | - |
3819 | - The filters argument should be a list of event templates. |
3820 | - """ |
3821 | - # Expand event template filters if necessary |
3822 | - if filters: |
3823 | - query_string = "(%s) AND (%s)" % (query_string, self._compile_event_filter_query (filters)) |
3824 | - |
3825 | - # Expand time range value query |
3826 | - if time_range and not time_range.is_always(): |
3827 | - query_string = "(%s) AND (%s)" % (query_string, self._compile_time_range_filter_query (time_range)) |
3828 | - |
3829 | - # If the result type coalesces the events we need to fetch some extra |
3830 | - # events from the index to have a chance of actually holding 'maxhits' |
3831 | - # unique events |
3832 | - if result_type in COALESCING_RESULT_TYPES: |
3833 | - raw_maxhits = maxhits * 3 |
3834 | - else: |
3835 | - raw_maxhits = maxhits |
3836 | - |
3837 | - # When not sorting by relevance, we fetch the results from Xapian sorted, |
3838 | - # by timestamp. That minimizes the skew we get from otherwise doing a |
3839 | - # relevancy ranked xapaian query and then resorting with Zeitgeist. The |
3840 | - # "skew" is that low-relevancy results may still have the highest timestamp |
3841 | - if result_type == 100: |
3842 | - self._enquire.set_sort_by_relevance() |
3843 | - else: |
3844 | - self._enquire.set_sort_by_value(VALUE_TIMESTAMP, True) |
3845 | - |
3846 | - # Allow wildcards |
3847 | - query_start = time.time() |
3848 | - query = self._query_parser.parse_query (query_string, |
3849 | - self.QUERY_PARSER_FLAGS) |
3850 | - self._enquire.set_query (query) |
3851 | - hits = self._enquire.get_mset (offset, raw_maxhits) |
3852 | - hit_count = hits.get_matches_estimated() |
3853 | - log.debug("Search '%s' gave %s hits in %sms" % |
3854 | - (query_string, hits.get_matches_estimated(), (time.time() - query_start)*1000)) |
3855 | - |
3856 | - if result_type == 100: |
3857 | - event_ids = [] |
3858 | - for m in hits: |
3859 | - event_id = int(xapian.sortable_unserialise( |
3860 | - m.document.get_value(VALUE_EVENT_ID))) |
3861 | - event_ids.append (event_id) |
3862 | - if event_ids: |
3863 | - return self.get_events(event_ids), hit_count |
3864 | - else: |
3865 | - return [], 0 |
3866 | - else: |
3867 | - templates = [] |
3868 | - for m in hits: |
3869 | - event_id = int(xapian.sortable_unserialise( |
3870 | - m.document.get_value(VALUE_EVENT_ID))) |
3871 | - ev = Event() |
3872 | - ev[0][Event.Id] = str(event_id) |
3873 | - templates.append(ev) |
3874 | - if templates: |
3875 | - x = self._find_events(1, TimeRange.always(), |
3876 | - templates, |
3877 | - StorageState.Any, |
3878 | - maxhits, |
3879 | - result_type), hit_count |
3880 | - return x |
3881 | - else: |
3882 | - return [], 0 |
3883 | - |
3884 | - def _worker_thread (self): |
3885 | - is_dirty = False |
3886 | - while self._may_run: |
3887 | - # FIXME: Throttle IO and CPU |
3888 | - try: |
3889 | - # If we are dirty wait a while before we flush, |
3890 | - # or if we are clean wait indefinitely to avoid |
3891 | - # needless wakeups |
3892 | - if is_dirty: |
3893 | - event = self._queue.get(True, 0.5) |
3894 | - else: |
3895 | - event = self._queue.get(True) |
3896 | - |
3897 | - if isinstance (event, Deletion): |
3898 | - self._delete_event_real (event.event_id) |
3899 | - elif isinstance (event, Reindex): |
3900 | - self._reindex (event.all_events) |
3901 | - else: |
3902 | - self._index_event_real (event) |
3903 | - |
3904 | - is_dirty = True |
3905 | - except Empty: |
3906 | - if is_dirty: |
3907 | - # Write changes to disk |
3908 | - log.debug("Committing FTS index") |
3909 | - self._index.flush() |
3910 | - is_dirty = False |
3911 | - else: |
3912 | - log.debug("No changes to index. Sleeping") |
3913 | - |
3914 | - @synchronized (INDEX_LOCK) |
3915 | - def _reindex (self, event_list): |
3916 | - """ |
3917 | - Index everything in the ZG log. The argument must be a list |
3918 | - of events. Typically extracted by a Reindex instance. |
3919 | - Only call from worker thread as it writes to the db and Xapian |
3920 | - is *not* thread safe (only single-writer-multiple-reader). |
3921 | - """ |
3922 | - self._index.close () |
3923 | - self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OVERWRITE) |
3924 | - self._query_parser.set_database (self._index) |
3925 | - self._enquire = xapian.Enquire(self._index) |
3926 | - # Register that this index was built with CJK enabled |
3927 | - self._index.set_metadata("fts_index_version", INDEX_VERSION) |
3928 | - log.info("Preparing to rebuild index with %s events" % len(event_list)) |
3929 | - for e in event_list : self._queue.put(e) |
3930 | - |
3931 | - @synchronized (INDEX_LOCK) |
3932 | - def _delete_event_real (self, event_id): |
3933 | - """ |
3934 | - Look up the doc id given an event id and remove the xapian.Document |
3935 | - for that doc id. |
3936 | - Note: This is slow, but there's not much we can do about it |
3937 | - """ |
3938 | - try: |
3939 | - _id = xapian.sortable_serialise(float(event_id)) |
3940 | - query = xapian.Query(xapian.Query.OP_VALUE_RANGE, |
3941 | - VALUE_EVENT_ID, _id, _id) |
3942 | - |
3943 | - self._enquire.set_query (query) |
3944 | - hits = self._enquire.get_mset (0, 10) |
3945 | - |
3946 | - total = hits.get_matches_estimated() |
3947 | - if total > 1: |
3948 | - log.warning ("More than one event found with id '%s'" % event_id) |
3949 | - elif total <= 0: |
3950 | - log.debug ("No event for id '%s'" % event_id) |
3951 | - return |
3952 | - |
3953 | - for m in hits: |
3954 | - log.debug("Deleting event '%s' with docid '%s'" % |
3955 | - (event_id, m.docid)) |
3956 | - self._index.delete_document(m.docid) |
3957 | - except Exception, e: |
3958 | - log.error("Failed to delete event '%s': %s" % (event_id, e)) |
3959 | - |
3960 | - def _split_uri (self, uri): |
3961 | - """ |
3962 | - Returns a triple of (scheme, host, and path) extracted from `uri` |
3963 | - """ |
3964 | - i = uri.find(":") |
3965 | - if i == -1 : |
3966 | - scheme = "" |
3967 | - host = "" |
3968 | - path = uri |
3969 | - else: |
3970 | - scheme = uri[:i] |
3971 | - host = "" |
3972 | - path = "" |
3973 | - |
3974 | - if uri[i+1] == "/" and uri[i+2] == "/": |
3975 | - j = uri.find("/", i+3) |
3976 | - if j == -1 : |
3977 | - host = uri[i+3:] |
3978 | - else: |
3979 | - host = uri[i+3:j] |
3980 | - path = uri[j:] |
3981 | - else: |
3982 | - host = uri[i+1:] |
3983 | - |
3984 | - # Strip out URI query part |
3985 | - i = path.find("?") |
3986 | - if i != -1: |
3987 | - path = path[:i] |
3988 | - |
3989 | - return scheme, host, path |
3990 | - |
3991 | - def _get_desktop_entry (self, app_id): |
3992 | - """ |
3993 | - Return a xdg.DesktopEntry.DesktopEntry `app_id` or None in case |
3994 | - no file is found for the given desktop id |
3995 | - """ |
3996 | - if app_id in self._desktops: |
3997 | - return self._desktops[app_id] |
3998 | - |
3999 | - for datadir in xdg_data_dirs: |
4000 | - path = os.path.join(datadir, "applications", app_id) |
4001 | - if os.path.exists(path): |
4002 | - try: |
4003 | - desktop = DesktopEntry(path) |
4004 | - self._desktops[app_id] = desktop |
4005 | - return desktop |
4006 | - except Exception, e: |
4007 | - log.warning("Unable to load %s: %s" % (path, e)) |
4008 | - return None |
4009 | - |
4010 | - return None |
4011 | - |
4012 | - def _index_actor (self, actor): |
4013 | - """ |
4014 | - Takes an actor as a path to a .desktop file or app:// uri |
4015 | - and index the contents of the corresponding .desktop file |
4016 | - into the document currently set for self._tokenizer. |
4017 | - """ |
4018 | - if not actor : return |
4019 | - |
4020 | - # Get the path of the .desktop file and convert it to |
4021 | - # an app id (eg. 'gedit.desktop') |
4022 | - scheme, host, path = self._split_uri(url_unescape (actor)) |
4023 | - if not path: |
4024 | - path = host |
4025 | - |
4026 | - if not path : |
4027 | - log.debug("Unable to determine application id for %s" % actor) |
4028 | - return |
4029 | - |
4030 | - if path.startswith("/") : |
4031 | - path = os.path.basename(path) |
4032 | - |
4033 | - desktop = self._get_desktop_entry(path) |
4034 | - if desktop: |
4035 | - if not desktop.getNoDisplay(): |
4036 | - self._tokenizer.index_text(desktop.getName(), 5) |
4037 | - self._tokenizer.index_text(desktop.getName(), 5, "A") |
4038 | - self._tokenizer.index_text(desktop.getGenericName(), 5) |
4039 | - self._tokenizer.index_text(desktop.getGenericName(), 5, "A") |
4040 | - self._tokenizer.index_text(desktop.getComment(), 2) |
4041 | - self._tokenizer.index_text(desktop.getComment(), 2, "A") |
4042 | - |
4043 | - doc = self._tokenizer.get_document() |
4044 | - for cat in desktop.getCategories(): |
4045 | - doc.add_boolean_term(FILTER_PREFIX_XDG_CATEGORY+cat.lower()) |
4046 | - else: |
4047 | - log.debug("Unable to look up app info for %s" % actor) |
4048 | - |
4049 | - |
4050 | - def _index_uri (self, uri): |
4051 | - """ |
4052 | - Index `uri` into the document currectly set on self._tokenizer |
4053 | - """ |
4054 | - # File URIs and paths are indexed in one way, and all other, |
4055 | - # usually web URIs, are indexed in another way because there may |
4056 | - # be domain name etc. in there we want to rank differently |
4057 | - scheme, host, path = self._split_uri (url_unescape (uri)) |
4058 | - if scheme == "file" or not scheme: |
4059 | - path, name = os.path.split(path) |
4060 | - self._tokenizer.index_text(name, 5) |
4061 | - self._tokenizer.index_text(name, 5, "N") |
4062 | - |
4063 | - # Index parent names with descending weight |
4064 | - weight = 5 |
4065 | - while path and name: |
4066 | - weight = weight / 1.5 |
4067 | - path, name = os.path.split(path) |
4068 | - self._tokenizer.index_text(name, int(weight)) |
4069 | - |
4070 | - elif scheme == "mailto": |
4071 | - tokens = host.split("@") |
4072 | - name = tokens[0] |
4073 | - self._tokenizer.index_text(name, 6) |
4074 | - if len(tokens) > 1: |
4075 | - self._tokenizer.index_text(" ".join(tokens[1:]), 1) |
4076 | - else: |
4077 | - # We're cautious about indexing the path components of |
4078 | - # non-file URIs as some websites practice *extremely* long |
4079 | - # and useless URLs |
4080 | - path, name = os.path.split(path) |
4081 | - if len(name) > 30 : name = name[:30] |
4082 | - if len(path) > 30 : path = path[:30] |
4083 | - if name: |
4084 | - self._tokenizer.index_text(name, 5) |
4085 | - self._tokenizer.index_text(name, 5, "N") |
4086 | - if path: |
4087 | - self._tokenizer.index_text(path, 1) |
4088 | - self._tokenizer.index_text(path, 1, "N") |
4089 | - if host: |
4090 | - self._tokenizer.index_text(host, 2) |
4091 | - self._tokenizer.index_text(host, 2, "N") |
4092 | - self._tokenizer.index_text(host, 2, "S") |
4093 | - |
4094 | - def _index_text (self, text): |
4095 | - """ |
4096 | - Index `text` as raw text data for the document currently |
4097 | - set on self._tokenizer. The text is assumed to be a primary |
4098 | - description of the subject, such as the basename of a file. |
4099 | - |
4100 | - Primary use is for subject.text |
4101 | - """ |
4102 | - self._tokenizer.index_text(text, 5) |
4103 | - |
4104 | - def _index_contents (self, uri): |
4105 | - # xmlindexer doesn't extract words for URIs only for file paths |
4106 | - |
4107 | - # FIXME: IONICE and NICE on xmlindexer |
4108 | - |
4109 | - path = uri.replace("file://", "") |
4110 | - xmlindexer = subprocess.Popen(['xmlindexer', path], |
4111 | - stdout=subprocess.PIPE) |
4112 | - xml = xmlindexer.communicate()[0].strip() |
4113 | - xmlindexer.wait() |
4114 | - |
4115 | - dom = minidom.parseString(xml) |
4116 | - text_nodes = dom.getElementsByTagName("text") |
4117 | - lines = [] |
4118 | - if text_nodes: |
4119 | - for line in text_nodes[0].childNodes: |
4120 | - lines.append(line.data) |
4121 | - |
4122 | - if lines: |
4123 | - self._tokenizer.index_text (" ".join(lines)) |
4124 | - |
4125 | - |
4126 | - def _add_doc_filters (self, event, doc): |
4127 | - """Adds the filtering rules to the doc. Filtering rules will |
4128 | - not affect the relevancy ranking of the event/doc""" |
4129 | - if event.interpretation: |
4130 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_EVENT_INTERPRETATION+event.interpretation)) |
4131 | - if event.manifestation: |
4132 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_EVENT_MANIFESTATION+event.manifestation)) |
4133 | - if event.actor: |
4134 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_ACTOR+mangle_uri(event.actor))) |
4135 | - |
4136 | - for su in event.subjects: |
4137 | - if su.uri: |
4138 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_URI+mangle_uri(su.uri))) |
4139 | - if su.interpretation: |
4140 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_INTERPRETATION+su.interpretation)) |
4141 | - if su.manifestation: |
4142 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_MANIFESTATION+su.manifestation)) |
4143 | - if su.origin: |
4144 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_ORIGIN+mangle_uri(su.origin))) |
4145 | - if su.mimetype: |
4146 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_MIMETYPE+su.mimetype)) |
4147 | - if su.storage: |
4148 | - doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_STORAGE+su.storage)) |
4149 | - |
4150 | - @synchronized (INDEX_LOCK) |
4151 | - def _index_event_real (self, event): |
4152 | - if not isinstance (event, OrigEvent): |
4153 | - log.error("Not an Event, found: %s" % type(event)) |
4154 | - if not event.id: |
4155 | - log.warning("Not indexing event. Event has no id") |
4156 | - return |
4157 | - |
4158 | - try: |
4159 | - doc = xapian.Document() |
4160 | - doc.add_value (VALUE_EVENT_ID, |
4161 | - xapian.sortable_serialise(float(event.id))) |
4162 | - doc.add_value (VALUE_TIMESTAMP, |
4163 | - xapian.sortable_serialise(float(event.timestamp))) |
4164 | - self._tokenizer.set_document (doc) |
4165 | - |
4166 | - self._index_actor (event.actor) |
4167 | - |
4168 | - for subject in event.subjects: |
4169 | - if not subject.uri : continue |
4170 | - |
4171 | - # By spec URIs can have arbitrary length. In reality that's just silly. |
4172 | - # The general online "rule" is to keep URLs less than 2k so we just |
4173 | - # choose to enforce that |
4174 | - if len(subject.uri) > 2000: |
4175 | - log.info ("URI too long (%s). Discarding: %s..."% (len(subject.uri), subject.uri[:30])) |
4176 | - return |
4177 | - log.debug("Indexing '%s'" % subject.uri) |
4178 | - |
4179 | - self._index_uri (subject.uri) |
4180 | - self._index_text (subject.text) |
4181 | - |
4182 | - # If the subject URI is an actor, we index the .desktop also |
4183 | - if subject.uri.startswith ("application://"): |
4184 | - self._index_actor (subject.uri) |
4185 | - |
4186 | - # File contents indexing disabled for now... |
4187 | - #self._index_contents (subject.uri) |
4188 | - |
4189 | - # FIXME: Possibly index payloads when we have apriori knowledge |
4190 | - |
4191 | - self._add_doc_filters (event, doc) |
4192 | - self._index.add_document (doc) |
4193 | - |
4194 | - except Exception, e: |
4195 | - log.error("Error indexing event: %s" % e) |
4196 | - |
4197 | - def _compile_event_filter_query (self, events): |
4198 | - """Takes a list of event templates and compiles a filter query |
4199 | - based on their interpretations, manifestations, and actor, |
4200 | - for event and subjects. |
4201 | - |
4202 | - All fields within the same event will be ANDed and each template |
4203 | - will be ORed with the others. Like elsewhere in Zeitgeist the |
4204 | - type tree of the interpretations and manifestations will be expanded |
4205 | - to match all child symbols as well |
4206 | - """ |
4207 | - query = [] |
4208 | - for event in events: |
4209 | - if not isinstance(event, Event): |
4210 | - raise TypeError("Expected Event. Found %s" % type(event)) |
4211 | - |
4212 | - tmpl = [] |
4213 | - if event.interpretation : |
4214 | - tmpl.append(expand_type("zgei", event.interpretation)) |
4215 | - if event.manifestation : |
4216 | - tmpl.append(expand_type("zgem", event.manifestation)) |
4217 | - if event.actor : tmpl.append("zga:%s" % mangle_uri(event.actor)) |
4218 | - for su in event.subjects: |
4219 | - if su.uri : |
4220 | - tmpl.append("zgsu:%s" % mangle_uri(su.uri)) |
4221 | - if su.interpretation : |
4222 | - tmpl.append(expand_type("zgsi", su.interpretation)) |
4223 | - if su.manifestation : |
4224 | - tmpl.append(expand_type("zgsm", su.manifestation)) |
4225 | - if su.origin : |
4226 | - tmpl.append("zgso:%s" % mangle_uri(su.origin)) |
4227 | - if su.mimetype : |
4228 | - tmpl.append("zgst:%s" % su.mimetype) |
4229 | - if su.storage : |
4230 | - tmpl.append("zgss:%s" % su.storage) |
4231 | - |
4232 | - tmpl = "(" + ") AND (".join(tmpl) + ")" |
4233 | - query.append(tmpl) |
4234 | - |
4235 | - return " OR ".join(query) |
4236 | - |
4237 | - def _compile_time_range_filter_query (self, time_range): |
4238 | - """Takes a TimeRange and compiles a range query for it""" |
4239 | - |
4240 | - if not isinstance(time_range, TimeRange): |
4241 | - raise TypeError("Expected TimeRange, but found %s" % type(time_range)) |
4242 | - |
4243 | - return "%s..%sms" % (time_range.begin, time_range.end) |
4244 | - |
4245 | - def _get_event_from_row(self, row): |
4246 | - event = Event() |
4247 | - event[0][Event.Id] = row["id"] # Id property is read-only in the public API |
4248 | - event.timestamp = row["timestamp"] |
4249 | - for field in ("interpretation", "manifestation", "actor"): |
4250 | - # Try to get event attributes from row using the attributed field id |
4251 | - # If attribute does not exist we break the attribute fetching and return |
4252 | - # None instead of crashing |
4253 | - try: |
4254 | - setattr(event, field, getattr(self, "_" + field).value(row[field])) |
4255 | - except KeyError, e: |
4256 | - log.error("Event %i broken: Table %s has no id %i" \ |
4257 | - %(row["id"], field, row[field])) |
4258 | - return None |
4259 | - event.origin = row["event_origin_uri"] or "" |
4260 | - event.payload = row["payload"] or "" # default payload: empty string |
4261 | - return event |
4262 | - |
4263 | - def _get_subject_from_row(self, row): |
4264 | - subject = Subject() |
4265 | - for field in ("uri", "text", "storage"): |
4266 | - setattr(subject, field, row["subj_" + field]) |
4267 | - subject.origin = row["subj_origin_uri"] |
4268 | - if row["subj_current_uri"]: |
4269 | - subject.current_uri = row["subj_current_uri"] |
4270 | - for field in ("interpretation", "manifestation", "mimetype"): |
4271 | - # Try to get subject attributes from row using the attributed field id |
4272 | - # If attribute does not exist we break the attribute fetching and return |
4273 | - # None instead of crashing |
4274 | - try: |
4275 | - setattr(subject, field, |
4276 | - getattr(self, "_" + field).value(row["subj_" + field])) |
4277 | - except KeyError, e: |
4278 | - log.error("Event %i broken: Table %s has no id %i" \ |
4279 | - %(row["id"], field, row["subj_" + field])) |
4280 | - return None |
4281 | - return subject |
4282 | - |
4283 | - def get_events(self, ids, sender=None): |
4284 | - """ |
4285 | - Look up a list of events. |
4286 | - """ |
4287 | - |
4288 | - t = time.time() |
4289 | - |
4290 | - if not ids: |
4291 | - return [] |
4292 | - |
4293 | - # Split ids into cached and uncached |
4294 | - uncached_ids = array("i") |
4295 | - cached_ids = array("i") |
4296 | - |
4297 | - # If the batch holds more than MAX_CACHE_BATCH_SIZE ids, ignore the cache |
4298 | - use_cache = True |
4299 | - if len(ids) > MAX_CACHE_BATCH_SIZE: |
4300 | - use_cache = False |
4301 | - if not use_cache: |
4302 | - uncached_ids = ids |
4303 | - else: |
4304 | - for id in ids: |
4305 | - if id in self._event_cache: |
4306 | - cached_ids.append(id) |
4307 | - else: |
4308 | - uncached_ids.append(id) |
4309 | - |
4310 | - id_hash = defaultdict(lambda: array("i")) |
4311 | - for n, id in enumerate(ids): |
4312 | - # the same id can be at multiple places (LP: #673916) |
4313 | - # cache all of them |
4314 | - id_hash[id].append(n) |
4315 | - |
4316 | - # If we are not able to get an event by the given id |
4317 | - # append None instead of raising an Error. The client |
4318 | - # might simply have requested an event that has been |
4319 | - # deleted |
4320 | - events = {} |
4321 | - sorted_events = [None]*len(ids) |
4322 | - |
4323 | - for id in cached_ids: |
4324 | - event = self._event_cache[id] |
4325 | - if event: |
4326 | - if event is not None: |
4327 | - for n in id_hash[event.id]: |
4328 | - # insert the event into all necessary spots (LP: #673916) |
4329 | - sorted_events[n] = event |
4330 | - |
4331 | - # Get uncached events |
4332 | - rows = self._cursor.execute(""" |
4333 | - SELECT * FROM event_view |
4334 | - WHERE id IN (%s) |
4335 | - """ % ",".join("%d" % _id for _id in uncached_ids)) |
4336 | - |
4337 | - time_get_uncached = time.time() - t |
4338 | - t = time.time() |
4339 | - |
4340 | - t_get_event = 0 |
4341 | - t_get_subject = 0 |
4342 | - t_apply_get_hooks = 0 |
4343 | - |
4344 | - row_counter = 0 |
4345 | - for row in rows: |
4346 | - row_counter += 1 |
4347 | - # Assumption: all rows of a same event for its different |
4348 | - # subjects are in consecutive order. |
4349 | - t_get_event -= time.time() |
4350 | - event = self._get_event_from_row(row) |
4351 | - t_get_event += time.time() |
4352 | - |
4353 | - if event: |
4354 | - # Check for existing event.id in event to attach |
4355 | - # other subjects to it |
4356 | - if event.id not in events: |
4357 | - events[event.id] = event |
4358 | - else: |
4359 | - event = events[event.id] |
4360 | - |
4361 | - t_get_subject -= time.time() |
4362 | - subject = self._get_subject_from_row(row) |
4363 | - t_get_subject += time.time() |
4364 | - # Check if subject has a proper value. If it is None, something went |
4365 | - # wrong while trying to fetch the subject from the row, so instead |
4366 | - # of failing and raising an error we silently skip the event. |
4367 | - if subject: |
4368 | - event.append_subject(subject) |
4369 | - if use_cache and not event.payload: |
4370 | - self._event_cache[event.id] = event |
4371 | - if event is not None: |
4372 | - for n in id_hash[event.id]: |
4373 | - # insert the event into all necessary spots (LP: #673916) |
4374 | - sorted_events[n] = event |
4375 | - # Avoid caching events with payloads to keep the cache MB size |
4376 | - # at a decent level |
4377 | - |
4378 | - |
4379 | - log.debug("Got %d raw events in %fs" % (row_counter, time_get_uncached)) |
4380 | - log.debug("Got %d events in %fs" % (len(sorted_events), time.time()-t)) |
4381 | - log.debug(" Where time spent in _get_event_from_row in %fs" % (t_get_event)) |
4382 | - log.debug(" Where time spent in _get_subject_from_row in %fs" % (t_get_subject)) |
4383 | - log.debug(" Where time spent in apply_get_hooks in %fs" % (t_apply_get_hooks)) |
4384 | - return sorted_events |
4385 | - |
4386 | - def _find_events(self, return_mode, time_range, event_templates, |
4387 | - storage_state, max_events, order, sender=None): |
4388 | - """ |
4389 | - Accepts 'event_templates' as either a real list of Events or as |
4390 | - a list of tuples (event_data, subject_data) as we do in the |
4391 | - DBus API. |
4392 | - |
4393 | - Return modes: |
4394 | - - 0: IDs. |
4395 | - - 1: Events. |
4396 | - """ |
4397 | - t = time.time() |
4398 | - |
4399 | - where = self._build_sql_event_filter(time_range, event_templates, |
4400 | - storage_state) |
4401 | - |
4402 | - if not where.may_have_results(): |
4403 | - return [] |
4404 | - |
4405 | - if return_mode == 0: |
4406 | - sql = "SELECT DISTINCT id FROM event_view" |
4407 | - elif return_mode == 1: |
4408 | - sql = "SELECT id FROM event_view" |
4409 | - else: |
4410 | - raise NotImplementedError, "Unsupported return_mode." |
4411 | - |
4412 | - wheresql = " WHERE %s" % where.sql if where else "" |
4413 | - |
4414 | - def group_and_sort(field, wheresql, time_asc=False, count_asc=None, |
4415 | - aggregation_type='max'): |
4416 | - |
4417 | - args = { |
4418 | - 'field': field, |
4419 | - 'aggregation_type': aggregation_type, |
4420 | - 'where_sql': wheresql, |
4421 | - 'time_sorting': 'ASC' if time_asc else 'DESC', |
4422 | - 'aggregation_sql': '', |
4423 | - 'order_sql': '', |
4424 | - } |
4425 | - |
4426 | - if count_asc is not None: |
4427 | - args['aggregation_sql'] = ', COUNT(%s) AS num_events' % \ |
4428 | - field |
4429 | - args['order_sql'] = 'num_events %s,' % \ |
4430 | - ('ASC' if count_asc else 'DESC') |
4431 | - |
4432 | - return """ |
4433 | - NATURAL JOIN ( |
4434 | - SELECT %(field)s, |
4435 | - %(aggregation_type)s(timestamp) AS timestamp |
4436 | - %(aggregation_sql)s |
4437 | - FROM event_view %(where_sql)s |
4438 | - GROUP BY %(field)s) |
4439 | - GROUP BY %(field)s |
4440 | - ORDER BY %(order_sql)s timestamp %(time_sorting)s |
4441 | - """ % args |
4442 | - |
4443 | - if order == ResultType.MostRecentEvents: |
4444 | - sql += wheresql + " ORDER BY timestamp DESC" |
4445 | - elif order == ResultType.LeastRecentEvents: |
4446 | - sql += wheresql + " ORDER BY timestamp ASC" |
4447 | - elif order == ResultType.MostRecentEventOrigin: |
4448 | - sql += group_and_sort("origin", wheresql, time_asc=False) |
4449 | - elif order == ResultType.LeastRecentEventOrigin: |
4450 | - sql += group_and_sort("origin", wheresql, time_asc=True) |
4451 | - elif order == ResultType.MostPopularEventOrigin: |
4452 | - sql += group_and_sort("origin", wheresql, time_asc=False, |
4453 | - count_asc=False) |
4454 | - elif order == ResultType.LeastPopularEventOrigin: |
4455 | - sql += group_and_sort("origin", wheresql, time_asc=True, |
4456 | - count_asc=True) |
4457 | - elif order == ResultType.MostRecentSubjects: |
4458 | - # Remember, event.subj_id identifies the subject URI |
4459 | - sql += group_and_sort("subj_id", wheresql, time_asc=False) |
4460 | - elif order == ResultType.LeastRecentSubjects: |
4461 | - sql += group_and_sort("subj_id", wheresql, time_asc=True) |
4462 | - elif order == ResultType.MostPopularSubjects: |
4463 | - sql += group_and_sort("subj_id", wheresql, time_asc=False, |
4464 | - count_asc=False) |
4465 | - elif order == ResultType.LeastPopularSubjects: |
4466 | - sql += group_and_sort("subj_id", wheresql, time_asc=True, |
4467 | - count_asc=True) |
4468 | - elif order == ResultType.MostRecentCurrentUri: |
4469 | - sql += group_and_sort("subj_id_current", wheresql, time_asc=False) |
4470 | - elif order == ResultType.LeastRecentCurrentUri: |
4471 | - sql += group_and_sort("subj_id_current", wheresql, time_asc=True) |
4472 | - elif order == ResultType.MostPopularCurrentUri: |
4473 | - sql += group_and_sort("subj_id_current", wheresql, time_asc=False, |
4474 | - count_asc=False) |
4475 | - elif order == ResultType.LeastPopularCurrentUri: |
4476 | - sql += group_and_sort("subj_id_current", wheresql, time_asc=True, |
4477 | - count_asc=True) |
4478 | - elif order == ResultType.MostRecentActor: |
4479 | - sql += group_and_sort("actor", wheresql, time_asc=False) |
4480 | - elif order == ResultType.LeastRecentActor: |
4481 | - sql += group_and_sort("actor", wheresql, time_asc=True) |
4482 | - elif order == ResultType.MostPopularActor: |
4483 | - sql += group_and_sort("actor", wheresql, time_asc=False, |
4484 | - count_asc=False) |
4485 | - elif order == ResultType.LeastPopularActor: |
4486 | - sql += group_and_sort("actor", wheresql, time_asc=True, |
4487 | - count_asc=True) |
4488 | - elif order == ResultType.OldestActor: |
4489 | - sql += group_and_sort("actor", wheresql, time_asc=True, |
4490 | - aggregation_type="min") |
4491 | - elif order == ResultType.MostRecentOrigin: |
4492 | - sql += group_and_sort("subj_origin", wheresql, time_asc=False) |
4493 | - elif order == ResultType.LeastRecentOrigin: |
4494 | - sql += group_and_sort("subj_origin", wheresql, time_asc=True) |
4495 | - elif order == ResultType.MostPopularOrigin: |
4496 | - sql += group_and_sort("subj_origin", wheresql, time_asc=False, |
4497 | - count_asc=False) |
4498 | - elif order == ResultType.LeastPopularOrigin: |
4499 | - sql += group_and_sort("subj_origin", wheresql, time_asc=True, |
4500 | - count_asc=True) |
4501 | - elif order == ResultType.MostRecentSubjectInterpretation: |
4502 | - sql += group_and_sort("subj_interpretation", wheresql, |
4503 | - time_asc=False) |
4504 | - elif order == ResultType.LeastRecentSubjectInterpretation: |
4505 | - sql += group_and_sort("subj_interpretation", wheresql, |
4506 | - time_asc=True) |
4507 | - elif order == ResultType.MostPopularSubjectInterpretation: |
4508 | - sql += group_and_sort("subj_interpretation", wheresql, |
4509 | - time_asc=False, count_asc=False) |
4510 | - elif order == ResultType.LeastPopularSubjectInterpretation: |
4511 | - sql += group_and_sort("subj_interpretation", wheresql, |
4512 | - time_asc=True, count_asc=True) |
4513 | - elif order == ResultType.MostRecentMimeType: |
4514 | - sql += group_and_sort("subj_mimetype", wheresql, time_asc=False) |
4515 | - elif order == ResultType.LeastRecentMimeType: |
4516 | - sql += group_and_sort("subj_mimetype", wheresql, time_asc=True) |
4517 | - elif order == ResultType.MostPopularMimeType: |
4518 | - sql += group_and_sort("subj_mimetype", wheresql, time_asc=False, |
4519 | - count_asc=False) |
4520 | - elif order == ResultType.LeastPopularMimeType: |
4521 | - sql += group_and_sort("subj_mimetype", wheresql, time_asc=True, |
4522 | - count_asc=True) |
4523 | - |
4524 | - if max_events > 0: |
4525 | - sql += " LIMIT %d" % max_events |
4526 | - result = array("i", self._cursor.execute(sql, where.arguments).fetch(0)) |
4527 | - |
4528 | - if return_mode == 0: |
4529 | - log.debug("Found %d event IDs in %fs" % (len(result), time.time()- t)) |
4530 | - elif return_mode == 1: |
4531 | - log.debug("Found %d events in %fs" % (len(result), time.time()- t)) |
4532 | - result = self.get_events(ids=result, sender=sender) |
4533 | - else: |
4534 | - raise Exception("%d" % return_mode) |
4535 | - |
4536 | - return result |
4537 | - |
4538 | - @staticmethod |
4539 | - def _build_templates(templates): |
4540 | - for event_template in templates: |
4541 | - event_data = event_template[0] |
4542 | - for subject in (event_template[1] or (Subject(),)): |
4543 | - yield Event((event_data, [], None)), Subject(subject) |
4544 | - |
4545 | - def _build_sql_from_event_templates(self, templates): |
4546 | - |
4547 | - where_or = WhereClause(WhereClause.OR) |
4548 | - |
4549 | - for template in templates: |
4550 | - event_template = Event((template[0], [], None)) |
4551 | - if template[1]: |
4552 | - subject_templates = [Subject(data) for data in template[1]] |
4553 | - else: |
4554 | - subject_templates = None |
4555 | - |
4556 | - subwhere = WhereClause(WhereClause.AND) |
4557 | - |
4558 | - if event_template.id: |
4559 | - subwhere.add("id = ?", event_template.id) |
4560 | - |
4561 | - try: |
4562 | - value, negation, wildcard = parse_operators(Event, Event.Interpretation, event_template.interpretation) |
4563 | - # Expand event interpretation children |
4564 | - event_interp_where = WhereClause(WhereClause.OR, negation) |
4565 | - for child_interp in (Symbol.find_child_uris_extended(value)): |
4566 | - if child_interp: |
4567 | - event_interp_where.add_text_condition("interpretation", |
4568 | - child_interp, like=wildcard, cache=self._interpretation) |
4569 | - if event_interp_where: |
4570 | - subwhere.extend(event_interp_where) |
4571 | - |
4572 | - value, negation, wildcard = parse_operators(Event, Event.Manifestation, event_template.manifestation) |
4573 | - # Expand event manifestation children |
4574 | - event_manif_where = WhereClause(WhereClause.OR, negation) |
4575 | - for child_manif in (Symbol.find_child_uris_extended(value)): |
4576 | - if child_manif: |
4577 | - event_manif_where.add_text_condition("manifestation", |
4578 | - child_manif, like=wildcard, cache=self._manifestation) |
4579 | - if event_manif_where: |
4580 | - subwhere.extend(event_manif_where) |
4581 | - |
4582 | - value, negation, wildcard = parse_operators(Event, Event.Actor, event_template.actor) |
4583 | - if value: |
4584 | - subwhere.add_text_condition("actor", value, wildcard, negation, cache=self._actor) |
4585 | - |
4586 | - value, negation, wildcard = parse_operators(Event, Event.Origin, event_template.origin) |
4587 | - if value: |
4588 | - subwhere.add_text_condition("origin", value, wildcard, negation) |
4589 | - |
4590 | - if subject_templates is not None: |
4591 | - for subject_template in subject_templates: |
4592 | - value, negation, wildcard = parse_operators(Subject, Subject.Interpretation, subject_template.interpretation) |
4593 | - # Expand subject interpretation children |
4594 | - su_interp_where = WhereClause(WhereClause.OR, negation) |
4595 | - for child_interp in (Symbol.find_child_uris_extended(value)): |
4596 | - if child_interp: |
4597 | - su_interp_where.add_text_condition("subj_interpretation", |
4598 | - child_interp, like=wildcard, cache=self._interpretation) |
4599 | - if su_interp_where: |
4600 | - subwhere.extend(su_interp_where) |
4601 | - |
4602 | - value, negation, wildcard = parse_operators(Subject, Subject.Manifestation, subject_template.manifestation) |
4603 | - # Expand subject manifestation children |
4604 | - su_manif_where = WhereClause(WhereClause.OR, negation) |
4605 | - for child_manif in (Symbol.find_child_uris_extended(value)): |
4606 | - if child_manif: |
4607 | - su_manif_where.add_text_condition("subj_manifestation", |
4608 | - child_manif, like=wildcard, cache=self._manifestation) |
4609 | - if su_manif_where: |
4610 | - subwhere.extend(su_manif_where) |
4611 | - |
4612 | - # FIXME: Expand mime children as well. |
4613 | - # Right now we only do exact matching for mimetypes |
4614 | - # thekorn: this will be fixed when wildcards are supported |
4615 | - value, negation, wildcard = parse_operators(Subject, Subject.Mimetype, subject_template.mimetype) |
4616 | - if value: |
4617 | - subwhere.add_text_condition("subj_mimetype", |
4618 | - value, wildcard, negation, cache=self._mimetype) |
4619 | - |
4620 | - for key in ("uri", "origin", "text"): |
4621 | - value = getattr(subject_template, key) |
4622 | - if value: |
4623 | - value, negation, wildcard = parse_operators(Subject, getattr(Subject, key.title()), value) |
4624 | - subwhere.add_text_condition("subj_%s" % key, value, wildcard, negation) |
4625 | - |
4626 | - if subject_template.current_uri: |
4627 | - value, negation, wildcard = parse_operators(Subject, |
4628 | - Subject.CurrentUri, subject_template.current_uri) |
4629 | - subwhere.add_text_condition("subj_current_uri", value, wildcard, negation) |
4630 | - |
4631 | - if subject_template.storage: |
4632 | - subwhere.add_text_condition("subj_storage", subject_template.storage) |
4633 | - |
4634 | - except KeyError, e: |
4635 | - # Value not in DB |
4636 | - log.debug("Unknown entity in query: %s" % e) |
4637 | - where_or.register_no_result() |
4638 | - continue |
4639 | - where_or.extend(subwhere) |
4640 | - return where_or |
4641 | - |
4642 | - def _build_sql_event_filter(self, time_range, templates, storage_state): |
4643 | - |
4644 | - where = WhereClause(WhereClause.AND) |
4645 | - |
4646 | - # thekorn: we are using the unary operator here to tell sql to not use |
4647 | - # the index on the timestamp column at the first place. This `fix` for |
4648 | - # (LP: #672965) is based on some benchmarks, which suggest a performance |
4649 | - # win, but we might not foresee all implications. |
4650 | - # (see http://www.sqlite.org/optoverview.html section 6.0) |
4651 | - min_time, max_time = time_range |
4652 | - if min_time != 0: |
4653 | - where.add("+timestamp >= ?", min_time) |
4654 | - if max_time != sys.maxint: |
4655 | - where.add("+timestamp <= ?", max_time) |
4656 | - |
4657 | - if storage_state in (StorageState.Available, StorageState.NotAvailable): |
4658 | - where.add("(subj_storage_state = ? OR subj_storage_state IS NULL)", |
4659 | - storage_state) |
4660 | - elif storage_state != StorageState.Any: |
4661 | - raise ValueError, "Unknown storage state '%d'" % storage_state |
4662 | - |
4663 | - where.extend(self._build_sql_from_event_templates(templates)) |
4664 | - |
4665 | - return where |
4666 | - |
4667 | -if __name__ == "__main__": |
4668 | - mainloop = gobject.MainLoop(is_running=True) |
4669 | - search_engine = SearchEngineExtension() |
4670 | - ZG_CLIENT._iface.connect_exit(lambda: mainloop.quit ()) |
4671 | - mainloop.run() |
4672 | - |
4673 | |
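For reference, the parent-directory weighting that the removed `_index_uri` applied to file URIs can be sketched on its own. This is a minimal Python 3 illustration, not code from either the removed module or the fts++ branch: `path_weights` is a hypothetical helper, it returns the (name, weight) pairs instead of calling a tokenizer, and unlike the removed loop it skips the empty name produced at the filesystem root.

```python
import posixpath

def path_weights(path, start_weight=5, decay=1.5):
    """Return (name, weight) pairs: the basename first, then each parent
    directory with a weight divided by `decay` per level (truncated to int)."""
    path, name = posixpath.split(path)
    weighted = [(name, start_weight)]
    weight = float(start_weight)
    while path and name:
        weight = weight / decay
        path, name = posixpath.split(path)
        if name:  # assumption: skip the empty component at the root
            weighted.append((name, int(weight)))
    return weighted

print(path_weights("/home/user/docs/report.txt"))
```

With the defaults, the basename gets weight 5 and each parent level is truncated from 3.33, 2.22, 1.48, so directories far from the file contribute little to ranking.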
4674 | === removed file 'extensions/fts-python/lrucache.py' |
4675 | --- extensions/fts-python/lrucache.py 2011-10-10 14:07:42 +0000 |
4676 | +++ extensions/fts-python/lrucache.py 1970-01-01 00:00:00 +0000 |
4677 | @@ -1,125 +0,0 @@ |
4678 | -# -.- coding: utf-8 -.- |
4679 | - |
4680 | -# lrucache.py |
4681 | -# |
4682 | -# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
4683 | -# Copyright © 2009 Markus Korn <thekorn@gmx.de> |
4684 | -# Copyright © 2011 Seif Lotfy <seif@lotfy.com> |
4685 | -# |
4686 | -# This program is free software: you can redistribute it and/or modify |
4687 | -# it under the terms of the GNU Lesser General Public License as published by |
4688 | -# the Free Software Foundation, either version 2.1 of the License, or |
4689 | -# (at your option) any later version. |
4690 | -# |
4691 | -# This program is distributed in the hope that it will be useful, |
4692 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
4693 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
4694 | -# GNU Lesser General Public License for more details. |
4695 | -# |
4696 | -# You should have received a copy of the GNU Lesser General Public License |
4697 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
4698 | - |
4699 | -class LRUCache: |
4700 | - """ |
4701 | - A simple LRUCache implementation backed by a linked list and a dict. |
4702 | - It can be accessed and updated just like a dict. To check if an element |
4703 | - exists in the cache the following type of statements can be used: |
4704 | - if "foo" in cache |
4705 | - """ |
4706 | - |
4707 | - class _Item: |
4708 | - """ |
4709 | - A container for each item in LRUCache which knows about the |
4710 | - item's position and relations |
4711 | - """ |
4712 | - def __init__(self, item_key, item_value): |
4713 | - self.value = item_value |
4714 | - self.key = item_key |
4715 | - self.next = None |
4716 | - self.prev = None |
4717 | - |
4718 | - def __init__(self, max_size): |
4719 | - """ |
4720 | - The size of the cache (in number of cached items) is guaranteed to |
4721 | - never exceed 'size' |
4722 | - """ |
4723 | - self._max_size = max_size |
4724 | - self.clear() |
4725 | - |
4726 | - |
4727 | - def clear(self): |
4728 | - self._list_end = None # The newest item |
4729 | - self._list_start = None # Oldest item |
4730 | - self._map = {} |
4731 | - |
4732 | - def __len__(self): |
4733 | - return len(self._map) |
4734 | - |
4735 | - def __contains__(self, key): |
4736 | - return key in self._map |
4737 | - |
4738 | - def __delitem__(self, key): |
4739 | - item = self._map[key] |
4740 | - if item.prev: |
4741 | - item.prev.next = item.next |
4742 | - else: |
4743 | - # we are deleting the first item, so we need a new first one |
4744 | - self._list_start = item.next |
4745 | - if item.next: |
4746 | - item.next.prev = item.prev |
4747 | - else: |
4748 | - # we are deleting the last item, get a new last one |
4749 | - self._list_end = item.prev |
4750 | - del self._map[key], item |
4751 | - |
4752 | - def __setitem__(self, key, value): |
4753 | - if key in self._map: |
4754 | - item = self._map[key] |
4755 | - item.value = value |
4756 | - self._move_item_to_end(item) |
4757 | - else: |
4758 | - new = LRUCache._Item(key, value) |
4759 | - self._append_to_list(new) |
4760 | - |
4761 | - if len(self._map) > self._max_size : |
4762 | - # Remove eldest entry from list |
4763 | - self.remove_eldest_item() |
4764 | - |
4765 | - def __getitem__(self, key): |
4766 | - item = self._map[key] |
4767 | - self._move_item_to_end(item) |
4768 | - return item.value |
4769 | - |
4770 | - def __iter__(self): |
4771 | - """ |
4772 | - Iteration is in order from eldest to newest, |
4773 | - and returns (key,value) tuples |
4774 | - """ |
4775 | - iter = self._list_start |
4776 | - while iter != None: |
4777 | - yield (iter.key, iter.value) |
4778 | - iter = iter.next |
4779 | - |
4780 | - def _move_item_to_end(self, item): |
4781 | - del self[item.key] |
4782 | - self._append_to_list(item) |
4783 | - |
4784 | - def _append_to_list(self, item): |
4785 | - self._map[item.key] = item |
4786 | - if not self._list_start: |
4787 | - self._list_start = item |
4788 | - if self._list_end: |
4789 | - self._list_end.next = item |
4790 | - item.prev = self._list_end |
4791 | - item.next = None |
4792 | - self._list_end = item |
4793 | - |
4794 | - def remove_eldest_item(self): |
4795 | - if self._list_start == self._list_end: |
4796 | - self._list_start = None |
4797 | - self._list_end = None |
4798 | - return |
4799 | - old = self._list_start |
4800 | - old.next.prev = None |
4801 | - self._list_start = old.next |
4802 | - del self[old.key], old |
4803 | |
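The removed `LRUCache` maintains its own doubly linked list next to a dict. As a rough comparison only, the same access pattern can be sketched in Python 3 with `collections.OrderedDict`; `SimpleLRU` is a hypothetical name, and this is not what the fts++ branch uses.

```python
from collections import OrderedDict

class SimpleLRU:
    """Dict-like cache that evicts the least recently used key
    once more than `max_size` items are stored."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._map = OrderedDict()

    def __len__(self):
        return len(self._map)

    def __contains__(self, key):
        # Membership tests do not refresh the entry, as in the removed class
        return key in self._map

    def __getitem__(self, key):
        value = self._map[key]
        self._map.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._map[key] = value
        self._map.move_to_end(key)
        if len(self._map) > self._max_size:
            self._map.popitem(last=False)  # drop the eldest entry

    def __iter__(self):
        # Eldest to newest, yielding (key, value) tuples
        return iter(self._map.items())

cache = SimpleLRU(2)
cache["a"] = 1
cache["b"] = 2
cache["a"]      # touch "a" so that "b" becomes the eldest
cache["c"] = 3  # evicts "b"
```

Iteration order and the `in` check mirror the removed class, so a caller such as `get_events` would see the same behaviour under these assumptions.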
4804 | === removed file 'extensions/fts-python/org.gnome.zeitgeist.fts.service.in' |
4805 | --- extensions/fts-python/org.gnome.zeitgeist.fts.service.in 2011-10-10 18:51:40 +0000 |
4806 | +++ extensions/fts-python/org.gnome.zeitgeist.fts.service.in 1970-01-01 00:00:00 +0000 |
4807 | @@ -1,3 +0,0 @@ |
4808 | -[D-BUS Service] |
4809 | -Name=org.gnome.zeitgeist.SimpleIndexer |
4810 | -Exec=@pkgdatadir@/fts-python/fts.py |
4811 | |
4812 | === removed file 'extensions/fts-python/sql.py' |
4813 | --- extensions/fts-python/sql.py 2012-01-20 14:01:36 +0000 |
4814 | +++ extensions/fts-python/sql.py 1970-01-01 00:00:00 +0000 |
4815 | @@ -1,301 +0,0 @@ |
4816 | -# -.- coding: utf-8 -.- |
4817 | - |
4818 | -# Zeitgeist |
4819 | -# |
4820 | -# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com> |
4821 | -# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com> |
4822 | -# Copyright © 2009-2011 Markus Korn <thekorn@gmx.net> |
4823 | -# Copyright © 2009 Seif Lotfy <seif@lotfy.com> |
4824 | -# Copyright © 2011 J.P. Lacerda <jpaflacerda@gmail.com> |
4825 | -# Copyright © 2011 Collabora Ltd. |
4826 | -# By Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com> |
4827 | -# |
4828 | -# This program is free software: you can redistribute it and/or modify |
4829 | -# it under the terms of the GNU Lesser General Public License as published by |
4830 | -# the Free Software Foundation, either version 2.1 of the License, or |
4831 | -# (at your option) any later version. |
4832 | -# |
4833 | -# This program is distributed in the hope that it will be useful, |
4834 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
4835 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
4836 | -# GNU Lesser General Public License for more details. |
4837 | -# |
4838 | -# You should have received a copy of the GNU Lesser General Public License |
4839 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
4840 | - |
4841 | -import sqlite3 |
4842 | -import logging |
4843 | -import time |
4844 | -import os |
4845 | -import shutil |
4846 | - |
4847 | -from constants import constants |
4848 | - |
4849 | -log = logging.getLogger("siis.zeitgeist.sql") |
4850 | - |
4851 | -TABLE_MAP = { |
4852 | - "origin": "uri", |
4853 | - "subj_mimetype": "mimetype", |
4854 | - "subj_origin": "uri", |
4855 | - "subj_uri": "uri", |
4856 | - "subj_current_uri": "uri", |
4857 | -} |
4858 | - |
4859 | -def explain_query(cursor, statement, arguments=()): |
4860 | - plan = "" |
4861 | - for r in cursor.execute("EXPLAIN QUERY PLAN "+statement, arguments).fetchall(): |
4862 | - plan += str(list(r)) + "\n" |
4863 | - log.debug("Got query:\nQUERY:\n%s (%s)\nPLAN:\n%s" % (statement, arguments, plan)) |
4864 | - |
4865 | -class UnicodeCursor(sqlite3.Cursor): |
4866 | - |
4867 | - debug_explain = os.getenv("ZEITGEIST_DEBUG_QUERY_PLANS") |
4868 | - |
4869 | - @staticmethod |
4870 | - def fix_unicode(obj): |
4871 | - if isinstance(obj, (int, long)): |
4872 | - # thekorn: as long as we are using the unary operator for timestamp |
4873 | - # related queries we have to make sure that integers are not |
4874 | - # converted to strings, same applies for long numbers. |
4875 | - return obj |
4876 | - if isinstance(obj, str): |
4877 | - obj = obj.decode("UTF-8") |
4878 | - # seif: Python’s default encoding is ASCII, so whenever a character with |
4879 | - # an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError |
4880 | - # because that character can’t be handled by the ASCII encoding. |
4881 | - try: |
4882 | - obj = unicode(obj) |
4883 | - except UnicodeDecodeError, ex: |
4884 | - pass |
4885 | - return obj |
4886 | - |
4887 | - def execute(self, statement, parameters=()): |
4888 | - parameters = [self.fix_unicode(p) for p in parameters] |
4889 | - if UnicodeCursor.debug_explain: |
4890 | - explain_query(super(UnicodeCursor, self), statement, parameters) |
4891 | - return super(UnicodeCursor, self).execute(statement, parameters) |
4892 | - |
4893 | - def fetch(self, index=None): |
4894 | - if index is not None: |
4895 | - for row in self: |
4896 | - yield row[index] |
4897 | - else: |
4898 | - for row in self: |
4899 | - yield row |
4900 | - |
4901 | -def _get_schema_version (cursor, schema_name): |
4902 | - """ |
4903 | - Returns the schema version for schema_name or returns 0 in case |
4904 | - the schema doesn't exist. |
4905 | - """ |
4906 | - try: |
4907 | - schema_version_result = cursor.execute(""" |
4908 | - SELECT version FROM schema_version WHERE schema=? |
4909 | - """, (schema_name,)) |
4910 | - result = schema_version_result.fetchone() |
4911 | - return result[0] if result else 0 |
4912 | - except sqlite3.OperationalError, e: |
4913 | - # The schema isn't there... |
4914 | - log.debug ("Schema '%s' not found: %s" % (schema_name, e)) |
4915 | - return 0 |
4916 | - |
4917 | -def _connect_to_db(file_path): |
4918 | - conn = sqlite3.connect(file_path) |
4919 | - conn.row_factory = sqlite3.Row |
4920 | - cursor = conn.cursor(UnicodeCursor) |
4921 | - return cursor |
4922 | - |
4923 | -_cursor = None |
4924 | -def get_default_cursor(): |
4925 | - global _cursor |
4926 | - if not _cursor: |
4927 | - dbfile = constants.DATABASE_FILE |
4928 | - start = time.time() |
4929 | - log.info("Using database: %s" % dbfile) |
4930 | - new_database = not os.path.exists(dbfile) |
4931 | - _cursor = _connect_to_db(dbfile) |
4932 | - core_schema_version = _get_schema_version(_cursor, constants.CORE_SCHEMA) |
4933 | - if core_schema_version < constants.CORE_SCHEMA_VERSION: |
4934 | - log.exception( |
4935 | - "Database '%s' is on version %s, but %s is required" % \ |
4936 | - (constants.CORE_SCHEMA, core_schema_version, |
4937 | - constants.CORE_SCHEMA_VERSION)) |
4938 | - raise SystemExit(27) |
4939 | - return _cursor |
4940 | -def unset_cursor(): |
4941 | - global _cursor |
4942 | - _cursor = None |
4943 | - |
4944 | -class TableLookup(dict): |
4945 | - |
4946 | - # We are not using an LRUCache as pressumably there won't be thousands |
4947 | - # of manifestations/interpretations/mimetypes/actors on most |
4948 | - # installations, so we can save us the overhead of tracking their usage. |
4949 | - |
4950 | - def __init__(self, cursor, table): |
4951 | - |
4952 | - self._cursor = cursor |
4953 | - self._table = table |
4954 | - |
4955 | - for row in cursor.execute("SELECT id, value FROM %s" % table): |
4956 | - self[row["value"]] = row["id"] |
4957 | - |
4958 | - self._inv_dict = dict((value, key) for key, value in self.iteritems()) |
4959 | - |
4960 | - def __getitem__(self, name): |
4961 | - # Use this for inserting new properties into the database |
4962 | - if name in self: |
4963 | - return super(TableLookup, self).__getitem__(name) |
4964 | - id = self._cursor.execute("SELECT id FROM %s WHERE value=?" |
4965 | - % self._table, (name,)).fetchone()[0] |
4966 | - # If we are here it's a newly inserted value, insert it into cache |
4967 | - self[name] = id |
4968 | - self._inv_dict[id] = name |
4969 | - return id |
4970 | - |
4971 | - def value(self, id): |
4972 | - # When we fetch an event, it either was already in the database |
4973 | - # at the time Zeitgeist started or it was inserted later -using |
4974 | - # Zeitgeist-, so here we always have the data in memory already. |
4975 | - return self._inv_dict[id] |
4976 | - |
4977 | - def id(self, name): |
4978 | - # Use this when fetching values which are supposed to be in the |
4979 | - # database already. Eg., in find_eventids. |
4980 | - return super(TableLookup, self).__getitem__(name) |
4981 | - |
4982 | - def remove_id(self, id): |
4983 | - value = self.value(id) |
4984 | - del self._inv_dict[id] |
4985 | - del self[value] |
4986 | - |
4987 | -def get_right_boundary(text): |
4988 | - """ returns the smallest string which is greater than `text` """ |
4989 | - if not text: |
4990 | - # if the search prefix is empty we query for the whole range |
4991 | - # of 'utf-8 'unicode chars |
4992 | - return unichr(0x10ffff) |
4993 | - if isinstance(text, str): |
4994 | - # we need to make sure the text is decoded as 'utf-8' unicode |
4995 | - text = unicode(text, "UTF-8") |
4996 | - charpoint = ord(text[-1]) |
4997 | - if charpoint == 0x10ffff: |
4998 | - # if the last character is the biggest possible char we need to |
4999 | - # look at the second last |
5000 | - return get_right_boundary(text[:-1]) |
Awesome! C++ FTS ftw.
- Add COPYING.GPL3, otherwise the tarball can't be re-distributed.
- Consider sharing a get_flags_for_log_level or even set_log_level
function between ZG and FTS?
- s/ver != DatabaseSchema.CORE_SCHEMA_VERSION)/ver < DatabaseSchema.CORE_SCHEMA_VERSION/
What's the rationale for this? We don't know that changes won't break compatibility.
- Can you explain the "// Don't disconnect monitors using service names"?
I didn't really review the C++ stuff (I'm assuming you and Mikkel reviewed each other's stuff already?).
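
To make the schema-version question concrete: the two checks only disagree when the on-disk database is newer than the version the extension was built against. A minimal sketch in Python, assuming a made-up version number and hypothetical helper names (neither is taken from the branch):

```python
# Hypothetical value, for illustration only; not the real schema version.
CORE_SCHEMA_VERSION = 4

def rejects_strict(ver, required=CORE_SCHEMA_VERSION):
    """`!=` check: refuse any database whose version differs from `required`."""
    return ver != required

def rejects_older_only(ver, required=CORE_SCHEMA_VERSION):
    """`<` check: refuse only databases older than `required`."""
    return ver < required

if __name__ == "__main__":
    # Compare both checks for an older, a matching and a newer database.
    for ver in (CORE_SCHEMA_VERSION - 1, CORE_SCHEMA_VERSION, CORE_SCHEMA_VERSION + 1):
        print(ver, rejects_strict(ver), rejects_older_only(ver))
```

With the `<` form a newer database is accepted, which is only safe if later schema versions stay readable by older code; that is the compatibility concern raised above.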