Merge lp:~zeitgeist/zeitgeist/fts++ into lp:~zeitgeist/zeitgeist/bluebird

Proposed by Michal Hruby
Status: Merged
Merged at revision: 390
Proposed branch: lp:~zeitgeist/zeitgeist/fts++
Merge into: lp:~zeitgeist/zeitgeist/bluebird
Diff against target: 5387 lines (+3135/-1900)
36 files modified
.bzrignore (+12/-1)
configure.ac (+3/-1)
extensions/Makefile.am (+1/-1)
extensions/fts++/Makefile.am (+113/-0)
extensions/fts++/controller.cpp (+136/-0)
extensions/fts++/controller.h (+72/-0)
extensions/fts++/ext-dummies.vala (+71/-0)
extensions/fts++/fts.cpp (+136/-0)
extensions/fts++/fts.h (+59/-0)
extensions/fts++/fts.vapi (+25/-0)
extensions/fts++/indexer.cpp (+897/-0)
extensions/fts++/indexer.h (+115/-0)
extensions/fts++/org.gnome.zeitgeist.fts.service.in (+3/-0)
extensions/fts++/stringutils.cpp (+128/-0)
extensions/fts++/stringutils.h (+42/-0)
extensions/fts++/task.cpp (+47/-0)
extensions/fts++/task.h (+100/-0)
extensions/fts++/test/Makefile.am (+27/-0)
extensions/fts++/test/test-fts.c (+37/-0)
extensions/fts++/test/test-indexer.cpp (+531/-0)
extensions/fts++/test/test-stringutils.cpp (+178/-0)
extensions/fts++/zeitgeist-fts.vala (+301/-0)
extensions/fts-python/Makefile.am (+0/-23)
extensions/fts-python/constants.py (+0/-71)
extensions/fts-python/datamodel.py (+0/-83)
extensions/fts-python/fts.py (+0/-1273)
extensions/fts-python/lrucache.py (+0/-125)
extensions/fts-python/org.gnome.zeitgeist.fts.service.in (+0/-3)
extensions/fts-python/sql.py (+0/-301)
extensions/fts.vala (+13/-1)
src/datamodel.vala (+0/-3)
src/engine.vala (+1/-0)
src/notify.vala (+65/-12)
src/sql.vala (+1/-1)
src/table-lookup.vala (+20/-0)
src/zeitgeist-daemon.vala (+1/-1)
To merge this branch: bzr merge lp:~zeitgeist/zeitgeist/fts++
Reviewer                    Review Type   Date Requested   Status
Siegfried Gevatter                                         Approve
Mikkel Kamstrup Erlandsen                                  Approve
Review via email: mp+92022@code.launchpad.net

Description of the change

More core changes so we can implement the new FTS daemon, plus the daemon itself.

lp:~zeitgeist/zeitgeist/fts++ updated
432. By Michal Hruby

Remove debug warning

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Awesome! C++ FTS ftw.

- Add COPYING.GPL3, otherwise the tarball can't be re-distributed.

- Consider sharing a get_flags_for_log_level or even set_log_level
  function between ZG and FTS?

- s/ver != DatabaseSchema.CORE_SCHEMA_VERSION)/ver < DatabaseSchema.CORE_SCHEMA_VERSION/
  What's the rationale for this? We don't know that changes won't break compatibility.

- Can you explain the "// Don't disconnect monitors using service names"?

I didn't really review the C++ stuff (I'm assuming you and Mikkel reviewed each other's stuff already?).

review: Needs Fixing
Revision history for this message
Michal Hruby (mhr3) wrote :

> Awesome! C++ FTS ftw.
>
> - Add COPYING.GPL3, otherwise the tarball can't be re-distributed.
>

On it...

> - Considering sharing a get_flags_for_log_level or even set_log_level
> function between ZG and FTS?
>

I don't think that's really necessary. Strictly speaking, it'd be a utility function for a specific app, and it has no place in a library.

> - s/ver != DatabaseSchema.CORE_SCHEMA_VERSION)/ver <
> DatabaseSchema.CORE_SCHEMA_VERSION/
> What's the rationale for this? We don't know changes won't break
> compatibility
>

Does that mean we should automatically assume that any possible changes do break stuff? This is only used with a read-only database, so I don't see any harm: either the reading will continue to work or you'll get some run-time errors. I find that better than just not working without even trying.
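
For illustration, the two checks can be compared in a minimal C++ sketch. The constant and function names are hypothetical stand-ins (the real check is in the Vala code and uses DatabaseSchema.CORE_SCHEMA_VERSION), and the value 5 is made up:

```cpp
// Hypothetical stand-in for DatabaseSchema.CORE_SCHEMA_VERSION; the value is
// an assumption chosen only for this example.
constexpr int kCoreSchemaVersion = 5;

// Strict check (the old behaviour): any version mismatch is rejected.
bool strict_check_rejects (int ver)
{
  return ver != kCoreSchemaVersion;
}

// Relaxed check (the new behaviour): only databases with an older schema
// are rejected; newer schemas are opened and may or may not work.
bool relaxed_check_rejects (int ver)
{
  return ver < kCoreSchemaVersion;
}
```

With the relaxed check, a reader built against schema 5 would still open a version 6 database read-only, and would only fail later if a query actually runs into an incompatible change.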

> - Can you explain the "// Don't disconnect monitors using service names"?
>

As said on IRC, it prevents some races by allowing the internal extensions to register a monitor with a service name. Without it, notifications could be missed while the external daemon is starting and hasn't yet had a chance to register a monitor.

> I didn't really review the C++ stuff (I'm assuming you and Mikkel reviewed each
> other's stuff already?).

Partially, but we have tests, so it has to work, right?! :)

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Functionally tested in Unity and working well. Unit tests passing. However -

There seems to be a fairly bad leak somewhere. Try repeatedly searching for 'u' or something like that and you'll see the memory consumption go up fairly fast.

review: Needs Fixing
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

1583 +void Indexer::Flush ()
1584 +{
1585 + db->flush ();
1586 +}

This needs to be Commit() and db->commit(). See http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#cbea2163142de795024880a7123bc693. You should probably also surround it with a try/catch.
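
A sketch of the suggested shape, under stated assumptions. FakeWritableDatabase is a hypothetical stand-in so the snippet compiles without Xapian; in indexer.cpp the call would be db->commit (), and the handler would presumably catch const Xapian::Error & and log get_msg (), as Indexer::Initialize already does:

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for Xapian::WritableDatabase; only commit() is
// modelled, and it can be told to fail so the error path is exercised.
struct FakeWritableDatabase
{
  bool fail = false;
  int commits = 0;

  void commit ()
  {
    if (fail)
      throw std::runtime_error ("simulated commit failure");
    ++commits;
  }
};

// Sketch of an Indexer::Commit () body: commit pending changes and log a
// failure instead of letting the exception escape into the main loop.
// Returns true when the commit succeeded.
bool commit_safely (FakeWritableDatabase &db)
{
  try
  {
    db.commit ();
    return true;
  }
  catch (const std::exception &e)
  {
    std::cerr << "Failed to commit changes: " << e.what () << std::endl;
    return false;
  }
}
```

Catching here matters because Commit () is called from the idle callback that drains the task queue, where an uncaught C++ exception would take down the whole daemon.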

review: Needs Fixing
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

1550 +void Indexer::DropIndex ()

Are we not leaking db and enquire in this method?

Revision history for this message
Michal Hruby (mhr3) wrote :

> Functionally tested in Unity and working well. Unit tests passing. However -
>
> There seems to be a fairly bad leak somewhere. Try repeatedly searching for
> 'u' or something like that and you'll see the memory consumption go up fairly
> fast.

Nope, sorry, I can't reproduce that. The first search does indeed increase the memory usage considerably, but that is just Xapian initializing its caches, afaict. If I search for the same thing over and over again, the memory usage stays constant here.

> This needs to be Commit() and db->commit(). You should probably also surround it with a try/catch.

Fixing...

> Are we not leaking db and enquire in this method?

db is closed and deleted, but yes enquire is leaked. Fixing.
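
One way to make this kind of leak harder to reintroduce is to let smart pointers own both objects. The following is only a sketch: Database and Enquire are hypothetical stand-ins for the Xapian classes (with instance counters so a leak would be visible), and it assumes DropIndex () throws away the old objects and creates fresh ones, which may not match the real method exactly:

```cpp
#include <memory>

// Hypothetical stand-ins for Xapian::WritableDatabase and Xapian::Enquire.
// The static counters track how many instances are currently alive.
struct Database
{
  static int live;
  Database () { ++live; }
  ~Database () { --live; }
};

struct Enquire
{
  static int live;
  Enquire () { ++live; }
  ~Enquire () { --live; }
};

int Database::live = 0;
int Enquire::live = 0;

class IndexerSketch
{
public:
  IndexerSketch () : db (new Database), enquire (new Enquire) {}

  // Replace both objects; the old instances are destroyed by reset ().
  void DropIndex ()
  {
    enquire.reset ();               // release the object that refers to db first
    db.reset (new Database);        // old database is destroyed here
    enquire.reset (new Enquire);
  }

private:
  // Declared in this order so enquire is destroyed before db.
  std::unique_ptr<Database> db;
  std::unique_ptr<Enquire> enquire;
};
```

With raw pointers the equivalent fix is simply a delete enquire next to the existing delete db before the members are reassigned.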

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

> > Functionally tested in Unity and working well. Unit tests passing. However -
> >
> > There seems to be a fairly bad leak somewhere. Try repeatedly searching for
> > 'u' or something like that and you'll see the memory consumption go up
> fairly
> > fast.
>
> Nope, sorry can't reproduce that, the first search does indeed increase the
> mem usage considerably, but that is just xapian initializing its caches
> afaict. If i search for the same thing over and over again the mem usage stays
> constant here.

Odd, now I can't reproduce it here either... I swear I had it sitting at around 16 MB writable, and while searching I could see it crawl up 1 MB at a time all the way past 30 MB... But now it sits steady at around 14 MB writable (which is still surprisingly high, but at least stable).

lp:~zeitgeist/zeitgeist/fts++ updated
433. By Michal Hruby

We wanted to use GPL2+. Make it so

434. By Michal Hruby

Fix issues found during review

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Looking good to me. I'd like someone else to +1 it before we merge though...

Outstanding work Michal!

review: Approve
Revision history for this message
Seif Lotfy (seif) wrote :

I have been using it for 2 days now...
I noticed a small increase in memory consumption, around 2-4 MB.
However, this is nothing that really bothers me.
AWESOME WORK

On Thu, Feb 9, 2012 at 11:47 AM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Review: Approve
>
> Looking good to me. I'd like someone else to +1 it before we merge
> though...
>
> Outstanding work Michal!
> --
> https://code.launchpad.net/~zeitgeist/zeitgeist/fts++/+merge/92022
> You are subscribed to branch lp:zeitgeist.
>

lp:~zeitgeist/zeitgeist/fts++ updated
435. By Michal Hruby

Update TableLookup if necessary

Revision history for this message
Siegfried Gevatter (rainct) wrote :

OK, merging it, but there's some outstanding stuff:

 * Important: the TableLookup in FTS can currently explode. The new schema version needs to add AUTOINCREMENT to the `id' column of all tables in TableLookup. We should do this before releasing a new tarball.

 * TableLookup.get_value: please prepare the query in the constructor.

 * configure isn't checking whether Xapian is present

 * Add a flag to disable FTS? (keeping the Xapian dependency avoidable)

review: Approve
Revision history for this message
Seif Lotfy (seif) wrote :

I like the last one :)

On Fri, Feb 10, 2012 at 12:30 PM, Siegfried Gevatter <email address hidden> wrote:

> Review: Approve
>
> OK, merging it, but there's some outstanding stuff:
>
> * Important: the TableLookup in FTS can currently explode. The new schema
> version needs to add AUTOINCREMENT to the `id' column of all tables in
> TableLookup. We should do this before releasing a new tarball.
>
> * TableLookup.get_value: please prepare the query in the constructor.
>
> * Configure isn't checking for xapian being there
>
> * Add a flag to disable FTS? (keeping the Xapian dependency avoidable)
> --
> https://code.launchpad.net/~zeitgeist/zeitgeist/fts++/+merge/92022
> You are subscribed to branch lp:zeitgeist.
>

Preview Diff

1=== modified file '.bzrignore'
2--- .bzrignore 2011-12-31 13:30:23 +0000
3+++ .bzrignore 2012-02-09 22:47:22 +0000
4@@ -44,12 +44,23 @@
5 extensions/*.stamp
6 extensions/*.la
7 extensions/*.lo
8+extensions/fts++/.deps
9+extensions/fts++/.libs
10+extensions/fts++/*.c
11+extensions/fts++/*.stamp
12+extensions/fts++/*.la
13+extensions/fts++/*.lo
14+extensions/fts++/zeitgeist-internal.*
15+extensions/fts++/test/.deps
16+extensions/fts++/test/.libs
17+extensions/fts++/test/test-fts
18+extensions/fts++/org.gnome.zeitgeist.fts.service
19+extensions/fts++/zeitgeist-fts
20 test/direct/marshalling
21 test/dbus/__pycache__
22 test/direct/table-lookup-test
23 src/zeitgeist-engine.vapi
24 src/zeitgeist-engine.h
25-extensions/fts-python/org.gnome.zeitgeist.fts.service
26 py-compile
27 python/_ontology.py
28 test/direct/*.c
29
30=== modified file 'configure.ac'
31--- configure.ac 2012-01-27 15:39:16 +0000
32+++ configure.ac 2012-02-09 22:47:22 +0000
33@@ -8,6 +8,7 @@
34
35 AC_PROG_CC
36 AM_PROG_CC_C_O
37+AC_PROG_CXX
38 AC_DISABLE_STATIC
39 AC_PROG_LIBTOOL
40
41@@ -59,7 +60,8 @@
42 Makefile
43 src/Makefile
44 extensions/Makefile
45- extensions/fts-python/Makefile
46+ extensions/fts++/Makefile
47+ extensions/fts++/test/Makefile
48 data/Makefile
49 data/ontology/Makefile
50 python/Makefile
51
52=== modified file 'extensions/Makefile.am'
53--- extensions/Makefile.am 2011-12-25 16:24:04 +0000
54+++ extensions/Makefile.am 2012-02-09 22:47:22 +0000
55@@ -1,4 +1,4 @@
56-SUBDIRS = fts-python
57+SUBDIRS = fts++
58
59 NULL =
60
61
62=== added directory 'extensions/fts++'
63=== added file 'extensions/fts++/Makefile.am'
64--- extensions/fts++/Makefile.am 1970-01-01 00:00:00 +0000
65+++ extensions/fts++/Makefile.am 2012-02-09 22:47:22 +0000
66@@ -0,0 +1,113 @@
67+SUBDIRS = test
68+NULL =
69+
70+noinst_LTLIBRARIES = libzeitgeist-internal.la
71+libexec_PROGRAMS = zeitgeist-fts
72+
73+servicedir = $(DBUS_SERVICES_DIR)
74+service_DATA = org.gnome.zeitgeist.fts.service
75+
76+org.gnome.zeitgeist.fts.service: org.gnome.zeitgeist.fts.service.in
77+ $(AM_V_GEN)sed -e s!\@libexecdir\@!$(libexecdir)! < $< > $@
78+org.gnome.zeitgeist.fts.service: Makefile
79+
80+AM_CPPFLAGS = \
81+ $(ZEITGEIST_CFLAGS) \
82+ -include $(CONFIG_HEADER) \
83+ -w \
84+ $(NULL)
85+
86+AM_VALAFLAGS = \
87+ --target-glib=2.26 \
88+ --pkg gio-2.0 \
89+ --pkg sqlite3 \
90+ --pkg posix \
91+ --pkg gmodule-2.0 \
92+ $(top_srcdir)/config.vapi \
93+ $(NULL)
94+
95+libzeitgeist_internal_la_VALASOURCES = \
96+ datamodel.vala \
97+ db-reader.vala \
98+ engine.vala \
99+ sql.vala \
100+ remote.vala \
101+ utils.vala \
102+ errors.vala \
103+ table-lookup.vala \
104+ sql-schema.vala \
105+ where-clause.vala \
106+ ontology.vala \
107+ ontology-uris.vala \
108+ mimetype.vala \
109+ ext-dummies.vala \
110+ $(NULL)
111+
112+libzeitgeist_internal_la_SOURCES = \
113+ zeitgeist-internal.stamp \
114+ $(libzeitgeist_internal_la_VALASOURCES:.vala=.c) \
115+ $(NULL)
116+
117+libzeitgeist_internal_la_LIBADD = \
118+ $(ZEITGEIST_LIBS) \
119+ $(NULL)
120+
121+zeitgeist_fts_VALASOURCES = \
122+ zeitgeist-fts.vala \
123+ $(NULL)
124+
125+zeitgeist_fts_SOURCES = \
126+ zeitgeist-fts_vala.stamp \
127+ $(zeitgeist_fts_VALASOURCES:.vala=.c) \
128+ controller.cpp \
129+ controller.h \
130+ fts.cpp \
131+ fts.h \
132+ indexer.cpp \
133+ indexer.h \
134+ task.cpp \
135+ task.h \
136+ stringutils.cpp \
137+ stringutils.h \
138+ $(NULL)
139+
140+zeitgeist_fts_LDADD = \
141+ $(builddir)/libzeitgeist-internal.la \
142+ -lxapian \
143+ $(NULL)
144+
145+BUILT_SOURCES = \
146+ zeitgeist-internal.stamp \
147+ zeitgeist-fts_vala.stamp \
148+ $(NULL)
149+
150+zeitgeist-internal.stamp: $(libzeitgeist_internal_la_VALASOURCES)
151+ $(VALA_V)$(VALAC) $(AM_VALAFLAGS) $(VALAFLAGS) -C -H zeitgeist-internal.h --library zeitgeist-internal $^
152+ @touch "$@"
153+
154+zeitgeist-fts_vala.stamp: $(zeitgeist_fts_VALASOURCES)
155+ $(VALA_V)$(VALAC) $(AM_VALAFLAGS) $(VALAFLAGS) \
156+ $(srcdir)/zeitgeist-internal.vapi $(srcdir)/fts.vapi -C $^
157+ @touch "$@"
158+
159+EXTRA_DIST = \
160+ $(libzeitgeist_internal_la_VALASOURCES) \
161+ $(zeitgeist_fts_VALASOURCES) \
162+ zeitgeist-fts_vala.stamp \
163+ zeitgeist-internal.h \
164+ zeitgeist-internal.vapi \
165+ org.gnome.zeitgeist.fts.service.in \
166+ $(NULL)
167+
168+CLEANFILES = org.gnome.zeitgeist.fts.service
169+
170+DISTCLEANFILES = \
171+ $(NULL)
172+
173+distclean-local:
174+ rm -f *.c *.o *.stamp *.~[0-9]~
175+
176+VALA_V = $(VALA_V_$(V))
177+VALA_V_ = $(VALA_V_$(AM_DEFAULT_VERBOSITY))
178+VALA_V_0 = @echo " VALAC " $^;
179+
180
181=== added file 'extensions/fts++/controller.cpp'
182--- extensions/fts++/controller.cpp 1970-01-01 00:00:00 +0000
183+++ extensions/fts++/controller.cpp 2012-02-09 22:47:22 +0000
184@@ -0,0 +1,136 @@
185+/*
186+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
187+ *
188+ * This program is free software; you can redistribute it and/or
189+ * modify it under the terms of the GNU General Public License
190+ * as published by the Free Software Foundation; either version 2
191+ * of the License, or (at your option) any later version.
192+ *
193+ * This program is distributed in the hope that it will be useful,
194+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
195+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
196+ * GNU General Public License for more details.
197+ *
198+ * You should have received a copy of the GNU General Public License
199+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
200+ *
201+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
202+ *
203+ */
204+
205+#include "controller.h"
206+
207+namespace ZeitgeistFTS {
208+
209+void Controller::Initialize (GError **error)
210+{
211+ indexer->Initialize (error);
212+}
213+
214+void Controller::Run ()
215+{
216+ if (!indexer->CheckIndex ())
217+ {
218+ indexer->DropIndex ();
219+ RebuildIndex ();
220+ }
221+}
222+
223+void Controller::RebuildIndex ()
224+{
225+ GError *error = NULL;
226+ GPtrArray *events;
227+ GPtrArray *templates = g_ptr_array_new ();
228+ ZeitgeistTimeRange *time_range = zeitgeist_time_range_new_anytime ();
229+
230+ g_debug ("asking reader for all events");
231+ events = zeitgeist_db_reader_find_events (zg_reader,
232+ time_range,
233+ templates,
234+ ZEITGEIST_STORAGE_STATE_ANY,
235+ 0,
236+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
237+ NULL,
238+ &error);
239+
240+ if (error)
241+ {
242+ g_warning ("%s", error->message);
243+ g_error_free (error);
244+ }
245+ else
246+ {
247+ g_debug ("reader returned %u events", events->len);
248+
249+ IndexEvents (events);
250+ g_ptr_array_unref (events);
251+
252+ // Set the db metadata key only once we're done
253+ PushTask (new MetadataTask ("fts_index_version", INDEX_VERSION));
254+ }
255+
256+ g_object_unref (time_range);
257+ g_ptr_array_unref (templates);
258+}
259+
260+void Controller::IndexEvents (GPtrArray *events)
261+{
262+ const int CHUNK_SIZE = 32;
263+ // Break down index tasks into suitable chunks
264+ for (unsigned i = 0; i < events->len; i += CHUNK_SIZE)
265+ {
266+ PushTask (new IndexEventsTask (g_ptr_array_ref (events), i, CHUNK_SIZE));
267+ }
268+}
269+
270+void Controller::DeleteEvents (guint *event_ids, int event_ids_size)
271+{
272+ // FIXME: Should we break the task here as well?
273+ PushTask (new DeleteEventsTask (event_ids, event_ids_size));
274+}
275+
276+void Controller::PushTask (Task* task)
277+{
278+ queued_tasks.push (task);
279+
280+ if (processing_source_id == 0)
281+ {
282+ processing_source_id =
283+ g_idle_add ((GSourceFunc) &Controller::ProcessTask, this);
284+ }
285+}
286+
287+gboolean Controller::ProcessTask ()
288+{
289+ if (!queued_tasks.empty ())
290+ {
291+ Task *task;
292+
293+ task = queued_tasks.front ();
294+ queued_tasks.pop ();
295+
296+ task->Process (indexer);
297+ delete task;
298+ }
299+
300+ bool all_done = queued_tasks.empty ();
301+ if (all_done)
302+ {
303+ indexer->Commit ();
304+ if (processing_source_id != 0)
305+ {
306+ g_source_remove (processing_source_id);
307+ processing_source_id = 0;
308+ }
309+ return FALSE;
310+ }
311+
312+ return TRUE;
313+}
314+
315+bool Controller::HasPendingTasks ()
316+{
317+ return !queued_tasks.empty ();
318+}
319+
320+}
321
322=== added file 'extensions/fts++/controller.h'
323--- extensions/fts++/controller.h 1970-01-01 00:00:00 +0000
324+++ extensions/fts++/controller.h 2012-02-09 22:47:22 +0000
325@@ -0,0 +1,72 @@
326+/*
327+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
328+ *
329+ * This program is free software; you can redistribute it and/or
330+ * modify it under the terms of the GNU General Public License
331+ * as published by the Free Software Foundation; either version 2
332+ * of the License, or (at your option) any later version.
333+ *
334+ * This program is distributed in the hope that it will be useful,
335+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
336+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
337+ * GNU General Public License for more details.
338+ *
339+ * You should have received a copy of the GNU General Public License
340+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
341+ *
342+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
343+ *
344+ */
345+
346+#ifndef _ZGFTS_CONTROLLER_H_
347+#define _ZGFTS_CONTROLLER_H_
348+
349+#include <glib-object.h>
350+#include <queue>
351+#include <vector>
352+
353+#include "indexer.h"
354+#include "task.h"
355+#include "zeitgeist-internal.h"
356+
357+namespace ZeitgeistFTS {
358+
359+class Controller {
360+public:
361+ Controller (ZeitgeistDbReader *reader)
362+ : zg_reader (reader)
363+ , processing_source_id (0)
364+ , indexer (new Indexer (reader)) {};
365+
366+ ~Controller ()
367+ {
368+ if (processing_source_id != 0)
369+ {
370+ g_source_remove (processing_source_id);
371+ }
372+ }
373+
374+ void Initialize (GError **error);
375+ void Run ();
376+ void RebuildIndex ();
377+
378+ void IndexEvents (GPtrArray *events);
379+ void DeleteEvents (guint *event_ids, int event_ids_size);
380+
381+ void PushTask (Task* task);
382+ bool HasPendingTasks ();
383+ gboolean ProcessTask ();
384+
385+ Indexer *indexer;
386+
387+private:
388+ ZeitgeistDbReader *zg_reader;
389+
390+ typedef std::queue<Task*> TaskQueue;
391+ TaskQueue queued_tasks;
392+ guint processing_source_id;
393+};
394+
395+}
396+
397+#endif /* _ZGFTS_CONTROLLER_H_ */
398
399=== added symlink 'extensions/fts++/datamodel.vala'
400=== target is u'../../src/datamodel.vala'
401=== added symlink 'extensions/fts++/db-reader.vala'
402=== target is u'../../src/db-reader.vala'
403=== added symlink 'extensions/fts++/engine.vala'
404=== target is u'../../src/engine.vala'
405=== added symlink 'extensions/fts++/errors.vala'
406=== target is u'../../src/errors.vala'
407=== added file 'extensions/fts++/ext-dummies.vala'
408--- extensions/fts++/ext-dummies.vala 1970-01-01 00:00:00 +0000
409+++ extensions/fts++/ext-dummies.vala 2012-02-09 22:47:22 +0000
410@@ -0,0 +1,71 @@
411+/* ext-dummies.vala
412+ *
413+ * Copyright © 2011-2012 Michal Hruby <michal.mhr@gmail.com>
414+ *
415+ * This program is free software: you can redistribute it and/or modify
416+ * it under the terms of the GNU Lesser General Public License as published by
417+ * the Free Software Foundation, either version 2.1 of the License, or
418+ * (at your option) any later version.
419+ *
420+ * This program is distributed in the hope that it will be useful,
421+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
422+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
423+ * GNU General Public License for more details.
424+ *
425+ * You should have received a copy of the GNU Lesser General Public License
426+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
427+ *
428+ */
429+
430+namespace Zeitgeist
431+{
432+ public class ExtensionCollection : Object
433+ {
434+ public unowned Engine engine { get; construct; }
435+
436+ public ExtensionCollection (Engine engine)
437+ {
438+ Object (engine: engine);
439+ }
440+
441+ public string[] get_extension_names ()
442+ {
443+ string[] result = {};
444+ return result;
445+ }
446+
447+ public void call_pre_insert_events (GenericArray<Event?> events,
448+ BusName? sender)
449+ {
450+ }
451+
452+ public void call_post_insert_events (GenericArray<Event?> events,
453+ BusName? sender)
454+ {
455+ }
456+
457+ public unowned uint32[] call_pre_delete_events (uint32[] event_ids,
458+ BusName? sender)
459+ {
460+ return event_ids;
461+ }
462+
463+ public void call_post_delete_events (uint32[] event_ids,
464+ BusName? sender)
465+ {
466+ }
467+ }
468+
469+ public class ExtensionStore : Object
470+ {
471+ public unowned Engine engine { get; construct; }
472+
473+ public ExtensionStore (Engine engine)
474+ {
475+ Object (engine: engine);
476+ }
477+ }
478+
479+}
480+
481+// vim:expandtab:ts=4:sw=4
482
483=== added file 'extensions/fts++/fts.cpp'
484--- extensions/fts++/fts.cpp 1970-01-01 00:00:00 +0000
485+++ extensions/fts++/fts.cpp 2012-02-09 22:47:22 +0000
486@@ -0,0 +1,136 @@
487+/*
488+ * Copyright (C) 2012 Canonical Ltd
489+ *
490+ * This program is free software; you can redistribute it and/or
491+ * modify it under the terms of the GNU General Public License
492+ * as published by the Free Software Foundation; either version 2
493+ * of the License, or (at your option) any later version.
494+ *
495+ * This program is distributed in the hope that it will be useful,
496+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
497+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
498+ * GNU General Public License for more details.
499+ *
500+ * You should have received a copy of the GNU General Public License
501+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
502+ *
503+ * Authored by Michal Hruby <michal.hruby@canonical.com>
504+ *
505+ */
506+
507+#include "fts.h"
508+#include "indexer.h"
509+#include "controller.h"
510+
511+ZeitgeistIndexer*
512+zeitgeist_indexer_new (ZeitgeistDbReader *reader, GError **error)
513+{
514+ ZeitgeistFTS::Controller *ctrl;
515+ GError *local_error;
516+
517+ g_return_val_if_fail (ZEITGEIST_IS_DB_READER (reader), NULL);
518+ g_return_val_if_fail (error == NULL || *error == NULL, NULL);
519+
520+ g_setenv ("XAPIAN_CJK_NGRAM", "1", TRUE);
521+ ctrl = new ZeitgeistFTS::Controller (reader);
522+
523+ local_error = NULL;
524+ ctrl->Initialize (&local_error);
525+ if (local_error)
526+ {
527+ delete ctrl;
528+ g_propagate_error (error, local_error);
529+ return NULL;
530+ }
531+
532+
533+ ctrl->Run ();
534+
535+ return (ZeitgeistIndexer*) ctrl;
536+}
537+
538+void
539+zeitgeist_indexer_free (ZeitgeistIndexer* indexer)
540+{
541+ g_return_if_fail (indexer != NULL);
542+
543+ delete (ZeitgeistFTS::Controller*) indexer;
544+}
545+
546+GPtrArray* zeitgeist_indexer_search (ZeitgeistIndexer *indexer,
547+ const gchar *search_string,
548+ ZeitgeistTimeRange *time_range,
549+ GPtrArray *templates,
550+ guint offset,
551+ guint count,
552+ ZeitgeistResultType result_type,
553+ guint *matches,
554+ GError **error)
555+{
556+ GPtrArray *results;
557+ ZeitgeistFTS::Controller *_indexer;
558+
559+ g_return_val_if_fail (indexer != NULL, NULL);
560+ g_return_val_if_fail (search_string != NULL, NULL);
561+ g_return_val_if_fail (ZEITGEIST_IS_TIME_RANGE (time_range), NULL);
562+ g_return_val_if_fail (error == NULL || *error == NULL, NULL);
563+
564+ _indexer = (ZeitgeistFTS::Controller*) indexer;
565+
566+ results = _indexer->indexer->Search (search_string, time_range,
567+ templates, offset, count, result_type,
568+ matches, error);
569+
570+ return results;
571+}
572+
573+void zeitgeist_indexer_index_events (ZeitgeistIndexer *indexer,
574+ GPtrArray *events)
575+{
576+ ZeitgeistFTS::Controller *_indexer;
577+
578+ g_return_if_fail (indexer != NULL);
579+ g_return_if_fail (events != NULL);
580+
581+ _indexer = (ZeitgeistFTS::Controller*) indexer;
582+
583+ _indexer->IndexEvents (events);
584+}
585+
586+void zeitgeist_indexer_delete_events (ZeitgeistIndexer *indexer,
587+ guint *event_ids,
588+ int event_ids_size)
589+{
590+ ZeitgeistFTS::Controller *_indexer;
591+
592+ g_return_if_fail (indexer != NULL);
593+
594+ if (event_ids_size <= 0) return;
595+
596+ _indexer = (ZeitgeistFTS::Controller*) indexer;
597+
598+ _indexer->DeleteEvents (event_ids, event_ids_size);
599+}
600+
601+gboolean zeitgeist_indexer_has_pending_tasks (ZeitgeistIndexer *indexer)
602+{
603+ ZeitgeistFTS::Controller *_indexer;
604+
605+ g_return_val_if_fail (indexer != NULL, FALSE);
606+
607+ _indexer = (ZeitgeistFTS::Controller*) indexer;
608+
609+ return _indexer->HasPendingTasks () ? TRUE : FALSE;
610+}
611+
612+void zeitgeist_indexer_process_task (ZeitgeistIndexer *indexer)
613+{
614+ ZeitgeistFTS::Controller *_indexer;
615+
616+ g_return_if_fail (indexer != NULL);
617+
618+ _indexer = (ZeitgeistFTS::Controller*) indexer;
619+
620+ _indexer->ProcessTask ();
621+}
622+
623
624=== added file 'extensions/fts++/fts.h'
625--- extensions/fts++/fts.h 1970-01-01 00:00:00 +0000
626+++ extensions/fts++/fts.h 2012-02-09 22:47:22 +0000
627@@ -0,0 +1,59 @@
628+/*
629+ * Copyright (C) 2012 Canonical Ltd
630+ *
631+ * This program is free software; you can redistribute it and/or
632+ * modify it under the terms of the GNU General Public License
633+ * as published by the Free Software Foundation; either version 2
634+ * of the License, or (at your option) any later version.
635+ *
636+ * This program is distributed in the hope that it will be useful,
637+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
638+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
639+ * GNU General Public License for more details.
640+ *
641+ * You should have received a copy of the GNU General Public License
642+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
643+ *
644+ * Authored by Michal Hruby <michal.hruby@canonical.com>
645+ *
646+ */
647+
648+#ifndef _ZGFTS_H_
649+#define _ZGFTS_H_
650+
651+#include <glib.h>
652+#include "zeitgeist-internal.h"
653+
654+typedef struct _ZeitgeistIndexer ZeitgeistIndexer;
655+
656+G_BEGIN_DECLS
657+
658+ZeitgeistIndexer* zeitgeist_indexer_new (ZeitgeistDbReader* reader,
659+ GError **error);
660+
661+void zeitgeist_indexer_free (ZeitgeistIndexer* indexer);
662+
663+GPtrArray* zeitgeist_indexer_search (ZeitgeistIndexer *indexer,
664+ const gchar *search_string,
665+ ZeitgeistTimeRange *time_range,
666+ GPtrArray *templates,
667+ guint offset,
668+ guint count,
669+ ZeitgeistResultType result_type,
670+ guint *matches,
671+ GError **error);
672+
673+void zeitgeist_indexer_index_events (ZeitgeistIndexer *indexer,
674+ GPtrArray *events);
675+
676+void zeitgeist_indexer_delete_events (ZeitgeistIndexer *indexer,
677+ guint *event_ids,
678+ int event_ids_size);
679+
680+gboolean zeitgeist_indexer_has_pending_tasks (ZeitgeistIndexer *indexer);
681+
682+void zeitgeist_indexer_process_task (ZeitgeistIndexer *indexer);
683+
684+G_END_DECLS
685+
686+#endif /* _ZGFTS_H_ */
687
688=== added file 'extensions/fts++/fts.vapi'
689--- extensions/fts++/fts.vapi 1970-01-01 00:00:00 +0000
690+++ extensions/fts++/fts.vapi 2012-02-09 22:47:22 +0000
691@@ -0,0 +1,25 @@
692+/* indexer.vapi is hand-written - not a big deal for these ~10 lines */
693+
694+namespace Zeitgeist {
695+ [Compact]
696+ [CCode (free_function = "zeitgeist_indexer_free", cheader_filename = "fts.h")]
697+ public class Indexer {
698+ public Indexer (DbReader reader) throws EngineError;
699+
700+ public GLib.GenericArray<Event> search (string search_string,
701+ TimeRange time_range,
702+ GLib.GenericArray<Event> templates,
703+ uint offset,
704+ uint count,
705+ ResultType result_type,
706+ out uint matches) throws GLib.Error;
707+
708+ public void index_events (GLib.GenericArray<Event> events);
709+
710+ public void delete_events (uint[] event_ids);
711+
712+ public bool has_pending_tasks ();
713+
714+ public void process_task ();
715+ }
716+}
717
718=== added file 'extensions/fts++/indexer.cpp'
719--- extensions/fts++/indexer.cpp 1970-01-01 00:00:00 +0000
720+++ extensions/fts++/indexer.cpp 2012-02-09 22:47:22 +0000
721@@ -0,0 +1,897 @@
722+/*
723+ * Copyright (C) 2012 Canonical Ltd
724+ * 2012 Mikkel Kamstrup Erlandsen
725+ *
726+ * This program is free software; you can redistribute it and/or
727+ * modify it under the terms of the GNU General Public License
728+ * as published by the Free Software Foundation; either version 2
729+ * of the License, or (at your option) any later version.
730+ *
731+ * This program is distributed in the hope that it will be useful,
732+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
733+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
734+ * GNU General Public License for more details.
735+ *
736+ * You should have received a copy of the GNU General Public License
737+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
738+ *
739+ * Authored by Michal Hruby <michal.hruby@canonical.com>
740+ * Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
741+ *
742+ */
743+
744+#include "indexer.h"
745+#include "stringutils.h"
746+#include <xapian.h>
747+#include <queue>
748+#include <vector>
749+
750+#include <gio/gio.h>
751+#include <gio/gdesktopappinfo.h>
752+
753+namespace ZeitgeistFTS {
754+
755+const std::string FILTER_PREFIX_EVENT_INTERPRETATION = "ZGEI";
756+const std::string FILTER_PREFIX_EVENT_MANIFESTATION = "ZGEM";
757+const std::string FILTER_PREFIX_ACTOR = "ZGA";
758+const std::string FILTER_PREFIX_SUBJECT_URI = "ZGSU";
759+const std::string FILTER_PREFIX_SUBJECT_INTERPRETATION = "ZGSI";
760+const std::string FILTER_PREFIX_SUBJECT_MANIFESTATION = "ZGSM";
761+const std::string FILTER_PREFIX_SUBJECT_ORIGIN = "ZGSO";
762+const std::string FILTER_PREFIX_SUBJECT_MIMETYPE = "ZGST";
763+const std::string FILTER_PREFIX_SUBJECT_STORAGE = "ZGSS";
764+const std::string FILTER_PREFIX_XDG_CATEGORY = "AC";
765+
766+const Xapian::valueno VALUE_EVENT_ID = 0;
767+const Xapian::valueno VALUE_TIMESTAMP = 1;
768+
769+#define QUERY_PARSER_FLAGS \
770+ Xapian::QueryParser::FLAG_PHRASE | Xapian::QueryParser::FLAG_BOOLEAN | \
771+ Xapian::QueryParser::FLAG_PURE_NOT | Xapian::QueryParser::FLAG_LOVEHATE | \
772+ Xapian::QueryParser::FLAG_WILDCARD
773+
774+const std::string FTS_MAIN_DIR = "ftspp.index";
775+
776+void Indexer::Initialize (GError **error)
777+{
778+ try
779+ {
780+ if (zeitgeist_utils_using_in_memory_database ())
781+ {
782+ this->db = new Xapian::WritableDatabase;
783+ this->db->add_database (Xapian::InMemory::open ());
784+ }
785+ else
786+ {
787+ gchar *path = g_build_filename (zeitgeist_utils_get_data_path (),
788+ FTS_MAIN_DIR.c_str (), NULL);
789+ this->db = new Xapian::WritableDatabase (path,
790+ Xapian::DB_CREATE_OR_OPEN);
791+ g_free (path);
792+ }
793+
794+ this->tokenizer = new Xapian::TermGenerator ();
795+ this->query_parser = new Xapian::QueryParser ();
796+ this->query_parser->add_prefix ("name", "N");
797+ this->query_parser->add_prefix ("title", "N");
798+ this->query_parser->add_prefix ("site", "S");
799+ this->query_parser->add_prefix ("app", "A");
800+ this->query_parser->add_boolean_prefix ("zgei",
801+ FILTER_PREFIX_EVENT_INTERPRETATION);
802+ this->query_parser->add_boolean_prefix ("zgem",
803+ FILTER_PREFIX_EVENT_MANIFESTATION);
804+ this->query_parser->add_boolean_prefix ("zga", FILTER_PREFIX_ACTOR);
805+ this->query_parser->add_prefix ("zgsu", FILTER_PREFIX_SUBJECT_URI);
806+ this->query_parser->add_boolean_prefix ("zgsi",
807+ FILTER_PREFIX_SUBJECT_INTERPRETATION);
808+ this->query_parser->add_boolean_prefix ("zgsm",
809+ FILTER_PREFIX_SUBJECT_MANIFESTATION);
810+ this->query_parser->add_prefix ("zgso", FILTER_PREFIX_SUBJECT_ORIGIN);
811+ this->query_parser->add_boolean_prefix ("zgst",
812+ FILTER_PREFIX_SUBJECT_MIMETYPE);
813+ this->query_parser->add_boolean_prefix ("zgss",
814+ FILTER_PREFIX_SUBJECT_STORAGE);
815+ this->query_parser->add_prefix ("category", FILTER_PREFIX_XDG_CATEGORY);
816+
817+ this->query_parser->add_valuerangeprocessor (
818+ new Xapian::NumberValueRangeProcessor (VALUE_EVENT_ID, "id"));
819+ this->query_parser->add_valuerangeprocessor (
820+ new Xapian::NumberValueRangeProcessor (VALUE_TIMESTAMP, "ms", false));
821+
822+ this->query_parser->set_default_op (Xapian::Query::OP_AND);
823+ this->query_parser->set_database (*this->db);
824+
825+ this->enquire = new Xapian::Enquire (*this->db);
826+
827+ }
828+ catch (const Xapian::Error &xp_error)
829+ {
830+ g_set_error_literal (error,
831+ ZEITGEIST_ENGINE_ERROR,
832+ ZEITGEIST_ENGINE_ERROR_DATABASE_ERROR,
833+ xp_error.get_msg ().c_str ());
834+ this->db = NULL;
835+ }
836+}
837+
838+/**
839+ * Returns true if and only if the index is good.
840+ * Otherwise the index should be rebuilt.
841+ */
842+bool Indexer::CheckIndex ()
843+{
844+ std::string db_version (db->get_metadata ("fts_index_version"));
845+ if (db_version != INDEX_VERSION)
846+ {
847+ g_message ("Index must be upgraded. Doing full rebuild");
848+ return false;
849+ }
850+ else if (db->get_doccount () == 0)
851+ {
852+ g_message ("Empty index detected. Doing full rebuild");
853+ return false;
854+ }
855+
856+ return true;
857+}
858+
859+/**
860+ * Clear the index and create a new empty one
861+ */
862+void Indexer::DropIndex ()
863+{
864+ try
865+ {
866+ if (this->db != NULL)
867+ {
868+ this->db->close ();
869+ delete this->db;
870+ this->db = NULL;
871+ }
872+
873+ if (this->enquire != NULL)
874+ {
875+ delete this->enquire;
876+ this->enquire = NULL;
877+ }
878+
879+ if (zeitgeist_utils_using_in_memory_database ())
880+ {
881+ this->db = new Xapian::WritableDatabase;
882+ this->db->add_database (Xapian::InMemory::open ());
883+ }
884+ else
885+ {
886+ gchar *path = g_build_filename (zeitgeist_utils_get_data_path (),
887+ FTS_MAIN_DIR.c_str (), NULL);
888+ this->db = new Xapian::WritableDatabase (path,
889+ Xapian::DB_CREATE_OR_OVERWRITE);
890+ // FIXME: leaks on error
891+ g_free (path);
892+ }
893+
894+ this->query_parser->set_database (*this->db);
895+ this->enquire = new Xapian::Enquire (*this->db);
896+ }
897+ catch (const Xapian::Error &xp_error)
898+ {
899+ g_error ("Error occurred during database reindex: %s",
900+ xp_error.get_msg ().c_str ());
901+ }
902+}
903+
904+void Indexer::Commit ()
905+{
906+ try
907+ {
908+ db->commit ();
909+ }
910+ catch (Xapian::Error const& e)
911+ {
912+ g_warning ("Failed to commit changes: %s", e.get_msg ().c_str ());
913+ }
914+}
915+
916+std::string Indexer::ExpandType (std::string const& prefix,
917+ const gchar* unparsed_uri)
918+{
919+ gchar* uri = g_strdup (unparsed_uri);
920+ gboolean is_negation = zeitgeist_utils_parse_negation (&uri);
921+ gboolean noexpand = zeitgeist_utils_parse_noexpand (&uri);
922+
923+ std::string result;
924+ GList *symbols = NULL;
925+ symbols = g_list_append (symbols, uri);
926+ if (!noexpand)
927+ {
928+ GList *children = zeitgeist_symbol_get_all_children (uri);
929+ symbols = g_list_concat (symbols, children);
930+ }
931+
932+ for (GList *iter = symbols; iter != NULL; iter = iter->next)
933+ {
934+ result += prefix + std::string((gchar*) iter->data);
935+ if (iter->next != NULL) result += " OR ";
936+ }
937+
938+ g_list_free (symbols);
939+ g_free (uri);
940+
941+ if (is_negation) result = "NOT (" + result + ")";
942+
943+ return result;
944+}
945+
946+std::string Indexer::CompileEventFilterQuery (GPtrArray *templates)
947+{
948+ std::vector<std::string> query;
949+
950+ for (unsigned i = 0; i < templates->len; i++)
951+ {
952+ const gchar* val;
953+ std::vector<std::string> tmpl;
954+ ZeitgeistEvent *event = (ZeitgeistEvent*) g_ptr_array_index (templates, i);
955+
956+ val = zeitgeist_event_get_interpretation (event);
957+ if (val && val[0] != '\0')
958+ tmpl.push_back (ExpandType ("zgei:", val));
959+
960+ val = zeitgeist_event_get_manifestation (event);
961+ if (val && val[0] != '\0')
962+ tmpl.push_back (ExpandType ("zgem:", val));
963+
964+ val = zeitgeist_event_get_actor (event);
965+ if (val && val[0] != '\0')
966+ tmpl.push_back ("zga:" + StringUtils::MangleUri (val));
967+
968+ GPtrArray *subjects = zeitgeist_event_get_subjects (event);
969+ for (unsigned j = 0; j < subjects->len; j++)
970+ {
971+ ZeitgeistSubject *subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, j);
972+ val = zeitgeist_subject_get_uri (subject);
973+ if (val && val[0] != '\0')
974+ tmpl.push_back ("zgsu:" + StringUtils::MangleUri (val));
975+
976+ val = zeitgeist_subject_get_interpretation (subject);
977+ if (val && val[0] != '\0')
978+ tmpl.push_back (ExpandType ("zgsi:", val));
979+
980+ val = zeitgeist_subject_get_manifestation (subject);
981+ if (val && val[0] != '\0')
982+ tmpl.push_back (ExpandType ("zgsm:", val));
983+
984+ val = zeitgeist_subject_get_origin (subject);
985+ if (val && val[0] != '\0')
986+ tmpl.push_back ("zgso:" + StringUtils::MangleUri (val));
987+
988+ val = zeitgeist_subject_get_mimetype (subject);
989+ if (val && val[0] != '\0')
990+ tmpl.push_back (std::string ("zgst:") + val);
991+
992+ val = zeitgeist_subject_get_storage (subject);
993+ if (val && val[0] != '\0')
994+ tmpl.push_back (std::string ("zgss:") + val);
995+ }
996+
997+ if (tmpl.size () == 0) continue;
998+
999+ std::string event_query ("(");
1000+ for (size_t i = 0; i < tmpl.size (); i++)
1001+ {
1002+ event_query += tmpl[i];
1003+ if (i < tmpl.size () - 1) event_query += ") AND (";
1004+ }
1005+ query.push_back (event_query + ")");
1006+ }
1007+
1008+ if (query.size () == 0) return std::string ("");
1009+
1010+ std::string result;
1011+ for (size_t i = 0; i < query.size (); i++)
1012+ {
1013+ result += query[i];
1014+ if (i < query.size () - 1) result += " OR ";
1015+ }
1016+ return result;
1017+}
1018+
1019+std::string Indexer::CompileTimeRangeFilterQuery (gint64 start, gint64 end)
1020+{
1021+ // let's use gprinting to be safe
1022+ gchar *q = g_strdup_printf ("%" G_GINT64_FORMAT "..%" G_GINT64_FORMAT "ms",
1023+ start, end);
1024+ std::string query (q);
1025+ g_free (q);
1026+
1027+ return query;
1028+}
1029+
1030+/**
1031+ * Adds the filtering rules to the doc. Filtering rules will
1032+ * not affect the relevancy ranking of the event/doc
1033+ */
1034+void Indexer::AddDocFilters (ZeitgeistEvent *event, Xapian::Document &doc)
1035+{
1036+ const gchar* val;
1037+
1038+ val = zeitgeist_event_get_interpretation (event);
1039+ if (val && val[0] != '\0')
1040+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_EVENT_INTERPRETATION + val));
1041+
1042+ val = zeitgeist_event_get_manifestation (event);
1043+ if (val && val[0] != '\0')
1044+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_EVENT_MANIFESTATION + val));
1045+
1046+ val = zeitgeist_event_get_actor (event);
1047+ if (val && val[0] != '\0')
1048+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_ACTOR + StringUtils::MangleUri (val)));
1049+
1050+ GPtrArray *subjects = zeitgeist_event_get_subjects (event);
1051+ for (unsigned j = 0; j < subjects->len; j++)
1052+ {
1053+ ZeitgeistSubject *subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, j);
1054+ val = zeitgeist_subject_get_uri (subject);
1055+ if (val && val[0] != '\0')
1056+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_URI + StringUtils::MangleUri (val)));
1057+
1058+ val = zeitgeist_subject_get_interpretation (subject);
1059+ if (val && val[0] != '\0')
1060+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_INTERPRETATION + val));
1061+
1062+ val = zeitgeist_subject_get_manifestation (subject);
1063+ if (val && val[0] != '\0')
1064+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_MANIFESTATION + val));
1065+
1066+ val = zeitgeist_subject_get_origin (subject);
1067+ if (val && val[0] != '\0')
1068+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_ORIGIN + StringUtils::MangleUri (val)));
1069+
1070+ val = zeitgeist_subject_get_mimetype (subject);
1071+ if (val && val[0] != '\0')
1072+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_MIMETYPE + val));
1073+
1074+ val = zeitgeist_subject_get_storage (subject);
1075+ if (val && val[0] != '\0')
1076+ doc.add_boolean_term (StringUtils::Truncate (FILTER_PREFIX_SUBJECT_STORAGE + val));
1077+ }
1078+}
1079+
1080+void Indexer::IndexText (std::string const& text)
1081+{
1082+ // FIXME: ascii folding!
1083+ tokenizer->index_text (text, 5);
1084+}
1085+
1086+void Indexer::IndexUri (std::string const& uri, std::string const& origin)
1087+{
1088+ GFile *f = g_file_new_for_uri (uri.c_str ());
1089+
1090+ gchar *scheme = g_file_get_uri_scheme (f);
1091+ if (scheme == NULL)
1092+ {
1093+ g_warning ("Invalid URI: %s", uri.c_str ());
1094+ return;
1095+ }
1096+
1097+ std::string scheme_str(scheme);
1098+ g_free (scheme);
1099+
1100+ if (scheme_str == "file")
1101+ {
1102+ // FIXME: special case some typical filenames (like photos)
1103+ // examples of typical filenames from cameras:
1104+ // P07-08-08_16.25.JPG
1105+ // P070608_18.54.JPG
1106+ // P180308_22.27[1].jpg
1107+ // P6220111.JPG
1108+ // PC220006.JPG
1109+ // DSCN0149.JPG
1110+ // DSC01166.JPG
1111+ // SDC12583.JPG
1112+ // IMGP3199.JPG
1113+ // IMGP1251-4.jpg
1114+ // IMG_101_8987.JPG
1115+ // 10052010152.jpg
1116+ // 4867_93080512835_623012835_1949065_8351752_n.jpg
1117+ // 2011-05-29 10.49.37.jpg
1118+ // V100908_11.24.AVI
1119+ // video-2011-05-29-15-14-58.mp4
1120+
1121+ // get_parse_name will convert escaped characters to UTF-8, but only for
1122+ // the "file" scheme, so using it elsewhere won't be of much help
1123+
1124+ gchar *pn = g_file_get_parse_name (f);
1125+ gchar *basename = g_path_get_basename (pn);
1126+
1127+ // FIXME: remove underscores, CamelCase and process digits
1128+ tokenizer->index_text (basename, 5);
1129+ tokenizer->index_text (basename, 5, "N");
1130+
1131+ g_free (basename);
1132+ // limit the directory indexing to just a few levels
1133+ // (the original formula was weight = 5.0 / (1.5^n))
1134+ unsigned path_weights[] = { 3, 2, 1, 0 };
1135+ unsigned weight_index = 0;
1136+
1137+ // this should be equal to origin, but we already got a nice utf-8 display
1138+ // name, so we'll use that
1139+ gchar *dir = g_path_get_dirname (pn);
1140+ std::string path_component (dir);
1141+ g_free (dir);
1142+ g_free (pn);
1143+
1144+ while (path_component.length () > 2 &&
1145+ weight_index < G_N_ELEMENTS (path_weights))
1146+ {
1147+ // if this is already home directory we don't want it
1148+ if (path_component.length () == home_dir_path.length () &&
1149+ path_component == home_dir_path) return;
1150+
1151+ gchar *name = g_path_get_basename (path_component.c_str ());
1152+
1153+ // FIXME: un-underscore, uncamelcase, ascii fold
1154+ tokenizer->index_text (name, path_weights[weight_index++]);
1155+
1156+ dir = g_path_get_dirname (path_component.c_str ());
1157+ path_component = dir;
1158+ g_free (dir);
1159+ g_free (name);
1160+ }
1161+ }
1162+ else if (scheme_str == "mailto")
1163+ {
1164+ // mailto:username@server.com
1165+ size_t scheme_len = scheme_str.length () + 1;
1166+ size_t at_pos = uri.find ('@', scheme_len);
1167+ if (at_pos == std::string::npos) return;
1168+
1169+ tokenizer->index_text (uri.substr (scheme_len, at_pos - scheme_len), 5);
1170+ tokenizer->index_text (uri.substr (at_pos + 1), 1);
1171+ }
1172+ else if (scheme_str.compare (0, 4, "http") == 0)
1173+ {
1174+ // http / https - we'll index just the basename of the uri (minus query
1175+ // part) and the hostname/domain
1176+
1177+ // step 1) strip query part
1178+ gchar *basename;
1179+ size_t question_mark = uri.find ('?');
1180+ if (question_mark != std::string::npos)
1181+ {
1182+ std::string stripped (uri, 0, question_mark);
1183+ basename = g_path_get_basename (stripped.c_str ());
1184+ }
1185+ else
1186+ {
1187+ basename = g_file_get_basename (f);
1188+ }
1189+
1190+ // step 2) unescape and check that it's valid utf8
1191+ gchar *unescaped_basename = g_uri_unescape_string (basename, "");
1192+
1193+ if (g_utf8_validate (unescaped_basename, -1, NULL))
1194+ {
1195+ // FIXME: remove underscores, CamelCase and process digits
1196+ tokenizer->index_text (unescaped_basename, 5);
1197+ tokenizer->index_text (unescaped_basename, 5, "N");
1198+ }
1199+
1200+ // and also index hostname (taken from origin field if possible)
1201+ std::string host_str (origin.empty () ? uri : origin);
1202+ size_t hostname_start = host_str.find ("://");
1203+ if (hostname_start != std::string::npos)
1204+ {
1205+ std::string hostname (host_str, hostname_start + 3);
1206+ size_t slash_pos = hostname.find ("/");
1207+ if (slash_pos != std::string::npos) hostname.resize (slash_pos);
1208+
1209+ // support IDN
1210+ if (g_hostname_is_ascii_encoded (hostname.c_str ()))
1211+ {
1212+ gchar *printable_hostname = g_hostname_to_unicode (hostname.c_str ());
1213+ if (printable_hostname != NULL) hostname = printable_hostname;
1214+ g_free (printable_hostname);
1215+ }
1216+
1217+ tokenizer->index_text (hostname, 2);
1218+ tokenizer->index_text (hostname, 2, "N");
1219+ tokenizer->index_text (hostname, 2, "S");
1220+ }
1221+
1222+ g_free (unescaped_basename);
1223+ g_free (basename);
1224+ }
1225+ else if (scheme_str == "data")
1226+ {
1227+ // we *really* don't want to index anything with this scheme
1228+ }
1229+ else
1230+ {
1231+ std::string authority, path, query;
1232+ StringUtils::SplitUri (uri, authority, path, query);
1233+
1234+ if (!path.empty ())
1235+ {
1236+ gchar *basename = g_path_get_basename (path.c_str ());
1237+ gchar *unescaped_basename = g_uri_unescape_string (basename, "");
1238+
1239+ if (g_utf8_validate (unescaped_basename, -1, NULL))
1240+ {
1241+ std::string capped (StringUtils::Truncate (unescaped_basename, 30));
1242+ tokenizer->index_text (capped, 5);
1243+ tokenizer->index_text (capped, 5, "N");
1244+ }
1245+
1246+ // FIXME: rest of the path?
1247+ g_free (unescaped_basename);
1248+ g_free (basename);
1249+ }
1250+
1251+ if (!authority.empty ())
1252+ {
1253+ std::string capped (StringUtils::Truncate (authority, 30));
1254+
1255+ tokenizer->index_text (capped, 2);
1256+ tokenizer->index_text (capped, 2, "N");
1257+ tokenizer->index_text (capped, 2, "S");
1258+ }
1259+ }
1260+
1261+ g_object_unref (f);
1262+}
1263+
1264+bool Indexer::IndexActor (std::string const& actor, bool is_subject)
1265+{
1266+ GDesktopAppInfo *dai = NULL;
1267+ // check the cache first
1268+ GAppInfo *ai = app_info_cache.count (actor) != 0 ? app_info_cache[actor] : NULL;
1269+
1270+ if (ai == NULL)
1271+ {
1272+ // check also the failed cache
1273+ if (failed_lookups.count (actor) != 0) return false;
1274+
1275+ // and now try to load from the disk
1276+ if (g_path_is_absolute (actor.c_str ()))
1277+ {
1278+ dai = g_desktop_app_info_new_from_filename (actor.c_str ());
1279+ }
1280+ else if (g_str_has_prefix (actor.c_str (), "application://"))
1281+ {
1282+ dai = g_desktop_app_info_new (actor.substr (14).c_str ());
1283+ }
1284+
1285+ if (dai != NULL)
1286+ {
1287+ ai = G_APP_INFO (dai);
1288+ app_info_cache[actor] = ai;
1289+ }
1290+ else
1291+ {
1292+ // cache failed lookup
1293+ failed_lookups.insert (actor);
1294+ if (clear_failed_id == 0)
1295+ {
1296+ // but clear the failed cache in 30 seconds
1297+ clear_failed_id = g_timeout_add_seconds (30,
1298+ (GSourceFunc) &Indexer::ClearFailedLookupsCb, this);
1299+ }
1300+ }
1301+ }
1302+ else
1303+ {
1304+ dai = G_DESKTOP_APP_INFO (ai);
1305+ }
1306+
1307+ if (dai == NULL)
1308+ {
1309+ g_warning ("Unable to get info on %s", actor.c_str ());
1310+ return false;
1311+ }
1312+
1313+ const gchar *val;
1314+ unsigned name_weight = is_subject ? 5 : 2;
1315+ unsigned comment_weight = 2;
1316+
1317+ // FIXME: ascii folding somewhere
1318+
1319+ val = g_app_info_get_display_name (ai);
1320+ if (val && val[0] != '\0')
1321+ {
1322+ std::string display_name (val);
1323+ tokenizer->index_text (display_name, name_weight);
1324+ tokenizer->index_text (display_name, name_weight, "A");
1325+ }
1326+
1327+ val = g_desktop_app_info_get_generic_name (dai);
1328+ if (val && val[0] != '\0')
1329+ {
1330+ std::string generic_name (val);
1331+ tokenizer->index_text (generic_name, name_weight);
1332+ tokenizer->index_text (generic_name, name_weight, "A");
1333+ }
1334+
1335+ if (!is_subject) return true;
1336+ // the rest of the code only applies to events with application subject uris:
1337+ // index the comment field, add category terms, index keywords
1338+
1339+ val = g_app_info_get_description (ai);
1340+ if (val && val[0] != '\0')
1341+ {
1342+ std::string comment (val);
1343+ tokenizer->index_text (comment, comment_weight);
1344+ tokenizer->index_text (comment, comment_weight, "A");
1345+ }
1346+
1347+ val = g_desktop_app_info_get_categories (dai);
1348+ if (val && val[0] != '\0')
1349+ {
1350+ gchar **categories = g_strsplit (val, ";", 0);
1351+ Xapian::Document doc(tokenizer->get_document ());
1352+ for (gchar **iter = categories; *iter != NULL; ++iter)
1353+ {
1354+ // FIXME: what if this isn't ascii? but it should, that's what
1355+ // the fdo menu spec says
1356+ gchar *category = g_ascii_strdown (*iter, -1);
1357+ doc.add_boolean_term (FILTER_PREFIX_XDG_CATEGORY + category);
1358+ g_free (category);
1359+ }
1360+ g_strfreev (categories);
1361+ }
1362+
1363+ return true;
1364+}
1365+
1366+GPtrArray* Indexer::Search (const gchar *search_string,
1367+ ZeitgeistTimeRange *time_range,
1368+ GPtrArray *templates,
1369+ guint offset,
1370+ guint count,
1371+ ZeitgeistResultType result_type,
1372+ guint *matches,
1373+ GError **error)
1374+{
1375+ GPtrArray *results = NULL;
1376+ try
1377+ {
1378+ std::string query_string(search_string);
1379+
1380+ if (templates && templates->len > 0)
1381+ {
1382+ std::string filters (CompileEventFilterQuery (templates));
1383+ query_string = "(" + query_string + ") AND (" + filters + ")";
1384+ }
1385+
1386+ if (time_range)
1387+ {
1388+ gint64 start_time = zeitgeist_time_range_get_start (time_range);
1389+ gint64 end_time = zeitgeist_time_range_get_end (time_range);
1390+
1391+ if (start_time > 0 || end_time < G_MAXINT64)
1392+ {
1393+ std::string time_filter (CompileTimeRangeFilterQuery (start_time, end_time));
1394+ query_string = "(" + query_string + ") AND (" + time_filter + ")";
1395+ }
1396+ }
1397+
1398+ // FIXME: which result types coalesce?
1399+ guint maxhits = count * 3;
1400+
1401+ if (result_type == 100)
1402+ {
1403+ enquire->set_sort_by_relevance ();
1404+ }
1405+ else
1406+ {
1407+ enquire->set_sort_by_value (VALUE_TIMESTAMP, true);
1408+ }
1409+
1410+ g_debug ("query: %s", query_string.c_str ());
1411+ Xapian::Query q(query_parser->parse_query (query_string, QUERY_PARSER_FLAGS));
1412+ enquire->set_query (q);
1413+ Xapian::MSet hits (enquire->get_mset (offset, maxhits));
1414+ Xapian::doccount hitcount = hits.get_matches_estimated ();
1415+
1416+ if (result_type == 100)
1417+ {
1418+ std::vector<unsigned> event_ids;
1419+ for (Xapian::MSetIterator iter = hits.begin (); iter != hits.end (); ++iter)
1420+ {
1421+ Xapian::Document doc(iter.get_document ());
1422+ double unserialized =
1423+ Xapian::sortable_unserialise(doc.get_value (VALUE_EVENT_ID));
1424+ event_ids.push_back (static_cast<unsigned>(unserialized));
1425+ }
1426+
1427+ results = zeitgeist_db_reader_get_events (zg_reader,
1428+ &event_ids[0],
1429+ event_ids.size (),
1430+ NULL,
1431+ error);
1432+ }
1433+ else
1434+ {
1435+ GPtrArray *event_templates;
1436+ event_templates = g_ptr_array_new_with_free_func (g_object_unref);
1437+ for (Xapian::MSetIterator iter = hits.begin (); iter != hits.end (); ++iter)
1438+ {
1439+ Xapian::Document doc(iter.get_document ());
1440+ double unserialized =
1441+ Xapian::sortable_unserialise(doc.get_value (VALUE_EVENT_ID));
1442+ // this doesn't need ref sinking, does it?
1443+ ZeitgeistEvent *event = zeitgeist_event_new ();
1444+ zeitgeist_event_set_id (event, static_cast<unsigned>(unserialized));
1445+ g_ptr_array_add (event_templates, event);
1446+ }
1447+
1448+ if (event_templates->len > 0)
1449+ {
1450+ ZeitgeistTimeRange *time_range = zeitgeist_time_range_new_anytime ();
1451+ results = zeitgeist_db_reader_find_events (zg_reader,
1452+ time_range,
1453+ event_templates,
1454+ ZEITGEIST_STORAGE_STATE_ANY,
1455+ 0,
1456+ result_type,
1457+ NULL,
1458+ error);
1459+
1460+ g_object_unref (time_range);
1461+ }
1462+ else
1463+ {
1464+ results = g_ptr_array_new ();
1465+ }
1466+
1467+ g_ptr_array_unref (event_templates);
1468+ }
1469+
1470+ if (matches)
1471+ {
1472+ *matches = hitcount;
1473+ }
1474+ }
1475+ catch (Xapian::Error const& e)
1476+ {
1477+ g_warning ("Failed to search the index: %s", e.get_msg ().c_str ());
1478+ g_set_error_literal (error,
1479+ ZEITGEIST_ENGINE_ERROR,
1480+ ZEITGEIST_ENGINE_ERROR_DATABASE_ERROR,
1481+ e.get_msg ().c_str ());
1482+ }
1483+
1484+ return results;
1485+}
1486+
1487+void Indexer::IndexEvent (ZeitgeistEvent *event)
1488+{
1489+ try
1490+ {
1491+ // FIXME: we need to special case MOVE_EVENTs
1492+ const gchar *val;
1493+ guint event_id = zeitgeist_event_get_id (event);
1494+ g_return_if_fail (event_id > 0);
1495+
1496+ g_debug ("Indexing event with ID: %u", event_id);
1497+
1498+ Xapian::Document doc;
1499+ doc.add_value (VALUE_EVENT_ID,
1500+ Xapian::sortable_serialise (static_cast<double>(event_id)));
1501+ doc.add_value (VALUE_TIMESTAMP,
1502+ Xapian::sortable_serialise (static_cast<double>(zeitgeist_event_get_timestamp (event))));
1503+
1504+ tokenizer->set_document (doc);
1505+
1506+ val = zeitgeist_event_get_actor (event);
1507+ if (val && val[0] != '\0')
1508+ {
1509+ // it's nice that searching for "gedit" will find all files you worked
1510+ // with in gedit, but the relevancy has to be low
1511+ IndexActor (val, false);
1512+ }
1513+
1514+ GPtrArray *subjects = zeitgeist_event_get_subjects (event);
1515+ for (unsigned i = 0; i < subjects->len; i++)
1516+ {
1517+ ZeitgeistSubject *subject;
1518+ subject = (ZeitgeistSubject*) g_ptr_array_index (subjects, i);
1519+
1520+ val = zeitgeist_subject_get_uri (subject);
1521+ if (val == NULL || val[0] == '\0') continue;
1522+
1523+ std::string uri(val);
1524+
1525+ if (uri.length () > 512)
1526+ {
1527+ g_warning ("URI too long (%" G_GSIZE_FORMAT "). Discarding:\n%s",
1528+ uri.length (), uri.substr (0, 32).c_str ());
1529+ return; // ignore this event completely...
1530+ }
1531+
1532+ val = zeitgeist_subject_get_text (subject);
1533+ if (val && val[0] != '\0')
1534+ {
1535+ IndexText (val);
1536+ }
1537+
1538+ val = zeitgeist_subject_get_origin (subject);
1539+ std::string origin (val != NULL ? val : "");
1540+
1541+ if (uri.compare (0, 14, "application://") == 0)
1542+ {
1543+ if (!IndexActor (uri, true))
1544+ IndexUri (uri, origin);
1545+ }
1546+ else
1547+ {
1548+ IndexUri (uri, origin);
1549+ }
1550+ }
1551+
1552+ AddDocFilters (event, doc);
1553+
1554+ this->db->add_document (doc);
1555+ }
1556+ catch (Xapian::Error const& e)
1557+ {
1558+ g_warning ("Failed to index event: %s", e.get_msg ().c_str ());
1559+ }
1560+}
1561+
1562+void Indexer::DeleteEvent (guint32 event_id)
1563+{
1564+ g_debug ("Deleting event with ID: %u", event_id);
1565+
1566+ try
1567+ {
1568+ std::string id(Xapian::sortable_serialise (static_cast<double>(event_id)));
1569+ Xapian::Query query (Xapian::Query::OP_VALUE_RANGE, VALUE_EVENT_ID, id, id);
1570+
1571+ enquire->set_query(query);
1572+ Xapian::MSet mset = enquire->get_mset(0, 10);
1573+
1574+ Xapian::doccount total = mset.get_matches_estimated();
1575+ if (total > 1)
1576+ {
1577+ g_warning ("More than one event found with id '%s'", id.c_str ());
1578+ }
1579+ else if (total == 0)
1580+ {
1581+ g_warning ("No event for id '%s'", id.c_str ());
1582+ return;
1583+ }
1584+
1585+ Xapian::MSetIterator i, end;
1586+ for (i= mset.begin(), end = mset.end(); i != end; i++)
1587+ {
1588+ db->delete_document (*i);
1589+ }
1590+ }
1591+ catch (Xapian::Error const& e)
1592+ {
1593+ g_warning ("Failed to delete event '%u': %s",
1594+ event_id, e.get_msg().c_str ());
1595+ }
1596+}
1597+
1598+void Indexer::SetDbMetadata (std::string const& key, std::string const& value)
1599+{
1600+ try
1601+ {
1602+ db->set_metadata (key, value);
1603+ }
1604+ catch (Xapian::Error const& e)
1605+ {
1606+ g_warning ("Failed to set metadata: %s", e.get_msg ().c_str ());
1607+ }
1608+}
1609+
1610+gboolean Indexer::ClearFailedLookupsCb ()
1611+{
1612+ failed_lookups.clear ();
1613+
1614+ clear_failed_id = 0;
1615+ return FALSE;
1616+}
1617+
1618+} /* namespace */
1619
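The way CompileEventFilterQuery assembles its query string (the clauses of one template joined with AND, the templates joined with OR, empty templates skipped) can be sketched without GLib or Xapian. This is a minimal sketch under assumptions, not code from the branch: JoinWith and CompileFilter are hypothetical names, and every template is assumed to be already reduced to a list of prefixed clauses such as "zgei:..." or "zga:...".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the parts with a separator and never emits a
// leading or trailing separator.
static std::string JoinWith (std::vector<std::string> const& parts,
                             std::string const& separator)
{
  std::string result;
  for (std::size_t i = 0; i < parts.size (); ++i)
  {
    if (i > 0) result += separator;
    result += parts[i];
  }
  return result;
}

// Hypothetical stand-in for the filter compilation. Every inner vector holds
// the clauses of one event template: clauses of a template are ANDed,
// templates are ORed, and templates without clauses are skipped.
static std::string CompileFilter (
    std::vector<std::vector<std::string> > const& templates)
{
  std::vector<std::string> per_template;
  for (std::size_t i = 0; i < templates.size (); ++i)
  {
    if (templates[i].empty ()) continue;
    per_template.push_back ("(" + JoinWith (templates[i], ") AND (") + ")");
  }
  return JoinWith (per_template, " OR ");
}
```

Using std::size_t for both loop counters keeps the comparison against size () unsigned on both sides, and checking "i > 0" before appending the separator avoids the "size () - 1" arithmetic altogether.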
1620=== added file 'extensions/fts++/indexer.h'
1621--- extensions/fts++/indexer.h 1970-01-01 00:00:00 +0000
1622+++ extensions/fts++/indexer.h 2012-02-09 22:47:22 +0000
1623@@ -0,0 +1,115 @@
1624+/*
1625+ * Copyright (C) 2012 Canonical Ltd
1626+ *
1627+ * This program is free software; you can redistribute it and/or
1628+ * modify it under the terms of the GNU General Public License
1629+ * as published by the Free Software Foundation; either version 2
1630+ * of the License, or (at your option) any later version.
1631+ *
1632+ * This program is distributed in the hope that it will be useful,
1633+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
1634+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1635+ * GNU General Public License for more details.
1636+ *
1637+ * You should have received a copy of the GNU General Public License
1638+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1639+ *
1640+ * Authored by Michal Hruby <michal.hruby@canonical.com>
1641+ *
1642+ */
1643+
1644+#ifndef _ZGFTS_INDEXER_H_
1645+#define _ZGFTS_INDEXER_H_
1646+
1647+#include <glib-object.h>
1648+#include <gio/gio.h>
1649+#include <xapian.h>
1650+
1651+#include "zeitgeist-internal.h"
1652+
1653+namespace ZeitgeistFTS {
1654+
1655+const std::string INDEX_VERSION = "1";
1656+
1657+class Indexer
1658+{
1659+public:
1660+ typedef std::map<std::string, GAppInfo*> AppInfoMap;
1661+ typedef std::set<std::string> ApplicationSet;
1662+
1663+ Indexer (ZeitgeistDbReader *reader)
1664+ : zg_reader (reader)
1665+ , db (NULL)
1666+ , query_parser (NULL)
1667+ , enquire (NULL)
1668+ , tokenizer (NULL)
1669+ , clear_failed_id (0)
1670+ {
1671+ const gchar *home_dir = g_get_home_dir ();
1672+ home_dir_path = home_dir != NULL ? home_dir : "/home";
1673+ }
1674+
1675+ ~Indexer ()
1676+ {
1677+ if (tokenizer) delete tokenizer;
1678+ if (enquire) delete enquire;
1679+ if (query_parser) delete query_parser;
1680+ if (db) delete db;
1681+
1682+ for (AppInfoMap::iterator it = app_info_cache.begin ();
1683+ it != app_info_cache.end (); ++it)
1684+ {
1685+ g_object_unref (it->second);
1686+ }
1687+
1688+ if (clear_failed_id != 0)
1689+ {
1690+ g_source_remove (clear_failed_id);
1691+ }
1692+ }
1693+
1694+ void Initialize (GError **error);
1695+ bool CheckIndex ();
1696+ void DropIndex ();
1697+ void Commit ();
1698+
1699+ void IndexEvent (ZeitgeistEvent *event);
1700+ void DeleteEvent (guint32 event_id);
1701+ void SetDbMetadata (std::string const& key, std::string const& value);
1702+
1703+ GPtrArray* Search (const gchar *search_string,
1704+ ZeitgeistTimeRange *time_range,
1705+ GPtrArray *templates,
1706+ guint offset,
1707+ guint count,
1708+ ZeitgeistResultType result_type,
1709+ guint *matches,
1710+ GError **error);
1711+
1712+private:
1713+ std::string ExpandType (std::string const& prefix, const gchar* unparsed_uri);
1714+ std::string CompileEventFilterQuery (GPtrArray *templates);
1715+ std::string CompileTimeRangeFilterQuery (gint64 start, gint64 end);
1716+
1717+ void AddDocFilters (ZeitgeistEvent *event, Xapian::Document &doc);
1718+ void IndexText (std::string const& text);
1719+ void IndexUri (std::string const& uri, std::string const& origin);
1720+ bool IndexActor (std::string const& actor, bool is_subject);
1721+
1722+ gboolean ClearFailedLookupsCb ();
1723+
1724+ ZeitgeistDbReader *zg_reader;
1725+ Xapian::WritableDatabase *db;
1726+ Xapian::QueryParser *query_parser;
1727+ Xapian::Enquire *enquire;
1728+ Xapian::TermGenerator *tokenizer;
1729+ AppInfoMap app_info_cache;
1730+ ApplicationSet failed_lookups;
1731+
1732+ guint clear_failed_id;
1733+ std::string home_dir_path;
1734+};
1735+
1736+}
1737+
1738+#endif /* _ZGFTS_INDEXER_H_ */
1739
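The app_info_cache and failed_lookups members declared above back a two-level lookup cache: hits are kept in a map, misses are remembered in a set so that the expensive lookup is not repeated until the set is cleared. The following is a minimal, GLib-free sketch of that idea. LookupCache, DemoLoader and the string payload are hypothetical stand-ins (the real code stores GAppInfo pointers and clears the misses from a 30 second timeout), so treat it as an illustration rather than the branch's implementation.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model of the lookup caching: successful lookups live in a map,
// failed ones are remembered in a set until ClearFailedLookups is called.
class LookupCache
{
public:
  typedef std::string (*Loader) (std::string const& key);

  explicit LookupCache (Loader loader) : loader_ (loader), loads_ (0) {}

  // Returns true and fills 'value' if the key resolves. An empty string
  // returned by the loader is treated as a failed lookup.
  bool Lookup (std::string const& key, std::string &value)
  {
    // find() does not insert an empty entry for unknown keys
    std::map<std::string, std::string>::const_iterator it = hits_.find (key);
    if (it != hits_.end ())
    {
      value = it->second;
      return true;
    }

    // known failure: skip the expensive load
    if (misses_.count (key) != 0) return false;

    ++loads_;
    std::string loaded (loader_ (key));
    if (loaded.empty ())
    {
      misses_.insert (key);
      return false;
    }

    hits_[key] = loaded;
    value = loaded;
    return true;
  }

  // Counterpart of the periodic clearing of the failed lookups.
  void ClearFailedLookups () { misses_.clear (); }

  // Number of times the loader was actually invoked.
  unsigned loads () const { return loads_; }

private:
  Loader loader_;
  unsigned loads_;
  std::map<std::string, std::string> hits_;
  std::set<std::string> misses_;
};

// Hypothetical loader used for illustration: only "gedit" resolves.
static std::string DemoLoader (std::string const& key)
{
  return key == "gedit" ? "Text Editor" : "";
}
```

Looking the key up with find () instead of operator[] means a failed lookup never leaves an empty entry behind in the map of hits.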
1740=== added symlink 'extensions/fts++/mimetype.vala'
1741=== target is u'../../src/mimetype.vala'
1742=== added symlink 'extensions/fts++/ontology-uris.vala'
1743=== target is u'../../src/ontology-uris.vala'
1744=== added symlink 'extensions/fts++/ontology.vala'
1745=== target is u'../../src/ontology.vala'
1746=== added file 'extensions/fts++/org.gnome.zeitgeist.fts.service.in'
1747--- extensions/fts++/org.gnome.zeitgeist.fts.service.in 1970-01-01 00:00:00 +0000
1748+++ extensions/fts++/org.gnome.zeitgeist.fts.service.in 2012-02-09 22:47:22 +0000
1749@@ -0,0 +1,3 @@
1750+[D-BUS Service]
1751+Name=org.gnome.zeitgeist.SimpleIndexer
1752+Exec=@libexecdir@/zeitgeist-fts
1753
1754=== added symlink 'extensions/fts++/remote.vala'
1755=== target is u'../../src/remote.vala'
1756=== added symlink 'extensions/fts++/sql-schema.vala'
1757=== target is u'../../src/sql-schema.vala'
1758=== added symlink 'extensions/fts++/sql.vala'
1759=== target is u'../../src/sql.vala'
1760=== added file 'extensions/fts++/stringutils.cpp'
1761--- extensions/fts++/stringutils.cpp 1970-01-01 00:00:00 +0000
1762+++ extensions/fts++/stringutils.cpp 2012-02-09 22:47:22 +0000
1763@@ -0,0 +1,128 @@
1764+/*
1765+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
1766+ *
1767+ * This program is free software; you can redistribute it and/or
1768+ * modify it under the terms of the GNU General Public License
1769+ * as published by the Free Software Foundation; either version 2
1770+ * of the License, or (at your option) any later version.
1771+ *
1772+ * This program is distributed in the hope that it will be useful,
1773+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
1774+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1775+ * GNU General Public License for more details.
1776+ *
1777+ * You should have received a copy of the GNU General Public License
1778+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1779+ *
1780+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
1781+ *
1782+ */
1783+#include <string>
1784+
1785+#include "stringutils.h"
1786+
1787+using namespace std;
1788+
1789+namespace ZeitgeistFTS {
1790+
1791+namespace StringUtils {
1792+
1793+/**
1794+ * Truncate s to at most 'nbytes' bytes while making sure the returned
1795+ * string is still valid UTF-8.
1796+ *
1797+ * NOTE: It is assumed the input string is valid UTF-8. Untrusted text
1798+ * should be validated with g_utf8_validate().
1799+ *
1800+ * This function is useful for working with Xapian terms because Xapian has
1801+ * a max term length of 245 (which is not very well documented, but see
1802+ * http://xapian.org/docs/omega/termprefixes.html).
1803+ */
1804+string Truncate (string const& s, unsigned int nbytes)
1805+{
1806+ const gchar *str = s.c_str();
1807+ const gchar *iter = str;
1808+
1809+ nbytes = MIN(nbytes, s.length());
1810+
1811+ while (iter - str < nbytes)
1812+ {
1813+ const gchar *tmp = g_utf8_next_char (iter);
1814+ if (tmp - str > nbytes) break;
1815+ iter = tmp;
1816+ }
1817+
1818+
1819+ return s.substr(0, iter - str);
1820+}
1821+
1822+/**
1823+ * Converts a URI into an index- and query-friendly string. The problem
1824+ * is that Xapian doesn't handle CAPITAL letters or most non-alphanumeric
1825+ * symbols in a boolean term when it does prefix matching. The mangled
1826+ * URIs returned from this function are suitable for boolean prefix searches.
1827+ *
1828+ * IMPORTANT: This is a 1-way function! You cannot convert back.
1829+ */
1830+string MangleUri (string const& orig)
1831+{
1832+ string s(orig);
1833+ size_t pos = 0;
1834+ while ((pos = s.find_first_of (": /", pos)) != string::npos)
1835+ {
1836+ s.replace (pos, 1, 1, '_');
1837+ pos++;
1838+ }
1839+
1840+ return s;
1841+}
1842+
1843+/**
1844+ * This method expects a valid URI and tries to split it into authority,
1845+ * path and query.
1846+ *
1847+ * Note that any or all of the output strings may be left untouched.
1848+ */
1849+void SplitUri (string const& uri, string &authority,
1850+ string &path, string &query)
1851+{
1852+ size_t colon_pos = uri.find (':');
1853+ if (colon_pos == string::npos) return; // not a URI?
1854+ bool has_double_slash = uri.length () > colon_pos + 2 &&
1855+ uri.compare (colon_pos + 1, 2, "//") == 0;
1856+
1857+ size_t start_pos = has_double_slash ? colon_pos + 3 : colon_pos + 1;
1858+
1859+ size_t first_slash = uri.find ('/', start_pos);
1860+ size_t question_mark_pos = uri.find ('?', first_slash == string::npos ?
1861+ start_pos : first_slash + 1);
1862+
1863+ authority = uri.substr (start_pos);
1864+ if (first_slash != string::npos)
1865+ {
1866+ authority.resize (first_slash - start_pos);
1867+ }
1868+ else if (question_mark_pos != string::npos)
1869+ {
1870+ authority.resize (question_mark_pos - start_pos);
1871+ }
1872+
1873+ if (first_slash == string::npos)
1874+ {
1875+ first_slash = start_pos + authority.length ();
1876+ }
1877+
1878+ if (question_mark_pos != string::npos)
1879+ {
1880+ path = uri.substr (first_slash, question_mark_pos - first_slash);
1881+ query = uri.substr (question_mark_pos + 1);
1882+ }
1883+ else
1884+ {
1885+ path = uri.substr (first_slash);
1886+ }
1887+}
1888+
1889+} /* namespace StringUtils */
1890+
1891+} /* namespace ZeitgeistFTS */
1892
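The byte-bounded truncation documented for Truncate() above can be sketched without GLib. This is a minimal sketch, not the code from the branch: TruncateUtf8 and CheckTruncateUtf8 are hypothetical names, and instead of walking forward with g_utf8_next_char() it backs up from the cut position past UTF-8 continuation bytes. Like the original, it assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical GLib-free stand-in for StringUtils::Truncate.
// Assumes `s` is valid UTF-8, as the original does.
std::string TruncateUtf8 (std::string const& s, std::size_t nbytes)
{
  if (s.size () <= nbytes) return s;

  // Back up while the byte at the cut position is a continuation byte
  // (bit pattern 10xxxxxx), so the cut never lands inside a sequence.
  std::size_t cut = nbytes;
  while (cut > 0 && (static_cast<unsigned char> (s[cut]) & 0xC0) == 0x80)
    cut--;

  return s.substr (0, cut);
}

// Mirrors a few of the cases in test-stringutils.cpp; "\xC3\xA5" is the
// two-byte UTF-8 encoding of U+00E5.
void CheckTruncateUtf8 ()
{
  assert (TruncateUtf8 ("aa", 1) == "a");
  assert (TruncateUtf8 ("\xC3\xA5", 1) == "");
  assert (TruncateUtf8 ("\xC3\xA5\xC3\xA5", 3) == "\xC3\xA5");
}
```

Backing up from the cut point touches at most three bytes per call, whereas the forward walk in the branch visits every character up to the limit; both give the same result on valid input.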
1893=== added file 'extensions/fts++/stringutils.h'
1894--- extensions/fts++/stringutils.h 1970-01-01 00:00:00 +0000
1895+++ extensions/fts++/stringutils.h 2012-02-09 22:47:22 +0000
1896@@ -0,0 +1,42 @@
1897+/*
1898+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
1899+ *
1900+ * This program is free software; you can redistribute it and/or
1901+ * modify it under the terms of the GNU General Public License
1902+ * as published by the Free Software Foundation; either version 2
1903+ * of the License, or (at your option) any later version.
1904+ *
1905+ * This program is distributed in the hope that it will be useful,
1906+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
1907+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1908+ * GNU General Public License for more details.
1909+ *
1910+ * You should have received a copy of the GNU General Public License
1911+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1912+ *
1913+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
1914+ *
1915+ */
1916+
1917+#include <string>
1918+#include <glib.h>
1919+
1920+namespace ZeitgeistFTS {
1921+
1922+namespace StringUtils {
1923+
1924+const unsigned int MAX_TERM_LENGTH = 245;
1925+
1926+std::string Truncate (std::string const& s,
1927+ unsigned int nbytes = MAX_TERM_LENGTH);
1928+
1929+std::string MangleUri (std::string const& orig);
1930+
1931+void SplitUri (std::string const& uri,
1932+ std::string &authority,
1933+ std::string &path,
1934+ std::string &query);
1935+
1936+} /* namespace StringUtils */
1937+
1938+} /* namespace ZeitgeistFTS */
1939
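To illustrate why MangleUri() is documented as a 1-way function, here is a minimal sketch under stated assumptions. MangleUriSketch, ManglingCollides and CheckMangleUriSketch are hypothetical names, and the sketch uses std::replace_if with a C++11 lambda rather than the find_first_of loop in the branch. It replaces the same three characters (':', ' ' and '/') with '_'.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical equivalent of StringUtils::MangleUri.
std::string MangleUriSketch (std::string const& orig)
{
  std::string s (orig);
  std::replace_if (s.begin (), s.end (),
                   [] (char c) { return c == ':' || c == ' ' || c == '/'; },
                   '_');
  return s;
}

// Two distinct inputs collapse to the same output, which is why the
// mapping cannot be inverted.
bool ManglingCollides ()
{
  return MangleUriSketch ("a:b") == MangleUriSketch ("a/b");
}

// Same expectations as test_mangle in test-stringutils.cpp.
void CheckMangleUriSketch ()
{
  assert (MangleUriSketch ("file://") == "file___");
  assert (MangleUriSketch ("scheme:no spaces in uris")
          == "scheme_no_spaces_in_uris");
}
```

Because several characters map to a single '_', the mangled form is only suitable as a lookup key; the original URI has to be stored separately if it is needed later.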
1940=== added symlink 'extensions/fts++/table-lookup.vala'
1941=== target is u'../../src/table-lookup.vala'
1942=== added file 'extensions/fts++/task.cpp'
1943--- extensions/fts++/task.cpp 1970-01-01 00:00:00 +0000
1944+++ extensions/fts++/task.cpp 2012-02-09 22:47:22 +0000
1945@@ -0,0 +1,47 @@
1946+/*
1947+ * Copyright (C) 2012 Canonical Ltd
1948+ *
1949+ * This program is free software; you can redistribute it and/or
1950+ * modify it under the terms of the GNU General Public License
1951+ * as published by the Free Software Foundation; either version 2
1952+ * of the License, or (at your option) any later version.
1953+ *
1954+ * This program is distributed in the hope that it will be useful,
1955+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
1956+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1957+ * GNU General Public License for more details.
1958+ *
1959+ * You should have received a copy of the GNU General Public License
1960+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
1961+ *
1962+ * Authored by Michal Hruby <michal.hruby@canonical.com>
1963+ *
1964+ */
1965+
1966+#include "task.h"
1967+
1968+namespace ZeitgeistFTS {
1969+
1970+void IndexEventsTask::Process (Indexer *indexer)
1971+{
1972+ unsigned end_index = MIN (start_index + event_count, events->len);
1973+ for (unsigned i = start_index; i < end_index; i++)
1974+ {
1975+ indexer->IndexEvent ((ZeitgeistEvent*) g_ptr_array_index (events, i));
1976+ }
1977+}
1978+
1979+void DeleteEventsTask::Process (Indexer *indexer)
1980+{
1981+ for (unsigned i = 0; i < event_ids.size (); i++)
1982+ {
1983+ indexer->DeleteEvent (event_ids[i]);
1984+ }
1985+}
1986+
1987+void MetadataTask::Process (Indexer *indexer)
1988+{
1989+ indexer->SetDbMetadata (key_name, value);
1990+}
1991+
1992+}
1993
1994=== added file 'extensions/fts++/task.h'
1995--- extensions/fts++/task.h 1970-01-01 00:00:00 +0000
1996+++ extensions/fts++/task.h 2012-02-09 22:47:22 +0000
1997@@ -0,0 +1,100 @@
1998+/*
1999+ * Copyright (C) 2012 Canonical Ltd
2000+ *
2001+ * This program is free software; you can redistribute it and/or
2002+ * modify it under the terms of the GNU General Public License
2003+ * as published by the Free Software Foundation; either version 2
2004+ * of the License, or (at your option) any later version.
2005+ *
2006+ * This program is distributed in the hope that it will be useful,
2007+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2008+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2009+ * GNU General Public License for more details.
2010+ *
2011+ * You should have received a copy of the GNU General Public License
2012+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
2013+ *
2014+ * Authored by Michal Hruby <michal.hruby@canonical.com>
2015+ *
2016+ */
2017+
2018+#ifndef _ZGFTS_TASK_H_
2019+#define _ZGFTS_TASK_H_
2020+
2021+#include <glib.h>
2022+
2023+#include "indexer.h"
2024+
2025+namespace ZeitgeistFTS {
2026+
2027+/**
2028+ * A task contains a chunk of work defined by the Controller.
2029+ * A task should not try to schedule itself cleverly; the
2030+ * Controller is responsible for breaking down work into suitable
2031+ * chunks.
2032+ */
2033+class Task
2034+{
2035+public:
2036+ virtual ~Task () {}
2037+ virtual void Process (Indexer *indexer) = 0;
2038+};
2039+
2040+class IndexEventsTask : public Task
2041+{
2042+public:
2043+ void Process (Indexer *indexer);
2044+
2045+ IndexEventsTask (GPtrArray *event_arr)
2046+ : events (event_arr), start_index (0), event_count (event_arr->len) {}
2047+
2048+ IndexEventsTask (GPtrArray *event_arr, unsigned index, unsigned count)
2049+ : events (event_arr), start_index (index), event_count (count) {}
2050+
2051+ virtual ~IndexEventsTask ()
2052+ {
2053+ g_ptr_array_unref (events);
2054+ }
2055+
2056+private:
2057+ GPtrArray *events;
2058+ unsigned start_index;
2059+ unsigned event_count;
2060+};
2061+
2062+class DeleteEventsTask : public Task
2063+{
2064+public:
2065+ void Process (Indexer *indexer);
2066+
2067+ DeleteEventsTask (unsigned *event_ids_arr, int event_ids_arr_size)
2068+ : event_ids (event_ids_arr, event_ids_arr + event_ids_arr_size) {}
2069+
2070+ virtual ~DeleteEventsTask ()
2071+ {
2072+ }
2073+
2074+private:
2075+ std::vector<unsigned> event_ids;
2076+};
2077+
2078+class MetadataTask : public Task
2079+{
2080+public:
2081+ void Process (Indexer *indexer);
2082+
2083+ MetadataTask (std::string const& name, std::string const& val)
2084+ : key_name (name), value (val) {}
2085+
2086+ virtual ~MetadataTask ()
2087+ {}
2088+
2089+private:
2090+ std::string key_name;
2091+ std::string value;
2092+};
2093+
2094+}
2095+
2096+#endif /* _ZGFTS_TASK_H_ */
2097+
2098
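The division of labour described in task.h (the Controller chunks the work, each Task just processes its slice) might look roughly like the following. This is a sketch only: FakeIndexer, IndexChunkTask and MakeChunkedTasks are hypothetical names, events are plain ints instead of ZeitgeistEvent pointers, the chunk size is an arbitrary caller-supplied value, and C++11 is assumed. The real Controller in controller.cpp is not shown in this part of the diff and may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical stand-in for the real Indexer; it only counts events.
struct FakeIndexer
{
  std::size_t indexed = 0;
  void IndexEvent (int /* event */) { indexed++; }
};

class Task
{
public:
  virtual ~Task () {}
  virtual void Process (FakeIndexer *indexer) = 0;
};

// Indexes events[start, start + count), clamped to the vector size,
// in the same way IndexEventsTask clamps with MIN().
class IndexChunkTask : public Task
{
public:
  IndexChunkTask (std::shared_ptr<std::vector<int> > events,
                  std::size_t start, std::size_t count)
    : events_ (events), start_ (start), count_ (count) {}

  void Process (FakeIndexer *indexer) override
  {
    std::size_t end = std::min (start_ + count_, events_->size ());
    for (std::size_t i = start_; i < end; i++)
      indexer->IndexEvent ((*events_)[i]);
  }

private:
  std::shared_ptr<std::vector<int> > events_;
  std::size_t start_;
  std::size_t count_;
};

// Splits one batch into tasks of at most chunk_size events each.
std::queue<std::unique_ptr<Task> >
MakeChunkedTasks (std::shared_ptr<std::vector<int> > events,
                  std::size_t chunk_size)
{
  std::queue<std::unique_ptr<Task> > tasks;
  if (chunk_size == 0) return tasks; // avoid an endless loop
  for (std::size_t i = 0; i < events->size (); i += chunk_size)
    tasks.push (std::unique_ptr<Task> (
        new IndexChunkTask (events, i, chunk_size)));
  return tasks;
}
```

Sharing the event array between tasks through a reference count corresponds to the g_ptr_array_unref() call in the IndexEventsTask destructor: each chunk keeps the array alive until it has been processed.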
2099=== added directory 'extensions/fts++/test'
2100=== added file 'extensions/fts++/test/Makefile.am'
2101--- extensions/fts++/test/Makefile.am 1970-01-01 00:00:00 +0000
2102+++ extensions/fts++/test/Makefile.am 2012-02-09 22:47:22 +0000
2103@@ -0,0 +1,27 @@
2104+NULL =
2105+check_PROGRAMS = test-fts
2106+TESTS = test-fts
2107+
2108+AM_CPPFLAGS = \
2109+ $(ZEITGEIST_CFLAGS) \
2110+ -include $(CONFIG_HEADER) \
2111+ -w \
2112+ -I$(srcdir)/.. \
2113+ $(NULL)
2114+
2115+test_fts_SOURCES = \
2116+ test-stringutils.cpp \
2117+ test-indexer.cpp \
2118+ test-fts.c \
2119+ $(srcdir)/../stringutils.cpp \
2120+ $(srcdir)/../controller.cpp \
2121+ $(srcdir)/../indexer.cpp \
2122+ $(srcdir)/../task.cpp \
2123+ $(srcdir)/../fts.cpp \
2124+ $(NULL)
2125+
2126+test_fts_LDADD = \
2127+ $(builddir)/../libzeitgeist-internal.la \
2128+ -lxapian \
2129+ $(NULL)
2130+
2131
2132=== added file 'extensions/fts++/test/test-fts.c'
2133--- extensions/fts++/test/test-fts.c 1970-01-01 00:00:00 +0000
2134+++ extensions/fts++/test/test-fts.c 2012-02-09 22:47:22 +0000
2135@@ -0,0 +1,37 @@
2136+/*
2137+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
2138+ *
2139+ * This program is free software; you can redistribute it and/or
2140+ * modify it under the terms of the GNU General Public License
2141+ * as published by the Free Software Foundation; either version 2
2142+ * of the License, or (at your option) any later version.
2143+ *
2144+ * This program is distributed in the hope that it will be useful,
2145+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2146+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2147+ * GNU General Public License for more details.
2148+ *
2149+ * You should have received a copy of the GNU General Public License
2150+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
2151+ *
2152+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
2153+ *
2154+ */
2155+
2156+#include <glib-object.h>
2157+
2158+void test_stringutils_create_suite (void);
2159+void test_indexer_create_suite (void);
2160+
2161+gint
2162+main (gint argc, gchar *argv[])
2163+{
2164+ g_type_init ();
2165+
2166+ g_test_init (&argc, &argv, NULL);
2167+
2168+ test_stringutils_create_suite ();
2169+ test_indexer_create_suite ();
2170+
2171+ return g_test_run ();
2172+}
2173
2174=== added file 'extensions/fts++/test/test-indexer.cpp'
2175--- extensions/fts++/test/test-indexer.cpp 1970-01-01 00:00:00 +0000
2176+++ extensions/fts++/test/test-indexer.cpp 2012-02-09 22:47:22 +0000
2177@@ -0,0 +1,531 @@
2178+/*
2179+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
2180+ *
2181+ * This program is free software; you can redistribute it and/or
2182+ * modify it under the terms of the GNU General Public License
2183+ * as published by the Free Software Foundation; either version 2
2184+ * of the License, or (at your option) any later version.
2185+ *
2186+ * This program is distributed in the hope that it will be useful,
2187+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2188+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2189+ * GNU General Public License for more details.
2190+ *
2191+ * You should have received a copy of the GNU General Public License
2192+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
2193+ *
2194+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
2195+ *
2196+ */
2197+
2198+#include <glib-object.h>
2199+
2200+#include "stringutils.h"
2201+#include "fts.h"
2202+#include <zeitgeist-internal.h>
2203+
2204+using namespace ZeitgeistFTS;
2205+
2206+typedef struct
2207+{
2208+ ZeitgeistDbReader *db;
2209+ ZeitgeistIndexer *indexer;
2210+} Fixture;
2211+
2212+static void setup (Fixture *fix, gconstpointer data);
2213+static void teardown (Fixture *fix, gconstpointer data);
2214+
2215+static void
2216+setup (Fixture *fix, gconstpointer data)
2217+{
2218+ // use in-memory databases for both zg db and fts db
2219+ GError *error = NULL;
2220+ g_setenv ("ZEITGEIST_DATABASE_PATH", ":memory:", TRUE);
2221+ fix->db = ZEITGEIST_DB_READER (zeitgeist_engine_new (&error));
2222+
2223+ if (error)
2224+ {
2225+ g_warning ("%s", error->message);
2226+ return;
2227+ }
2228+
2229+ fix->indexer = zeitgeist_indexer_new (fix->db, &error);
2230+ if (error)
2231+ {
2232+ g_warning ("%s", error->message);
2233+ return;
2234+ }
2235+}
2236+
2237+static void
2238+teardown (Fixture *fix, gconstpointer data)
2239+{
2240+ zeitgeist_indexer_free (fix->indexer);
2241+ g_object_unref (fix->db);
2242+}
2243+
2244+static ZeitgeistEvent* create_test_event1 (void)
2245+{
2246+ ZeitgeistEvent *event = zeitgeist_event_new ();
2247+ ZeitgeistSubject *subject = zeitgeist_subject_new ();
2248+
2249+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_RASTER_IMAGE);
2250+ zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT);
2251+ zeitgeist_subject_set_uri (subject, "http://example.com/image.jpg");
2252+ zeitgeist_subject_set_text (subject, "text");
2253+ zeitgeist_subject_set_mimetype (subject, "image/png");
2254+
2255+ zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT);
2256+ zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY);
2257+ zeitgeist_event_set_actor (event, "application://firefox.desktop");
2258+ zeitgeist_event_add_subject (event, subject);
2259+
2260+ g_object_unref (subject);
2261+ return event;
2262+}
2263+
2264+static ZeitgeistEvent* create_test_event2 (void)
2265+{
2266+ ZeitgeistEvent *event = zeitgeist_event_new ();
2267+ ZeitgeistSubject *subject = zeitgeist_subject_new ();
2268+
2269+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE);
2270+ zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT);
2271+ zeitgeist_subject_set_uri (subject, "http://example.com/I%20Love%20Wikis");
2272+ zeitgeist_subject_set_text (subject, "Example.com Wiki Page. Kanji is awesome 漢字");
2273+ zeitgeist_subject_set_mimetype (subject, "text/html");
2274+
2275+ zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT);
2276+ zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY);
2277+ zeitgeist_event_set_actor (event, "application://firefox.desktop");
2278+ zeitgeist_event_add_subject (event, subject);
2279+
2280+ g_object_unref (subject);
2281+ return event;
2282+}
2283+
2284+static ZeitgeistEvent* create_test_event3 (void)
2285+{
2286+ ZeitgeistEvent *event = zeitgeist_event_new ();
2287+ ZeitgeistSubject *subject = zeitgeist_subject_new ();
2288+
2289+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE);
2290+ zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_REMOTE_DATA_OBJECT);
2291+ // Greek IDN - stands for http://παράδειγμα.δοκιμή
2292+ zeitgeist_subject_set_uri (subject, "http://xn--hxajbheg2az3al.xn--jxalpdlp/");
2293+ zeitgeist_subject_set_text (subject, "IDNwiki");
2294+ zeitgeist_subject_set_mimetype (subject, "text/html");
2295+
2296+ zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_ACCESS_EVENT);
2297+ zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY);
2298+ zeitgeist_event_set_actor (event, "application://firefox.desktop");
2299+ zeitgeist_event_add_subject (event, subject);
2300+
2301+ g_object_unref (subject);
2302+ return event;
2303+}
2304+
2305+static ZeitgeistEvent* create_test_event4 (void)
2306+{
2307+ ZeitgeistEvent *event = zeitgeist_event_new ();
2308+ ZeitgeistSubject *subject = zeitgeist_subject_new ();
2309+
2310+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_PRESENTATION);
2311+ zeitgeist_subject_set_manifestation (subject, ZEITGEIST_NFO_FILE_DATA_OBJECT);
2312+ zeitgeist_subject_set_uri (subject, "file:///home/username/Documents/my_fabulous_presentation.pdf");
2313+ zeitgeist_subject_set_text (subject, NULL);
2314+ zeitgeist_subject_set_mimetype (subject, "application/pdf");
2315+
2316+ zeitgeist_event_set_interpretation (event, ZEITGEIST_ZG_MODIFY_EVENT);
2317+ zeitgeist_event_set_manifestation (event, ZEITGEIST_ZG_USER_ACTIVITY);
2318+ zeitgeist_event_set_actor (event, "application://libreoffice-impress.desktop");
2319+ zeitgeist_event_add_subject (event, subject);
2320+
2321+ g_object_unref (subject);
2322+ return event;
2323+}
2324+
2325+// Steals the event, ref it if you want to keep it
2326+static guint
2327+index_event (Fixture *fix, ZeitgeistEvent *event)
2328+{
2329+ guint event_id = 0;
2330+
2331+ // add event to DBs
2332+ event_id = zeitgeist_engine_insert_event (ZEITGEIST_ENGINE (fix->db),
2333+ event, NULL, NULL);
2334+
2335+ GPtrArray *events = g_ptr_array_new_with_free_func (g_object_unref);
2336+ g_ptr_array_add (events, event); // steal event ref
2337+ zeitgeist_indexer_index_events (fix->indexer, events);
2338+ g_ptr_array_unref (events);
2339+
2340+ while (zeitgeist_indexer_has_pending_tasks (fix->indexer))
2341+ {
2342+ zeitgeist_indexer_process_task (fix->indexer);
2343+ }
2344+
2345+ return event_id;
2346+}
2347+
2348+static void
2349+test_simple_query (Fixture *fix, gconstpointer data)
2350+{
2351+ guint matches;
2352+ guint event_id;
2353+ ZeitgeistEvent* event;
2354+
2355+ // add test events to DBs
2356+ event_id = index_event (fix, create_test_event1 ());
2357+ index_event (fix, create_test_event2 ());
2358+ index_event (fix, create_test_event3 ());
2359+ index_event (fix, create_test_event4 ());
2360+
2361+ GPtrArray *results =
2362+ zeitgeist_indexer_search (fix->indexer,
2363+ "text",
2364+ zeitgeist_time_range_new_anytime (),
2365+ g_ptr_array_new (),
2366+ 0,
2367+ 10,
2368+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2369+ &matches,
2370+ NULL);
2371+
2372+ g_assert_cmpuint (matches, >, 0);
2373+ g_assert_cmpuint (results->len, ==, 1);
2374+
2375+ event = (ZeitgeistEvent*) results->pdata[0];
2376+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2377+
2378+ ZeitgeistSubject *subject = (ZeitgeistSubject*)
2379+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2380+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text");
2381+}
2382+
2383+static void
2384+test_simple_with_filter (Fixture *fix, gconstpointer data)
2385+{
2386+ guint matches;
2387+ guint event_id;
2388+ ZeitgeistEvent* event;
2389+
2390+ // add test events to DBs
2391+ index_event (fix, create_test_event1 ());
2392+ index_event (fix, create_test_event2 ());
2393+
2394+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2395+ event = zeitgeist_event_new ();
2396+ zeitgeist_event_set_interpretation (event, ZEITGEIST_NFO_DOCUMENT);
2397+ g_ptr_array_add (filters, event); // steals ref
2398+
2399+ GPtrArray *results =
2400+ zeitgeist_indexer_search (fix->indexer,
2401+ "text",
2402+ zeitgeist_time_range_new_anytime (),
2403+ filters,
2404+ 0,
2405+ 10,
2406+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2407+ &matches,
2408+ NULL);
2409+
2410+ g_assert_cmpuint (results->len, ==, 0);
2411+ g_assert_cmpuint (matches, ==, 0);
2412+}
2413+
2414+static void
2415+test_simple_with_valid_filter (Fixture *fix, gconstpointer data)
2416+{
2417+ guint matches;
2418+ guint event_id;
2419+ ZeitgeistEvent* event;
2420+ ZeitgeistSubject *subject;
2421+
2422+ // add test events to DBs
2423+ event_id = index_event (fix, create_test_event1 ());
2424+ index_event (fix, create_test_event2 ());
2425+
2426+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2427+ event = zeitgeist_event_new ();
2428+ subject = zeitgeist_subject_new ();
2429+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_IMAGE);
2430+ zeitgeist_event_add_subject (event, subject);
2431+ g_ptr_array_add (filters, event); // steals ref
2432+
2433+ GPtrArray *results =
2434+ zeitgeist_indexer_search (fix->indexer,
2435+ "text",
2436+ zeitgeist_time_range_new_anytime (),
2437+ filters,
2438+ 0,
2439+ 10,
2440+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2441+ &matches,
2442+ NULL);
2443+
2444+ g_assert_cmpuint (matches, >, 0);
2445+ g_assert_cmpuint (results->len, ==, 1);
2446+
2447+ event = (ZeitgeistEvent*) results->pdata[0];
2448+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2449+
2450+ subject = (ZeitgeistSubject*)
2451+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2452+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text");
2453+}
2454+
2455+static void
2456+test_simple_negation (Fixture *fix, gconstpointer data)
2457+{
2458+ guint matches;
2459+ guint event_id;
2460+ ZeitgeistEvent* event;
2461+ ZeitgeistSubject *subject;
2462+
2463+ // add test events to DBs
2464+ event_id = index_event (fix, create_test_event1 ());
2465+ index_event (fix, create_test_event2 ());
2466+
2467+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2468+ event = zeitgeist_event_new ();
2469+ subject = zeitgeist_subject_new ();
2470+ zeitgeist_subject_set_interpretation (subject, "!" ZEITGEIST_NFO_IMAGE);
2471+ zeitgeist_event_add_subject (event, subject);
2472+ g_ptr_array_add (filters, event); // steals ref
2473+
2474+ GPtrArray *results =
2475+ zeitgeist_indexer_search (fix->indexer,
2476+ "text",
2477+ zeitgeist_time_range_new_anytime (),
2478+ filters,
2479+ 0,
2480+ 10,
2481+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2482+ &matches,
2483+ NULL);
2484+
2485+ g_assert_cmpuint (matches, ==, 0);
2486+ g_assert_cmpuint (results->len, ==, 0);
2487+}
2488+
2489+static void
2490+test_simple_noexpand (Fixture *fix, gconstpointer data)
2491+{
2492+ guint matches;
2493+ guint event_id;
2494+ ZeitgeistEvent* event;
2495+ ZeitgeistSubject *subject;
2496+
2497+ // add test events to DBs
2498+ event_id = index_event (fix, create_test_event1 ());
2499+ index_event (fix, create_test_event2 ());
2500+
2501+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2502+ event = zeitgeist_event_new ();
2503+ subject = zeitgeist_subject_new ();
2504+ zeitgeist_subject_set_interpretation (subject, "+" ZEITGEIST_NFO_IMAGE);
2505+ zeitgeist_event_add_subject (event, subject);
2506+ g_ptr_array_add (filters, event); // steals ref
2507+
2508+ GPtrArray *results =
2509+ zeitgeist_indexer_search (fix->indexer,
2510+ "text",
2511+ zeitgeist_time_range_new_anytime (),
2512+ filters,
2513+ 0,
2514+ 10,
2515+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2516+ &matches,
2517+ NULL);
2518+
2519+ g_assert_cmpuint (matches, ==, 0);
2520+ g_assert_cmpuint (results->len, ==, 0);
2521+}
2522+
2523+static void
2524+test_simple_noexpand_valid (Fixture *fix, gconstpointer data)
2525+{
2526+ guint matches;
2527+ guint event_id;
2528+ ZeitgeistEvent* event;
2529+ ZeitgeistSubject *subject;
2530+
2531+ // add test events to DBs
2532+ event_id = index_event (fix, create_test_event1 ());
2533+ index_event (fix, create_test_event2 ());
2534+
2535+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2536+ event = zeitgeist_event_new ();
2537+ subject = zeitgeist_subject_new ();
2538+ zeitgeist_subject_set_interpretation (subject, "+" ZEITGEIST_NFO_RASTER_IMAGE);
2539+ zeitgeist_event_add_subject (event, subject);
2540+ g_ptr_array_add (filters, event); // steals ref
2541+
2542+ GPtrArray *results =
2543+ zeitgeist_indexer_search (fix->indexer,
2544+ "text",
2545+ zeitgeist_time_range_new_anytime (),
2546+ filters,
2547+ 0,
2548+ 10,
2549+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2550+ &matches,
2551+ NULL);
2552+
2553+ g_assert_cmpuint (matches, >, 0);
2554+ g_assert_cmpuint (results->len, ==, 1);
2555+
2556+ event = (ZeitgeistEvent*) results->pdata[0];
2557+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2558+
2559+ subject = (ZeitgeistSubject*)
2560+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2561+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "text");
2562+}
2563+
2564+static void
2565+test_simple_url_unescape (Fixture *fix, gconstpointer data)
2566+{
2567+ guint matches;
2568+ guint event_id;
2569+ ZeitgeistEvent* event;
2570+ ZeitgeistSubject *subject;
2571+
2572+ // add test events to DBs
2573+ index_event (fix, create_test_event1 ());
2574+ event_id = index_event (fix, create_test_event2 ());
2575+
2576+ GPtrArray *filters = g_ptr_array_new_with_free_func (g_object_unref);
2577+ event = zeitgeist_event_new ();
2578+ subject = zeitgeist_subject_new ();
2579+ zeitgeist_subject_set_interpretation (subject, ZEITGEIST_NFO_WEBSITE);
2580+ zeitgeist_event_add_subject (event, subject);
2581+ g_ptr_array_add (filters, event); // steals ref
2582+
2583+ GPtrArray *results =
2584+ zeitgeist_indexer_search (fix->indexer,
2585+ "love",
2586+ zeitgeist_time_range_new_anytime (),
2587+ filters,
2588+ 0,
2589+ 10,
2590+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2591+ &matches,
2592+ NULL);
2593+
2594+ g_assert_cmpuint (matches, >, 0);
2595+ g_assert_cmpuint (results->len, ==, 1);
2596+
2597+ event = (ZeitgeistEvent*) results->pdata[0];
2598+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2599+
2600+ subject = (ZeitgeistSubject*)
2601+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2602+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "Example.com Wiki Page. Kanji is awesome 漢字");
2603+}
2604+
2605+static void
2606+test_simple_cjk (Fixture *fix, gconstpointer data)
2607+{
2608+ guint matches;
2609+ guint event_id;
2610+ ZeitgeistEvent* event;
2611+ ZeitgeistSubject *subject;
2612+
2613+ // add test events to DBs
2614+ index_event (fix, create_test_event1 ());
2615+ event_id = index_event (fix, create_test_event2 ());
2616+
2617+ GPtrArray *results =
2618+ zeitgeist_indexer_search (fix->indexer,
2619+ "漢*",
2620+ zeitgeist_time_range_new_anytime (),
2621+ g_ptr_array_new (),
2622+ 0,
2623+ 10,
2624+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2625+ &matches,
2626+ NULL);
2627+
2628+ g_assert_cmpuint (matches, >, 0);
2629+ g_assert_cmpuint (results->len, ==, 1);
2630+
2631+ event = (ZeitgeistEvent*) results->pdata[0];
2632+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2633+
2634+ subject = (ZeitgeistSubject*)
2635+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2636+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "Example.com Wiki Page. Kanji is awesome 漢字");
2637+}
2638+
2639+static void
2640+test_simple_idn_support (Fixture *fix, gconstpointer data)
2641+{
2642+ guint matches;
2643+ guint event_id;
2644+ ZeitgeistEvent* event;
2645+ ZeitgeistSubject *subject;
2646+
2647+ // add test events to DBs
2648+ index_event (fix, create_test_event1 ());
2649+ index_event (fix, create_test_event2 ());
2650+ event_id = index_event (fix, create_test_event3 ());
2651+
2652+ GPtrArray *results =
2653+ zeitgeist_indexer_search (fix->indexer,
2654+ "παράδειγμα",
2655+ zeitgeist_time_range_new_anytime (),
2656+ g_ptr_array_new (),
2657+ 0,
2658+ 10,
2659+ ZEITGEIST_RESULT_TYPE_MOST_RECENT_EVENTS,
2660+ &matches,
2661+ NULL);
2662+
2663+ g_assert_cmpuint (matches, >, 0);
2664+ g_assert_cmpuint (results->len, ==, 1);
2665+
2666+ event = (ZeitgeistEvent*) results->pdata[0];
2667+ g_assert_cmpuint (zeitgeist_event_get_id (event), ==, event_id);
2668+
2669+ subject = (ZeitgeistSubject*)
2670+ g_ptr_array_index (zeitgeist_event_get_subjects (event), 0);
2671+ g_assert_cmpstr (zeitgeist_subject_get_text (subject), ==, "IDNwiki");
2672+}
2673+
2674+G_BEGIN_DECLS
2675+
2676+static void discard_message (const gchar *domain,
2677+ GLogLevelFlags level,
2678+ const gchar *msg,
2679+ gpointer userdata)
2680+{
2681+}
2682+
2683+void test_indexer_create_suite (void)
2684+{
2685+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleQuery", Fixture, 0,
2686+ setup, test_simple_query, teardown);
2687+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleWithFilter", Fixture, 0,
2688+ setup, test_simple_with_filter, teardown);
2689+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleWithValidFilter", Fixture, 0,
2690+ setup, test_simple_with_valid_filter, teardown);
2691+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNegation", Fixture, 0,
2692+ setup, test_simple_negation, teardown);
2693+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNoexpand", Fixture, 0,
2694+ setup, test_simple_noexpand, teardown);
2695+ g_test_add ("/Zeitgeist/FTS/Indexer/SimpleNoexpandValid", Fixture, 0,
2696+ setup, test_simple_noexpand_valid, teardown);
2697+ g_test_add ("/Zeitgeist/FTS/Indexer/URLUnescape", Fixture, 0,
2698+ setup, test_simple_url_unescape, teardown);
2699+ g_test_add ("/Zeitgeist/FTS/Indexer/IDNSupport", Fixture, 0,
2700+ setup, test_simple_idn_support, teardown);
2701+ g_test_add ("/Zeitgeist/FTS/Indexer/CJK", Fixture, 0,
2702+ setup, test_simple_cjk, teardown);
2703+
2704+ // get rid of the "rebuilding index..." messages
2705+ g_log_set_handler (NULL, G_LOG_LEVEL_MESSAGE, discard_message, NULL);
2706+}
2707+
2708+G_END_DECLS
2709
2710=== added file 'extensions/fts++/test/test-stringutils.cpp'
2711--- extensions/fts++/test/test-stringutils.cpp 1970-01-01 00:00:00 +0000
2712+++ extensions/fts++/test/test-stringutils.cpp 2012-02-09 22:47:22 +0000
2713@@ -0,0 +1,178 @@
2714+/*
2715+ * Copyright (C) 2012 Mikkel Kamstrup Erlandsen
2716+ *
2717+ * This program is free software; you can redistribute it and/or
2718+ * modify it under the terms of the GNU General Public License
2719+ * as published by the Free Software Foundation; either version 2
2720+ * of the License, or (at your option) any later version.
2721+ *
2722+ * This program is distributed in the hope that it will be useful,
2723+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2724+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2725+ * GNU General Public License for more details.
2726+ *
2727+ * You should have received a copy of the GNU General Public License
2728+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
2729+ *
2730+ * Authored by Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
2731+ *
2732+ */
2733+
2734+#include <glib-object.h>
2735+
2736+#include "stringutils.h"
2737+
2738+using namespace ZeitgeistFTS;
2739+
2740+typedef struct
2741+{
2742+ int i;
2743+} Fixture;
2744+
2745+static void setup (Fixture *fix, gconstpointer data);
2746+static void teardown (Fixture *fix, gconstpointer data);
2747+
2748+static void
2749+setup (Fixture *fix, gconstpointer data)
2750+{
2751+
2752+}
2753+
2754+static void
2755+teardown (Fixture *fix, gconstpointer data)
2756+{
2757+
2758+}
2759+
2760+static void
2761+test_truncate (Fixture *fix, gconstpointer data)
2762+{
2763+ g_assert_cmpstr ("", ==, StringUtils::Truncate("").c_str ());
2764+
2765+ g_assert_cmpstr ("", ==, StringUtils::Truncate("a", 0).c_str ());
2766+ g_assert_cmpstr ("a", ==, StringUtils::Truncate("a", 1).c_str ());
2767+ g_assert_cmpstr ("a", ==, StringUtils::Truncate("a").c_str ());
2768+
2769+ g_assert_cmpstr ("", ==, StringUtils::Truncate("aa", 0).c_str ());
2770+ g_assert_cmpstr ("a", ==, StringUtils::Truncate("aa", 1).c_str ());
2771+ g_assert_cmpstr ("aa", ==, StringUtils::Truncate("aa", 2).c_str ());
2772+ g_assert_cmpstr ("aa", ==, StringUtils::Truncate("aa").c_str ());
2773+
2774+
2775+ g_assert_cmpstr ("", ==, StringUtils::Truncate("å", 0).c_str ());
2776+ g_assert_cmpstr ("", ==, StringUtils::Truncate("å", 1).c_str ());
2777+ g_assert_cmpstr ("å", ==, StringUtils::Truncate("å").c_str ());
2778+
2779+ g_assert_cmpstr ("", ==, StringUtils::Truncate("åå", 0).c_str ());
2780+ g_assert_cmpstr ("", ==, StringUtils::Truncate("åå", 1).c_str ());
2781+ g_assert_cmpstr ("å", ==, StringUtils::Truncate("åå", 2).c_str ());
2782+ g_assert_cmpstr ("å", ==, StringUtils::Truncate("åå", 3).c_str ());
2783+ g_assert_cmpstr ("åå", ==, StringUtils::Truncate("åå", 4).c_str ());
2784+ g_assert_cmpstr ("åå", ==, StringUtils::Truncate("åå").c_str ());
2785+}
2786+
2787+static void
2788+test_mangle (Fixture *fix, gconstpointer data)
2789+{
2790+ g_assert_cmpstr ("", ==, StringUtils::MangleUri("").c_str ());
2791+
2792+ g_assert_cmpstr ("file", ==, StringUtils::MangleUri("file").c_str ());
2793+ g_assert_cmpstr ("file___", ==, StringUtils::MangleUri("file://").c_str ());
2794+ g_assert_cmpstr ("http___www.zeitgeist-project.com", ==,
2795+ StringUtils::MangleUri("http://www.zeitgeist-project.com").c_str ());
2796+
2797+ g_assert_cmpstr ("scheme_no_spaces_in_uris", ==,
2798+ StringUtils::MangleUri("scheme:no spaces in uris").c_str ());
2799+}
2800+
2801+static void
2802+test_split (Fixture *fix, gconstpointer data)
2803+{
2804+ std::string authority, path, query;
2805+
2806+ authority = path = query = "";
2807+ StringUtils::SplitUri ("", authority, path, query); // doesn't crash
2808+
2809+ g_assert_cmpstr ("", ==, authority.c_str ());
2810+ g_assert_cmpstr ("", ==, path.c_str ());
2811+ g_assert_cmpstr ("", ==, query.c_str ());
2812+
2813+ authority = path = query = "";
2814+ StringUtils::SplitUri ("scheme:", authority, path, query); // doesn't crash
2815+
2816+ g_assert_cmpstr ("", ==, authority.c_str ());
2817+ g_assert_cmpstr ("", ==, path.c_str ());
2818+ g_assert_cmpstr ("", ==, query.c_str ());
2819+
2820+ authority = path = query = "";
2821+ StringUtils::SplitUri ("ldap://ldap1.example.net:6666/o=University%20"
2822+ "of%20Michigan,c=US??sub?(cn=Babs%20Jensen)",
2823+ authority, path, query);
2824+
2825+ g_assert_cmpstr ("ldap1.example.net:6666", ==, authority.c_str ());
2826+ g_assert_cmpstr ("/o=University%20of%20Michigan,c=US", ==, path.c_str ());
2827+ g_assert_cmpstr ("?sub?(cn=Babs%20Jensen)", ==, query.c_str ());
2828+
2829+
2830+ authority = path = query = "";
2831+ StringUtils::SplitUri ("mailto:jsmith@example.com",
2832+ authority, path, query);
2833+
2834+ g_assert_cmpstr ("jsmith@example.com", ==, authority.c_str ());
2835+ g_assert_cmpstr ("", ==, path.c_str ());
2836+ g_assert_cmpstr ("", ==, query.c_str ());
2837+
2838+ authority = path = query = "";
2839+ StringUtils::SplitUri ("mailto:jsmith@example.com?subject=A%20Test&body="
2840+ "My%20idea%20is%3A%20%0A", authority, path, query);
2841+
2842+ g_assert_cmpstr ("jsmith@example.com", ==, authority.c_str ());
2843+ g_assert_cmpstr ("", ==, path.c_str ());
2844+ g_assert_cmpstr ("subject=A%20Test&body=My%20idea%20is%3A%20%0A", ==, query.c_str ());
2845+
2846+ authority = path = query = "";
2847+ StringUtils::SplitUri ("sip:alice@atlanta.com?subject=project%20x",
2848+ authority, path, query);
2849+
2850+ g_assert_cmpstr ("alice@atlanta.com", ==, authority.c_str ());
2851+ g_assert_cmpstr ("", ==, path.c_str ());
2852+ g_assert_cmpstr ("subject=project%20x", ==, query.c_str ());
2853+
2854+ authority = path = query = "";
2855+ StringUtils::SplitUri ("file:///",
2856+ authority, path, query);
2857+
2858+ g_assert_cmpstr ("", ==, authority.c_str ());
2859+ g_assert_cmpstr ("/", ==, path.c_str ());
2860+ g_assert_cmpstr ("", ==, query.c_str ());
2861+
2862+ authority = path = query = "";
2863+ StringUtils::SplitUri ("file:///home/username/file.ext",
2864+ authority, path, query);
2865+
2866+ g_assert_cmpstr ("", ==, authority.c_str ());
2867+ g_assert_cmpstr ("/home/username/file.ext", ==, path.c_str ());
2868+ g_assert_cmpstr ("", ==, query.c_str ());
2869+
2870+ authority = path = query = "";
2871+ StringUtils::SplitUri ("dns://192.168.1.1/ftp.example.org?type=A",
2872+ authority, path, query);
2873+
2874+ g_assert_cmpstr ("192.168.1.1", ==, authority.c_str ());
2875+ g_assert_cmpstr ("/ftp.example.org", ==, path.c_str ());
2876+ g_assert_cmpstr ("type=A", ==, query.c_str ());
2877+}
2878+
2879+G_BEGIN_DECLS
2880+
2881+void test_stringutils_create_suite (void)
2882+{
2883+ g_test_add ("/Zeitgeist/FTS/StringUtils/Truncate", Fixture, 0,
2884+ setup, test_truncate, teardown);
2885+ g_test_add ("/Zeitgeist/FTS/StringUtils/MangleUri", Fixture, 0,
2886+ setup, test_mangle, teardown);
2887+ g_test_add ("/Zeitgeist/FTS/StringUtils/SplitUri", Fixture, 0,
2888+ setup, test_split, teardown);
2889+}
2890+
2891+G_END_DECLS
2892
2893=== added symlink 'extensions/fts++/utils.vala'
2894=== target is u'../../src/utils.vala'
2895=== added symlink 'extensions/fts++/where-clause.vala'
2896=== target is u'../../src/where-clause.vala'
2897=== added file 'extensions/fts++/zeitgeist-fts.vala'
2898--- extensions/fts++/zeitgeist-fts.vala 1970-01-01 00:00:00 +0000
2899+++ extensions/fts++/zeitgeist-fts.vala 2012-02-09 22:47:22 +0000
2900@@ -0,0 +1,301 @@
2901+/* zeitgeist-fts.vala
2902+ *
2903+ * Copyright © 2012 Canonical Ltd.
2904+ * Copyright © 2012 Michal Hruby <michal.mhr@gmail.com>
2905+ *
2906+ * This program is free software; you can redistribute it and/or
2907+ * modify it under the terms of the GNU General Public License
2908+ * as published by the Free Software Foundation; either version 2
2909+ * of the License, or (at your option) any later version.
2910+ *
2911+ * This program is distributed in the hope that it will be useful,
2912+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
2913+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2914+ * GNU General Public License for more details.
2915+ *
2916+ * You should have received a copy of the GNU General Public License
2917+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
2918+ *
2919+ */
2920+
2921+namespace Zeitgeist
2922+{
2923+
2924+ [DBus (name = "org.freedesktop.DBus")]
2925+ public interface RemoteDBus : Object
2926+ {
2927+ public abstract bool name_has_owner (string name) throws IOError;
2928+ }
2929+
2930+ public class FtsDaemon : Object, RemoteSimpleIndexer, RemoteMonitor
2931+ {
2932+ //const string DBUS_NAME = "org.gnome.zeitgeist.Fts";
2933+ const string DBUS_NAME = "org.gnome.zeitgeist.SimpleIndexer";
2934+ const string ZEITGEIST_DBUS_NAME = "org.gnome.zeitgeist.Engine";
2935+ private static bool show_version_info = false;
2936+ private static string log_level = "";
2937+
2938+ const OptionEntry[] options =
2939+ {
2940+ {
2941+ "version", 'v', 0, OptionArg.NONE, out show_version_info,
2942+ "Print program's version number and exit", null
2943+ },
2944+ {
2945+ "log-level", 0, 0, OptionArg.STRING, out log_level,
2946+ "How much information should be printed; possible values: " +
2947+ "DEBUG, INFO, WARNING, ERROR, CRITICAL", "LEVEL"
2948+ },
2949+ {
2950+ null
2951+ }
2952+ };
2953+
2954+ private static FtsDaemon? instance;
2955+ private static MainLoop mainloop;
2956+ private static bool name_acquired = false;
2957+
2958+ private DbReader engine;
2959+ private Indexer indexer;
2960+
2961+ private uint indexer_register_id;
2962+ private uint monitor_register_id;
2963+ private unowned DBusConnection connection;
2964+
2965+ public FtsDaemon () throws EngineError
2966+ {
2967+ engine = new DbReader ();
2968+ indexer = new Indexer (engine);
2969+ }
2970+
2971+ private void do_quit ()
2972+ {
2973+ engine.close ();
2974+ mainloop.quit ();
2975+ }
2976+
2977+ public void register_dbus_object (DBusConnection conn) throws IOError
2978+ {
2979+ connection = conn;
2980+ indexer_register_id = conn.register_object<RemoteSimpleIndexer> (
2981+ "/org/gnome/zeitgeist/index/activity", this);
2982+ monitor_register_id = conn.register_object<RemoteMonitor> (
2983+ "/org/gnome/zeitgeist/monitor/special", this);
2984+ }
2985+
2986+ public void unregister_dbus_object ()
2987+ {
2988+ if (indexer_register_id != 0)
2989+ {
2990+ connection.unregister_object (indexer_register_id);
2991+ indexer_register_id = 0;
2992+ }
2993+
2994+ if (monitor_register_id != 0)
2995+ {
2996+ connection.unregister_object (monitor_register_id);
2997+ monitor_register_id = 0;
2998+ }
2999+ }
3000+
3001+ public async void notify_insert (Variant time_range, Variant events)
3002+ throws IOError
3003+ {
3004+ debug ("got insertion notification");
3005+ var events_arr = Events.from_variant (events);
3006+ indexer.index_events (events_arr);
3007+ }
3008+
3009+ public async void notify_delete (Variant time_range, uint32[] event_ids)
3010+ throws IOError
3011+ {
3012+ debug ("got deletion notification");
3013+ indexer.delete_events (event_ids);
3014+ }
3015+
3016+ public async void search (string query_string, Variant time_range,
3017+ Variant filter_templates,
3018+ uint offset, uint count, uint result_type,
3019+ out Variant events, out uint matches)
3020+ throws Error
3021+ {
3022+ var tr = new TimeRange.from_variant (time_range);
3023+ var templates = Events.from_variant (filter_templates);
3024+ var results = instance.indexer.search (query_string,
3025+ tr,
3026+ templates,
3027+ offset,
3028+ count,
3029+ (ResultType) result_type,
3030+ out matches);
3031+
3032+ events = Events.to_variant (results);
3033+ }
3034+
3035+ private static void name_acquired_callback (DBusConnection conn)
3036+ {
3037+ name_acquired = true;
3038+ }
3039+
3040+ private static void name_lost_callback (DBusConnection? conn)
3041+ {
3042+ if (conn == null)
3043+ {
3044+ // something happened to our bus connection
3045+ mainloop.quit ();
3046+ }
3047+ else if (instance != null && name_acquired)
3048+ {
3049+ // we owned the name and we lost it... what to do?
3050+ mainloop.quit ();
3051+ }
3052+ }
3053+
3054+ static void run ()
3055+ throws Error
3056+ {
3057+ DBusConnection connection = Bus.get_sync (BusType.SESSION);
3058+ var proxy = connection.get_proxy_sync<RemoteDBus> (
3059+ "org.freedesktop.DBus", "/org/freedesktop/DBus",
3060+ DBusProxyFlags.DO_NOT_LOAD_PROPERTIES);
3061+ bool zeitgeist_up = proxy.name_has_owner (ZEITGEIST_DBUS_NAME);
3062+ // FIXME: throw an error that zeitgeist isn't up? or just start it?
3063+ bool name_owned = proxy.name_has_owner (DBUS_NAME);
3064+ if (name_owned)
3065+ {
3066+ throw new EngineError.EXISTING_INSTANCE (
3067+ "The FTS daemon is running already.");
3068+ }
3069+
3070+ /* setup Engine instance and register objects on dbus */
3071+ try
3072+ {
3073+ instance = new FtsDaemon ();
3074+ instance.register_dbus_object (connection);
3075+ }
3076+ catch (Error err)
3077+ {
3078+ if (err is EngineError.DATABASE_CANTOPEN)
3079+ {
3080+ warning ("Could not access the database file.\n" +
3081+ "Please check the permissions of file %s.",
3082+ Utils.get_database_file_path ());
3083+ }
3084+ else if (err is EngineError.DATABASE_BUSY)
3085+ {
3086+ warning ("It looks like another Zeitgeist instance " +
3087+ "is already running (the database is locked).");
3088+ }
3089+ throw err;
3090+ }
3091+
3092+ uint owner_id = Bus.own_name_on_connection (connection,
3093+ DBUS_NAME,
3094+ BusNameOwnerFlags.NONE,
3095+ name_acquired_callback,
3096+ name_lost_callback);
3097+
3098+ mainloop = new MainLoop ();
3099+ mainloop.run ();
3100+
3101+ if (instance != null)
3102+ {
3103+ Bus.unown_name (owner_id);
3104+ instance.unregister_dbus_object ();
3105+ instance = null;
3106+
3107+ // make sure we send quit reply
3108+ try
3109+ {
3110+ connection.flush_sync ();
3111+ }
3112+ catch (Error e)
3113+ {
3114+ warning ("%s", e.message);
3115+ }
3116+ }
3117+ }
3118+
3119+ static void safe_exit ()
3120+ {
3121+ instance.do_quit ();
3122+ }
3123+
3124+ static int main (string[] args)
3125+ {
3126+ // FIXME: the cat process xapian spawns won't like this and we
3127+ // can freeze if it dies
3128+ Posix.signal (Posix.SIGHUP, safe_exit);
3129+ Posix.signal (Posix.SIGINT, safe_exit);
3130+ Posix.signal (Posix.SIGTERM, safe_exit);
3131+
3132+ var opt_context = new OptionContext (" - Zeitgeist FTS daemon");
3133+ opt_context.add_main_entries (options, null);
3134+
3135+ try
3136+ {
3137+ opt_context.parse (ref args);
3138+
3139+ if (show_version_info)
3140+ {
3141+ stdout.printf (Config.VERSION + "\n");
3142+ return 0;
3143+ }
3144+
3145+ LogLevelFlags discarded = LogLevelFlags.LEVEL_DEBUG;
3146+ if (log_level != null)
3147+ {
3148+ var ld = LogLevelFlags.LEVEL_DEBUG;
3149+ var li = LogLevelFlags.LEVEL_INFO;
3150+ var lm = LogLevelFlags.LEVEL_MESSAGE;
3151+ var lw = LogLevelFlags.LEVEL_WARNING;
3152+ var lc = LogLevelFlags.LEVEL_CRITICAL;
3153+ switch (log_level.up ())
3154+ {
3155+ case "DEBUG":
3156+ discarded = 0;
3157+ break;
3158+ case "INFO":
3159+ discarded = ld;
3160+ break;
3161+ case "WARNING":
3162+ discarded = ld | li | lm;
3163+ break;
3164+ case "CRITICAL":
3165+ discarded = ld | li | lm | lw;
3166+ break;
3167+ case "ERROR":
3168+ discarded = ld | li | lm | lw | lc;
3169+ break;
3170+ }
3171+ }
3172+ if (discarded != 0)
3173+ {
3174+ Log.set_handler ("", discarded, () => {});
3175+ }
3176+ else
3177+ {
3178+ Environment.set_variable ("G_MESSAGES_DEBUG", "all", true);
3179+ }
3180+
3181+ run ();
3182+ }
3183+ catch (Error err)
3184+ {
3185+ if (err is EngineError.DATABASE_CANTOPEN)
3186+ return 21;
3187+ if (err is EngineError.DATABASE_BUSY)
3188+ return 22;
3189+
3190+ warning ("%s", err.message);
3191+ return 1;
3192+ }
3193+
3194+ return 0;
3195+ }
3196+
3197+ }
3198+
3199+}
3200+
3201+// vim:expandtab:ts=4:sw=4
3202
3203=== removed directory 'extensions/fts-python'
3204=== removed file 'extensions/fts-python/Makefile.am'
3205--- extensions/fts-python/Makefile.am 2011-11-01 20:26:36 +0000
3206+++ extensions/fts-python/Makefile.am 1970-01-01 00:00:00 +0000
3207@@ -1,23 +0,0 @@
3208-NULL =
3209-
3210-ftsdir = $(pkgdatadir)/fts-python
3211-dist_fts_SCRIPTS = \
3212- fts.py \
3213- $(NULL)
3214-
3215-dist_fts_DATA = \
3216- datamodel.py \
3217- constants.py \
3218- lrucache.py \
3219- sql.py \
3220- $(NULL)
3221-
3222-servicedir = $(DBUS_SERVICES_DIR)
3223-service_DATA = org.gnome.zeitgeist.fts.service
3224-
3225-org.gnome.zeitgeist.fts.service: org.gnome.zeitgeist.fts.service.in
3226- $(AM_V_GEN)sed -e s!\@pkgdatadir\@!$(pkgdatadir)! < $< > $@
3227-org.gnome.zeitgeist.fts.service: Makefile
3228-
3229-EXTRA_DIST = org.gnome.zeitgeist.fts.service.in
3230-CLEANFILES = org.gnome.zeitgeist.fts.service
3231
3232=== removed file 'extensions/fts-python/constants.py'
3233--- extensions/fts-python/constants.py 2011-10-31 15:28:09 +0000
3234+++ extensions/fts-python/constants.py 1970-01-01 00:00:00 +0000
3235@@ -1,71 +0,0 @@
3236-# -.- coding: utf-8 -.-
3237-
3238-# Zeitgeist
3239-#
3240-# Copyright © 2009 Markus Korn <thekorn@gmx.de>
3241-# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com>
3242-# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
3243-#
3244-# This program is free software: you can redistribute it and/or modify
3245-# it under the terms of the GNU Lesser General Public License as published by
3246-# the Free Software Foundation, either version 2.1 of the License, or
3247-# (at your option) any later version.
3248-#
3249-# This program is distributed in the hope that it will be useful,
3250-# but WITHOUT ANY WARRANTY; without even the implied warranty of
3251-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
3252-# GNU Lesser General Public License for more details.
3253-#
3254-# You should have received a copy of the GNU Lesser General Public License
3255-# along with this program. If not, see <http://www.gnu.org/licenses/>.
3256-
3257-import os
3258-import logging
3259-from xdg import BaseDirectory
3260-
3261-from zeitgeist.client import ZeitgeistDBusInterface
3262-
3263-__all__ = [
3264- "log",
3265- "get_engine",
3266- "constants"
3267-]
3268-
3269-log = logging.getLogger("zeitgeist.engine")
3270-
3271-_engine = None
3272-def get_engine():
3273- """ Get the running engine instance or create a new one. """
3274- global _engine
3275- if _engine is None or _engine.is_closed():
3276- import main # _zeitgeist.engine.main
3277- _engine = main.ZeitgeistEngine()
3278- return _engine
3279-
3280-class _Constants:
3281- # Directories
3282- DATA_PATH = os.environ.get("ZEITGEIST_DATA_PATH",
3283- BaseDirectory.save_data_path("zeitgeist"))
3284- DATABASE_FILE = os.environ.get("ZEITGEIST_DATABASE_PATH",
3285- os.path.join(DATA_PATH, "activity.sqlite"))
3286- DATABASE_FILE_BACKUP = os.environ.get("ZEITGEIST_DATABASE_BACKUP_PATH",
3287- os.path.join(DATA_PATH, "activity.sqlite.bck"))
3288- DEFAULT_LOG_PATH = os.path.join(BaseDirectory.xdg_cache_home,
3289- "zeitgeist", "daemon.log")
3290-
3291- # D-Bus
3292- DBUS_INTERFACE = ZeitgeistDBusInterface.INTERFACE_NAME
3293- SIG_EVENT = "asaasay"
3294-
3295- # Required version of DB schema
3296- CORE_SCHEMA="core"
3297- CORE_SCHEMA_VERSION = 4
3298-
3299- USER_EXTENSION_PATH = os.path.join(DATA_PATH, "extensions")
3300-
3301- # configure runtime cache for events
3302- # default size is 2000
3303- CACHE_SIZE = int(os.environ.get("ZEITGEIST_CACHE_SIZE", 2000))
3304- log.debug("Cache size = %i" %CACHE_SIZE)
3305-
3306-constants = _Constants()
3307
3308=== removed file 'extensions/fts-python/datamodel.py'
3309--- extensions/fts-python/datamodel.py 2011-10-10 14:07:42 +0000
3310+++ extensions/fts-python/datamodel.py 1970-01-01 00:00:00 +0000
3311@@ -1,83 +0,0 @@
3312-# -.- coding: utf-8 -.-
3313-
3314-# Zeitgeist
3315-#
3316-# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
3317-# Copyright © 2009 Markus Korn <thekorn@gmx.de>
3318-# Copyright © 2009 Seif Lotfy <seif@lotfy.com>
3319-# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com>
3320-#
3321-# This program is free software: you can redistribute it and/or modify
3322-# it under the terms of the GNU Lesser General Public License as published by
3323-# the Free Software Foundation, either version 2.1 of the License, or
3324-# (at your option) any later version.
3325-#
3326-# This program is distributed in the hope that it will be useful,
3327-# but WITHOUT ANY WARRANTY; without even the implied warranty of
3328-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
3329-# GNU Lesser General Public License for more details.
3330-#
3331-# You should have received a copy of the GNU Lesser General Public License
3332-# along with this program. If not, see <http://www.gnu.org/licenses/>.
3333-
3334-from zeitgeist.datamodel import Event as OrigEvent, Subject as OrigSubject, \
3335- DataSource as OrigDataSource
3336-
3337-class Event(OrigEvent):
3338-
3339- @staticmethod
3340- def _to_unicode(obj):
3341- """
3342- Return a unicode representation of the given object.
3343- If obj is None, return an empty string.
3344- """
3345- return unicode(obj) if obj is not None else u""
3346-
3347- @staticmethod
3348- def _make_dbus_sendable(obj):
3349- """
3350- Ensure that all fields in the event struct are non-None
3351- """
3352- for n, value in enumerate(obj[0]):
3353- obj[0][n] = obj._to_unicode(value)
3354- for subject in obj[1]:
3355- for n, value in enumerate(subject):
3356- subject[n] = obj._to_unicode(value)
3357- # The payload requires special handling, since it is binary data
3358- # If there is indeed data here, we must not unicode encode it!
3359- if obj[2] is None:
3360- obj[2] = u""
3361- elif isinstance(obj[2], unicode):
3362- obj[2] = str(obj[2])
3363- return obj
3364-
3365- @staticmethod
3366- def get_plain(ev):
3367- """
3368- Ensure that an Event instance is a Plain Old Python Object (popo),
3369- without DBus wrappings etc.
3370- """
3371- popo = []
3372- popo.append(map(unicode, ev[0]))
3373- popo.append([map(unicode, subj) for subj in ev[1]])
3374- # We need the check here so that if D-Bus gives us an empty
3375- # byte array we don't serialize the text "dbus.Array(...)".
3376- popo.append(str(ev[2]) if ev[2] else u'')
3377- return popo
3378-
3379-class Subject(OrigSubject):
3380- pass
3381-
3382-class DataSource(OrigDataSource):
3383-
3384- @staticmethod
3385- def get_plain(datasource):
3386- for plaintype, props in {
3387- unicode: (DataSource.Name, DataSource.Description),
3388- lambda x: map(Event.get_plain, x): (DataSource.EventTemplates,),
3389- bool: (DataSource.Running, DataSource.Enabled),
3390- int: (DataSource.LastSeen,),
3391- }.iteritems():
3392- for prop in props:
3393- datasource[prop] = plaintype(datasource[prop])
3394- return tuple(datasource)
3395
3396=== removed file 'extensions/fts-python/fts.py'
3397--- extensions/fts-python/fts.py 2012-01-06 10:11:45 +0000
3398+++ extensions/fts-python/fts.py 1970-01-01 00:00:00 +0000
3399@@ -1,1273 +0,0 @@
3400-#!/usr/bin/env python
3401-# -.- coding: utf-8 -.-
3402-
3403-# Zeitgeist
3404-#
3405-# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
3406-# Copyright © 2010 Canonical Ltd
3407-#
3408-# This program is free software: you can redistribute it and/or modify
3409-# it under the terms of the GNU Lesser General Public License as published by
3410-# the Free Software Foundation, either version 3 of the License, or
3411-# (at your option) any later version.
3412-#
3413-# This program is distributed in the hope that it will be useful,
3414-# but WITHOUT ANY WARRANTY; without even the implied warranty of
3415-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
3416-# GNU Lesser General Public License for more details.
3417-#
3418-# You should have received a copy of the GNU Lesser General Public License
3419-# along with this program. If not, see <http://www.gnu.org/licenses/>.
3420-#
3421-
3422-#
3423-# TODO
3424-#
3425-# - Delete events hook
3426-# - ? Filter on StorageState
3427-# - Throttle IO and CPU where possible
3428-
3429-import os, sys
3430-import time
3431-import pickle
3432-import dbus
3433-import sqlite3
3434-import dbus.service
3435-from xdg import BaseDirectory
3436-from xdg.DesktopEntry import DesktopEntry, xdg_data_dirs
3437-import logging
3438-import subprocess
3439-from xml.dom import minidom
3440-import xapian
3441-import os
3442-from Queue import Queue, Empty
3443-import threading
3444-from urllib import quote as url_escape, unquote as url_unescape
3445-import gobject, gio
3446-from cStringIO import StringIO
3447-
3448-from collections import defaultdict
3449-from array import array
3450-from zeitgeist.datamodel import Event as OrigEvent, StorageState, TimeRange, \
3451- ResultType, get_timestamp_for_now, Interpretation, Symbol, NEGATION_OPERATOR, WILDCARD, NULL_EVENT
3452-from datamodel import Event, Subject
3453-from constants import constants
3454-from zeitgeist.client import ZeitgeistClient, ZeitgeistDBusInterface
3455-from sql import get_default_cursor, unset_cursor, TableLookup, WhereClause
3456-from lrucache import LRUCache
3457-
3458-ZG_CLIENT = ZeitgeistClient()
3459-
3460-logging.basicConfig(level=logging.DEBUG)
3461-log = logging.getLogger("zeitgeist.fts")
3462-
3463-INDEX_FILE = os.path.join(constants.DATA_PATH, "bb.fts.index")
3464-INDEX_VERSION = "1"
3465-INDEX_LOCK = threading.Lock()
3466-FTS_DBUS_BUS_NAME = "org.gnome.zeitgeist.SimpleIndexer"
3467-FTS_DBUS_OBJECT_PATH = "/org/gnome/zeitgeist/index/activity"
3468-FTS_DBUS_INTERFACE = "org.gnome.zeitgeist.Index"
3469-
3470-FILTER_PREFIX_EVENT_INTERPRETATION = "ZGEI"
3471-FILTER_PREFIX_EVENT_MANIFESTATION = "ZGEM"
3472-FILTER_PREFIX_ACTOR = "ZGA"
3473-FILTER_PREFIX_SUBJECT_URI = "ZGSU"
3474-FILTER_PREFIX_SUBJECT_INTERPRETATION = "ZGSI"
3475-FILTER_PREFIX_SUBJECT_MANIFESTATION = "ZGSM"
3476-FILTER_PREFIX_SUBJECT_ORIGIN = "ZGSO"
3477-FILTER_PREFIX_SUBJECT_MIMETYPE = "ZGST"
3478-FILTER_PREFIX_SUBJECT_STORAGE = "ZGSS"
3479-FILTER_PREFIX_XDG_CATEGORY = "AC"
3480-
3481-VALUE_EVENT_ID = 0
3482-VALUE_TIMESTAMP = 1
3483-
3484-MAX_CACHE_BATCH_SIZE = constants.CACHE_SIZE/2
3485-
3486-# When sorting by one of the COALESCING_RESULT_TYPES result types,
3487-# we need to fetch some extra events from the Xapian index because
3488-# the final result set will be coalesced on some property of the event
3489-COALESCING_RESULT_TYPES = [ \
3490- ResultType.MostRecentSubjects,
3491- ResultType.LeastRecentSubjects,
3492- ResultType.MostPopularSubjects,
3493- ResultType.LeastPopularSubjects,
3494- ResultType.MostRecentActor,
3495- ResultType.LeastRecentActor,
3496- ResultType.MostPopularActor,
3497- ResultType.LeastPopularActor,
3498-]
3499-
3500-MAX_TERM_LENGTH = 245
3501-
3502-
3503-class NegationNotSupported(ValueError):
3504- pass
3505-
3506-class WildcardNotSupported(ValueError):
3507- pass
3508-
3509-def parse_negation(kind, field, value, parse_negation=True):
3510- """checks if value starts with the negation operator,
3511- if value starts with the negation operator but the field does
3512- not support negation a ValueError is raised.
3513- This function returns a (value_without_negation, negation)-tuple
3514- """
3515- negation = False
3516- if parse_negation and value.startswith(NEGATION_OPERATOR):
3517- negation = True
3518- value = value[len(NEGATION_OPERATOR):]
3519- if negation and field not in kind.SUPPORTS_NEGATION:
3520- raise NegationNotSupported("This field does not support negation")
3521- return value, negation
3522-
3523-def parse_wildcard(kind, field, value):
3524- """checks if value ends with a wildcard,
3525- if value ends with a wildcard but the field does not support wildcards
3526- a ValueError is raised.
3527- This function returns a (value_without_wildcard, wildcard)-tuple
3528- """
3529- wildcard = False
3530- if value.endswith(WILDCARD):
3531- wildcard = True
3532- value = value[:-len(WILDCARD)]
3533- if wildcard and field not in kind.SUPPORTS_WILDCARDS:
3534- raise WildcardNotSupported("This field does not support wildcards")
3535- return value, wildcard
3536-
3537-def parse_operators(kind, field, value):
3538- """runs both (parse_negation and parse_wildcard) parser functions
3539- on query values, and handles the special case of Subject.Text correctly.
3540- returns a (value_without_negation_and_wildcard, negation, wildcard)-tuple
3541- """
3542- try:
3543- value, negation = parse_negation(kind, field, value)
3544- except ValueError:
3545- if kind is Subject and field == Subject.Text:
3546- # we do not support negation of the text field,
3547- # the text field starts with the NEGATION_OPERATOR
3548- # so we handle this string as the content instead
3549- # of an operator
3550- negation = False
3551- else:
3552- raise
3553- value, wildcard = parse_wildcard(kind, field, value)
3554- return value, negation, wildcard
3555-
3556-
3557-def synchronized(lock):
3558- """ Synchronization decorator. """
3559- def wrap(f):
3560- def newFunction(*args, **kw):
3561- lock.acquire()
3562- try:
3563- return f(*args, **kw)
3564- finally:
3565- lock.release()
3566- return newFunction
3567- return wrap
3568-
3569-class Deletion:
3570- """
3571- A marker class that marks an event id for deletion
3572- """
3573- def __init__ (self, event_id):
3574- self.event_id = event_id
3575-
3576-class Reindex:
3577- """
3578- Marker class that tells the worker thread to rebuild the entire index.
3579- On construction time all events are pulled out of the zg_engine
3580- argument and stored for later processing in the worker thread.
3581- This avoids concurrent access to the ZG sqlite db from the worker thread.
3582- """
3583- def __init__ (self, zg_engine):
3584- all_events = zg_engine._find_events(1, TimeRange.always(),
3585- [], StorageState.Any,
3586- sys.maxint,
3587- ResultType.MostRecentEvents)
3588- self.all_events = all_events
3589-
3590-class SearchEngineExtension (dbus.service.Object):
3591- """
3592- Full text indexing and searching extension for Zeitgeist
3593- """
3594- PUBLIC_METHODS = []
3595-
3596- def __init__ (self):
3597- bus_name = dbus.service.BusName(FTS_DBUS_BUS_NAME, bus=dbus.SessionBus())
3598- dbus.service.Object.__init__(self, bus_name, FTS_DBUS_OBJECT_PATH)
3599- self._indexer = Indexer()
3600-
3601- ZG_CLIENT.install_monitor((0, 2**63 - 1), [],
3602- self.pre_insert_event, self.post_delete_events)
3603-
3604- def pre_insert_event(self, timerange, events):
3605- for event in events:
3606- self._indexer.index_event (event)
3607-
3608- def post_delete_events (self, ids):
3609- for _id in ids:
3610- self._indexer.delete_event (_id)
3611-
3612- @dbus.service.method(FTS_DBUS_INTERFACE,
3613- in_signature="s(xx)a("+constants.SIG_EVENT+")uuu",
3614- out_signature="a("+constants.SIG_EVENT+")u")
3615- def Search(self, query_string, time_range, filter_templates, offset, count, result_type):
3616- """
3617- DBus method to perform a full text search against the contents of the
3618- Zeitgeist log. Returns an array of events.
3619- """
3620- time_range = TimeRange(time_range[0], time_range[1])
3621- filter_templates = map(Event, filter_templates)
3622- events, hit_count = self._indexer.search(query_string, time_range,
3623- filter_templates,
3624- offset, count, result_type)
3625- return self._make_events_sendable (events), hit_count
3626-
3627- @dbus.service.method(FTS_DBUS_INTERFACE,
3628- in_signature="",
3629- out_signature="")
3630- def ForceReindex(self):
3631- """
3632- DBus method to force a reindex of the entire Zeitgeist log.
3633- This method is only intended for debugging purposes and is not
3634- considered blessed public API.
3635- """
3636- log.debug ("Received ForceReindex request over DBus.")
3637- self._indexer._queue.put (Reindex (self._indexer))
3638-
3639- def _make_events_sendable(self, events):
3640- return [NULL_EVENT if event is None else Event._make_dbus_sendable(event) for event in events]
3641-
3642-def mangle_uri (uri):
3643- """
3644- Converts a URI into an index- and query friendly string. The problem
3645- is that Xapian doesn't handle CAPITAL letters or most non-alphanumeric
3646- symbols in a boolean term when it does prefix matching. The mangled
3647- URIs returned from this function are suitable for boolean prefix searches.
3648-
3649- IMPORTANT: This is a 1-way function! You can not convert back.
3650- """
3651- result = ""
3652- for c in uri.lower():
3653- if c in (": /"):
3654- result += "_"
3655- else:
3656- result += c
3657- return result
3658-
3659-def cap_string (s, nbytes=MAX_TERM_LENGTH):
3660- """
3661- If s has more than nbytes bytes (not characters) then cap it off
3662- after nbytes bytes in a way still producing a valid utf-8 string.
3663-
3664- Assumes that s is a utf-8 string.
3665-
3666- This function is useful for working with Xapian terms because Xapian has
3667- a max term length of 245 (which is not very well documented, but see
3668- http://xapian.org/docs/omega/termprefixes.html).
3669- """
3670- # Check if we can fast-path this string
3671- if (len(s.encode("utf-8")) <= nbytes):
3672- return s
3673-
3674- # We use a StringIO here to avoid mem thrashing via naive
3675- # string concatenation. See fx. http://www.skymind.com/~ocrow/python_string/
3676- buf = StringIO()
3677- for char in s :
3678- if buf.tell() >= nbytes - 1 :
3679- return buf.getvalue()
3680- buf.write(char.encode("utf-8"))
3681-
3682- return unicode(buf.getvalue().decode("utf-8"))
3683-
3684-
3685-def expand_type (type_prefix, uri):
3686- """
3687- Return a string with a Xapian query matching all child types of 'uri'
3688- inside the Xapian prefix 'type_prefix'.
3689- """
3690- is_negation = uri.startswith(NEGATION_OPERATOR)
3691- uri = uri[1:] if is_negation else uri
3692- children = Symbol.find_child_uris_extended(uri)
3693- children = [ "%s:%s" % (type_prefix, child) for child in children ]
3694-
3695- result = " OR ".join(children)
3696- return result if not is_negation else "NOT (%s)" % result
3697-
3698-class Indexer:
3699- """
3700- Abstraction of the FT indexer and search engine
3701- """
3702-
3703- QUERY_PARSER_FLAGS = xapian.QueryParser.FLAG_PHRASE | \
3704- xapian.QueryParser.FLAG_BOOLEAN | \
3705- xapian.QueryParser.FLAG_PURE_NOT | \
3706- xapian.QueryParser.FLAG_LOVEHATE | \
3707- xapian.QueryParser.FLAG_WILDCARD
3708-
3709- def __init__ (self):
3710-
3711- self._cursor = cursor = get_default_cursor()
3712- os.environ["XAPIAN_CJK_NGRAM"] = "1"
3713- self._interpretation = TableLookup(cursor, "interpretation")
3714- self._manifestation = TableLookup(cursor, "manifestation")
3715- self._mimetype = TableLookup(cursor, "mimetype")
3716- self._actor = TableLookup(cursor, "actor")
3717- self._event_cache = LRUCache(constants.CACHE_SIZE)
3718-
3719- log.debug("Opening full text index: %s" % INDEX_FILE)
3720- try:
3721- self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OPEN)
3722- except xapian.DatabaseError, e:
3723- log.warn("Full text index corrupted: '%s'. Rebuilding index." % e)
3724- self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OVERWRITE)
3725- self._tokenizer = indexer = xapian.TermGenerator()
3726- self._query_parser = xapian.QueryParser()
3727- self._query_parser.set_database (self._index)
3728- self._query_parser.add_prefix("name", "N")
3729- self._query_parser.add_prefix("title", "N")
3730- self._query_parser.add_prefix("site", "S")
3731- self._query_parser.add_prefix("app", "A")
3732- self._query_parser.add_boolean_prefix("zgei", FILTER_PREFIX_EVENT_INTERPRETATION)
3733- self._query_parser.add_boolean_prefix("zgem", FILTER_PREFIX_EVENT_MANIFESTATION)
3734- self._query_parser.add_boolean_prefix("zga", FILTER_PREFIX_ACTOR)
3735- self._query_parser.add_prefix("zgsu", FILTER_PREFIX_SUBJECT_URI)
3736- self._query_parser.add_boolean_prefix("zgsi", FILTER_PREFIX_SUBJECT_INTERPRETATION)
3737- self._query_parser.add_boolean_prefix("zgsm", FILTER_PREFIX_SUBJECT_MANIFESTATION)
3738- self._query_parser.add_prefix("zgso", FILTER_PREFIX_SUBJECT_ORIGIN)
3739- self._query_parser.add_boolean_prefix("zgst", FILTER_PREFIX_SUBJECT_MIMETYPE)
3740- self._query_parser.add_boolean_prefix("zgss", FILTER_PREFIX_SUBJECT_STORAGE)
3741- self._query_parser.add_prefix("category", FILTER_PREFIX_XDG_CATEGORY)
3742- self._query_parser.add_valuerangeprocessor(
3743- xapian.NumberValueRangeProcessor(VALUE_EVENT_ID, "id", True))
3744- self._query_parser.add_valuerangeprocessor(
3745- xapian.NumberValueRangeProcessor(VALUE_TIMESTAMP, "ms", False))
3746- self._query_parser.set_default_op(xapian.Query.OP_AND)
3747- self._enquire = xapian.Enquire(self._index)
3748-
3749- self._desktops = {}
3750-
3751- gobject.threads_init()
3752- self._may_run = True
3753- self._queue = Queue(0)
3754- self._worker = threading.Thread(target=self._worker_thread,
3755- name="IndexWorker")
3756- self._worker.daemon = True
3757-
3758- # We need to defer the index checking until after ZG has completed
3759- # full setup. Hence the idle handler.
3760- # We also don't start the worker until after we've checked the index
3761- gobject.idle_add (self._check_index_and_start_worker)
3762-
3763- @synchronized (INDEX_LOCK)
3764- def _check_index_and_start_worker (self):
3765- """
3766- Check whether we need a rebuild of the index.
3767- Returns True if the index is good. False if a reindexing has
3768- been commenced.
3769-
3770- This method should be called from the main thread and only once.
3771- It starts the worker thread as a side effect.
3772-
3773- We are clearing the queue, because there may be a race when an
3774- event insertion / deletion is already queued and our index
3775- is corrupted. Creating a new queue instance should be safe,
3776- because we're running in main thread as are the index_event
3777- and delete_event methods, and the worker thread wasn't yet
3778- started.
3779- """
3780- if self._index.get_metadata("fts_index_version") != INDEX_VERSION:
3781- log.info("Index must be upgraded. Doing full rebuild")
3782- self._queue = Queue(0)
3783- self._queue.put(Reindex(self))
3784- elif self._index.get_doccount() == 0:
3785- # If the index is empty we trigger a rebuild
3786- # We must delay reindexing until after the engine is done setting up
3787- log.info("Empty index detected. Doing full rebuild")
3788- self._queue = Queue(0)
3789- self._queue.put(Reindex(self))
3790-
3791- # Now that we've checked the index from the main thread we can start the worker
3792- self._worker.start()
3793-
3794- def index_event (self, event):
3795- """
3796- This method schedules an event for indexing. It returns immediately and
3797- defers the actual work to a bottom half thread. This means that it
3798- will not block the main loop of the Zeitgeist daemon while indexing
3799- (which may be a heavy operation)
3800- """
3801- self._queue.put (event)
3802- return event
3803-
3804- def delete_event (self, event_id):
3805- """
3806- Remove an event from the index given its event id
3807- """
3808- self._queue.put (Deletion(event_id))
3809- return
3810-
3811- @synchronized (INDEX_LOCK)
3812- def search (self, query_string, time_range=None, filters=None, offset=0, maxhits=10, result_type=100):
3813- """
3814- Do a full text search over the indexed corpus. The `result_type`
3815- parameter may be a zeitgeist.datamodel.ResultType or 100. In case it is
3816- 100 the textual relevancy of the search engine will be used to sort the
3817- results. Result type 100 is the fastest (and default) mode.
3818-
3819- The filters argument should be a list of event templates.
3820- """
3821- # Expand event template filters if necessary
3822- if filters:
3823- query_string = "(%s) AND (%s)" % (query_string, self._compile_event_filter_query (filters))
3824-
3825- # Expand time range value query
3826- if time_range and not time_range.is_always():
3827- query_string = "(%s) AND (%s)" % (query_string, self._compile_time_range_filter_query (time_range))
3828-
3829- # If the result type coalesces the events we need to fetch some extra
3830- # events from the index to have a chance of actually holding 'maxhits'
3831- # unique events
3832- if result_type in COALESCING_RESULT_TYPES:
3833- raw_maxhits = maxhits * 3
3834- else:
3835- raw_maxhits = maxhits
3836-
3837- # When not sorting by relevance, we fetch the results from Xapian sorted
3838- # by timestamp. That minimizes the skew we get from otherwise doing a
3839- # relevancy-ranked Xapian query and then re-sorting with Zeitgeist. The
3840- # "skew" is that low-relevancy results may still have the highest timestamp
3841- if result_type == 100:
3842- self._enquire.set_sort_by_relevance()
3843- else:
3844- self._enquire.set_sort_by_value(VALUE_TIMESTAMP, True)
3845-
3846- # Allow wildcards
3847- query_start = time.time()
3848- query = self._query_parser.parse_query (query_string,
3849- self.QUERY_PARSER_FLAGS)
3850- self._enquire.set_query (query)
3851- hits = self._enquire.get_mset (offset, raw_maxhits)
3852- hit_count = hits.get_matches_estimated()
3853- log.debug("Search '%s' gave %s hits in %sms" %
3854- (query_string, hits.get_matches_estimated(), (time.time() - query_start)*1000))
3855-
3856- if result_type == 100:
3857- event_ids = []
3858- for m in hits:
3859- event_id = int(xapian.sortable_unserialise(
3860- m.document.get_value(VALUE_EVENT_ID)))
3861- event_ids.append (event_id)
3862- if event_ids:
3863- return self.get_events(event_ids), hit_count
3864- else:
3865- return [], 0
3866- else:
3867- templates = []
3868- for m in hits:
3869- event_id = int(xapian.sortable_unserialise(
3870- m.document.get_value(VALUE_EVENT_ID)))
3871- ev = Event()
3872- ev[0][Event.Id] = str(event_id)
3873- templates.append(ev)
3874- if templates:
3875- x = self._find_events(1, TimeRange.always(),
3876- templates,
3877- StorageState.Any,
3878- maxhits,
3879- result_type), hit_count
3880- return x
3881- else:
3882- return [], 0
3883-
3884- def _worker_thread (self):
3885- is_dirty = False
3886- while self._may_run:
3887- # FIXME: Throttle IO and CPU
3888- try:
3889- # If we are dirty wait a while before we flush,
3890- # or if we are clean wait indefinitely to avoid
3891- # needless wakeups
3892- if is_dirty:
3893- event = self._queue.get(True, 0.5)
3894- else:
3895- event = self._queue.get(True)
3896-
3897- if isinstance (event, Deletion):
3898- self._delete_event_real (event.event_id)
3899- elif isinstance (event, Reindex):
3900- self._reindex (event.all_events)
3901- else:
3902- self._index_event_real (event)
3903-
3904- is_dirty = True
3905- except Empty:
3906- if is_dirty:
3907- # Write changes to disk
3908- log.debug("Committing FTS index")
3909- self._index.flush()
3910- is_dirty = False
3911- else:
3912- log.debug("No changes to index. Sleeping")
3913-
3914- @synchronized (INDEX_LOCK)
3915- def _reindex (self, event_list):
3916- """
3917- Index everything in the ZG log. The argument must be a list
3918- of events. Typically extracted by a Reindex instance.
3919- Only call from worker thread as it writes to the db and Xapian
3920- is *not* thread safe (only single-writer-multiple-reader).
3921- """
3922- self._index.close ()
3923- self._index = xapian.WritableDatabase(INDEX_FILE, xapian.DB_CREATE_OR_OVERWRITE)
3924- self._query_parser.set_database (self._index)
3925- self._enquire = xapian.Enquire(self._index)
3926- # Register that this index was built with CJK enabled
3927- self._index.set_metadata("fts_index_version", INDEX_VERSION)
3928- log.info("Preparing to rebuild index with %s events" % len(event_list))
3929- for e in event_list : self._queue.put(e)
3930-
3931- @synchronized (INDEX_LOCK)
3932- def _delete_event_real (self, event_id):
3933- """
3934- Look up the doc id given an event id and remove the xapian.Document
3935- for that doc id.
3936- Note: This is slow, but there's not much we can do about it
3937- """
3938- try:
3939- _id = xapian.sortable_serialise(float(event_id))
3940- query = xapian.Query(xapian.Query.OP_VALUE_RANGE,
3941- VALUE_EVENT_ID, _id, _id)
3942-
3943- self._enquire.set_query (query)
3944- hits = self._enquire.get_mset (0, 10)
3945-
3946- total = hits.get_matches_estimated()
3947- if total > 1:
3948- log.warning ("More than one event found with id '%s'" % event_id)
3949- elif total <= 0:
3950- log.debug ("No event for id '%s'" % event_id)
3951- return
3952-
3953- for m in hits:
3954- log.debug("Deleting event '%s' with docid '%s'" %
3955- (event_id, m.docid))
3956- self._index.delete_document(m.docid)
3957- except Exception, e:
3958- log.error("Failed to delete event '%s': %s" % (event_id, e))
3959-
3960- def _split_uri (self, uri):
3961- """
3962- Returns a triple of (scheme, host, and path) extracted from `uri`
3963- """
3964- i = uri.find(":")
3965- if i == -1 :
3966- scheme = ""
3967- host = ""
3968- path = uri
3969- else:
3970- scheme = uri[:i]
3971- host = ""
3972- path = ""
3973-
3974- if uri[i+1] == "/" and uri[i+2] == "/":
3975- j = uri.find("/", i+3)
3976- if j == -1 :
3977- host = uri[i+3:]
3978- else:
3979- host = uri[i+3:j]
3980- path = uri[j:]
3981- else:
3982- host = uri[i+1:]
3983-
3984- # Strip out URI query part
3985- i = path.find("?")
3986- if i != -1:
3987- path = path[:i]
3988-
3989- return scheme, host, path
3990-
3991- def _get_desktop_entry (self, app_id):
3992- """
3993- Return the xdg.DesktopEntry.DesktopEntry for `app_id`, or None in case
3994- no file is found for the given desktop id
3995- """
3996- if app_id in self._desktops:
3997- return self._desktops[app_id]
3998-
3999- for datadir in xdg_data_dirs:
4000- path = os.path.join(datadir, "applications", app_id)
4001- if os.path.exists(path):
4002- try:
4003- desktop = DesktopEntry(path)
4004- self._desktops[app_id] = desktop
4005- return desktop
4006- except Exception, e:
4007- log.warning("Unable to load %s: %s" % (path, e))
4008- return None
4009-
4010- return None
4011-
4012- def _index_actor (self, actor):
4013- """
4014- Takes an actor as a path to a .desktop file or app:// uri
4015- and indexes the contents of the corresponding .desktop file
4016- into the document currently set for self._tokenizer.
4017- """
4018- if not actor : return
4019-
4020- # Get the path of the .desktop file and convert it to
4021- # an app id (eg. 'gedit.desktop')
4022- scheme, host, path = self._split_uri(url_unescape (actor))
4023- if not path:
4024- path = host
4025-
4026- if not path :
4027- log.debug("Unable to determine application id for %s" % actor)
4028- return
4029-
4030- if path.startswith("/") :
4031- path = os.path.basename(path)
4032-
4033- desktop = self._get_desktop_entry(path)
4034- if desktop:
4035- if not desktop.getNoDisplay():
4036- self._tokenizer.index_text(desktop.getName(), 5)
4037- self._tokenizer.index_text(desktop.getName(), 5, "A")
4038- self._tokenizer.index_text(desktop.getGenericName(), 5)
4039- self._tokenizer.index_text(desktop.getGenericName(), 5, "A")
4040- self._tokenizer.index_text(desktop.getComment(), 2)
4041- self._tokenizer.index_text(desktop.getComment(), 2, "A")
4042-
4043- doc = self._tokenizer.get_document()
4044- for cat in desktop.getCategories():
4045- doc.add_boolean_term(FILTER_PREFIX_XDG_CATEGORY+cat.lower())
4046- else:
4047- log.debug("Unable to look up app info for %s" % actor)
4048-
4049-
4050- def _index_uri (self, uri):
4051- """
4052- Index `uri` into the document currently set on self._tokenizer
4053- """
4054- # File URIs and paths are indexed in one way, and all other,
4055- # usually web URIs, are indexed in another way because there may
4056- # be domain name etc. in there we want to rank differently
4057- scheme, host, path = self._split_uri (url_unescape (uri))
4058- if scheme == "file" or not scheme:
4059- path, name = os.path.split(path)
4060- self._tokenizer.index_text(name, 5)
4061- self._tokenizer.index_text(name, 5, "N")
4062-
4063- # Index parent names with descending weight
4064- weight = 5
4065- while path and name:
4066- weight = weight / 1.5
4067- path, name = os.path.split(path)
4068- self._tokenizer.index_text(name, int(weight))
4069-
4070- elif scheme == "mailto":
4071- tokens = host.split("@")
4072- name = tokens[0]
4073- self._tokenizer.index_text(name, 6)
4074- if len(tokens) > 1:
4075- self._tokenizer.index_text(" ".join(tokens[1:]), 1)
4076- else:
4077- # We're cautious about indexing the path components of
4078- # non-file URIs as some websites practice *extremely* long
4079- # and useless URLs
4080- path, name = os.path.split(path)
4081- if len(name) > 30 : name = name[:30]
4082- if len(path) > 30 : path = path[:30]
4083- if name:
4084- self._tokenizer.index_text(name, 5)
4085- self._tokenizer.index_text(name, 5, "N")
4086- if path:
4087- self._tokenizer.index_text(path, 1)
4088- self._tokenizer.index_text(path, 1, "N")
4089- if host:
4090- self._tokenizer.index_text(host, 2)
4091- self._tokenizer.index_text(host, 2, "N")
4092- self._tokenizer.index_text(host, 2, "S")
4093-
4094- def _index_text (self, text):
4095- """
4096- Index `text` as raw text data for the document currently
4097- set on self._tokenizer. The text is assumed to be a primary
4098- description of the subject, such as the basename of a file.
4099-
4100- Primary use is for subject.text
4101- """
4102- self._tokenizer.index_text(text, 5)
4103-
4104- def _index_contents (self, uri):
4105- # xmlindexer doesn't extract words for URIs only for file paths
4106-
4107- # FIXME: IONICE and NICE on xmlindexer
4108-
4109- path = uri.replace("file://", "")
4110- xmlindexer = subprocess.Popen(['xmlindexer', path],
4111- stdout=subprocess.PIPE)
4112- xml = xmlindexer.communicate()[0].strip()
4113- xmlindexer.wait()
4114-
4115- dom = minidom.parseString(xml)
4116- text_nodes = dom.getElementsByTagName("text")
4117- lines = []
4118- if text_nodes:
4119- for line in text_nodes[0].childNodes:
4120- lines.append(line.data)
4121-
4122- if lines:
4123- self._tokenizer.index_text (" ".join(lines))
4124-
4125-
4126- def _add_doc_filters (self, event, doc):
4127- """Adds the filtering rules to the doc. Filtering rules will
4128- not affect the relevancy ranking of the event/doc"""
4129- if event.interpretation:
4130- doc.add_boolean_term (cap_string(FILTER_PREFIX_EVENT_INTERPRETATION+event.interpretation))
4131- if event.manifestation:
4132- doc.add_boolean_term (cap_string(FILTER_PREFIX_EVENT_MANIFESTATION+event.manifestation))
4133- if event.actor:
4134- doc.add_boolean_term (cap_string(FILTER_PREFIX_ACTOR+mangle_uri(event.actor)))
4135-
4136- for su in event.subjects:
4137- if su.uri:
4138- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_URI+mangle_uri(su.uri)))
4139- if su.interpretation:
4140- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_INTERPRETATION+su.interpretation))
4141- if su.manifestation:
4142- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_MANIFESTATION+su.manifestation))
4143- if su.origin:
4144- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_ORIGIN+mangle_uri(su.origin)))
4145- if su.mimetype:
4146- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_MIMETYPE+su.mimetype))
4147- if su.storage:
4148- doc.add_boolean_term (cap_string(FILTER_PREFIX_SUBJECT_STORAGE+su.storage))
4149-
4150- @synchronized (INDEX_LOCK)
4151- def _index_event_real (self, event):
4152- if not isinstance (event, OrigEvent):
4153- log.error("Not an Event, found: %s" % type(event))
4154- if not event.id:
4155- log.warning("Not indexing event. Event has no id")
4156- return
4157-
4158- try:
4159- doc = xapian.Document()
4160- doc.add_value (VALUE_EVENT_ID,
4161- xapian.sortable_serialise(float(event.id)))
4162- doc.add_value (VALUE_TIMESTAMP,
4163- xapian.sortable_serialise(float(event.timestamp)))
4164- self._tokenizer.set_document (doc)
4165-
4166- self._index_actor (event.actor)
4167-
4168- for subject in event.subjects:
4169- if not subject.uri : continue
4170-
4171- # By spec URIs can have arbitrary length. In reality that's just silly.
4172- # The general online "rule" is to keep URLs less than 2k so we just
4173- # choose to enforce that
4174- if len(subject.uri) > 2000:
4175- log.info ("URI too long (%s). Discarding: %s..."% (len(subject.uri), subject.uri[:30]))
4176- return
4177- log.debug("Indexing '%s'" % subject.uri)
4178-
4179- self._index_uri (subject.uri)
4180- self._index_text (subject.text)
4181-
4182- # If the subject URI is an actor, we index the .desktop also
4183- if subject.uri.startswith ("application://"):
4184- self._index_actor (subject.uri)
4185-
4186- # File contents indexing disabled for now...
4187- #self._index_contents (subject.uri)
4188-
4189- # FIXME: Possibly index payloads when we have apriori knowledge
4190-
4191- self._add_doc_filters (event, doc)
4192- self._index.add_document (doc)
4193-
4194- except Exception, e:
4195- log.error("Error indexing event: %s" % e)
4196-
4197- def _compile_event_filter_query (self, events):
4198- """Takes a list of event templates and compiles a filter query
4199- based on their interpretations, manifestations, and actor,
4200- for event and subjects.
4201-
4202- All fields within the same event will be ANDed and each template
4203- will be ORed with the others. Like elsewhere in Zeitgeist the
4204- type tree of the interpretations and manifestations will be expanded
4205- to match all child symbols as well
4206- """
4207- query = []
4208- for event in events:
4209- if not isinstance(event, Event):
4210- raise TypeError("Expected Event. Found %s" % type(event))
4211-
4212- tmpl = []
4213- if event.interpretation :
4214- tmpl.append(expand_type("zgei", event.interpretation))
4215- if event.manifestation :
4216- tmpl.append(expand_type("zgem", event.manifestation))
4217- if event.actor : tmpl.append("zga:%s" % mangle_uri(event.actor))
4218- for su in event.subjects:
4219- if su.uri :
4220- tmpl.append("zgsu:%s" % mangle_uri(su.uri))
4221- if su.interpretation :
4222- tmpl.append(expand_type("zgsi", su.interpretation))
4223- if su.manifestation :
4224- tmpl.append(expand_type("zgsm", su.manifestation))
4225- if su.origin :
4226- tmpl.append("zgso:%s" % mangle_uri(su.origin))
4227- if su.mimetype :
4228- tmpl.append("zgst:%s" % su.mimetype)
4229- if su.storage :
4230- tmpl.append("zgss:%s" % su.storage)
4231-
4232- tmpl = "(" + ") AND (".join(tmpl) + ")"
4233- query.append(tmpl)
4234-
4235- return " OR ".join(query)
4236-
4237- def _compile_time_range_filter_query (self, time_range):
4238- """Takes a TimeRange and compiles a range query for it"""
4239-
4240- if not isinstance(time_range, TimeRange):
4241- raise TypeError("Expected TimeRange, but found %s" % type(time_range))
4242-
4243- return "%s..%sms" % (time_range.begin, time_range.end)
4244-
4245- def _get_event_from_row(self, row):
4246- event = Event()
4247- event[0][Event.Id] = row["id"] # Id property is read-only in the public API
4248- event.timestamp = row["timestamp"]
4249- for field in ("interpretation", "manifestation", "actor"):
4250- # Try to get event attributes from row using the attributed field id
4251- # If attribute does not exist we break the attribute fetching and return
4252- # None instead of crashing
4253- try:
4254- setattr(event, field, getattr(self, "_" + field).value(row[field]))
4255- except KeyError, e:
4256- log.error("Event %i broken: Table %s has no id %i" \
4257- %(row["id"], field, row[field]))
4258- return None
4259- event.origin = row["event_origin_uri"] or ""
4260- event.payload = row["payload"] or "" # default payload: empty string
4261- return event
4262-
4263- def _get_subject_from_row(self, row):
4264- subject = Subject()
4265- for field in ("uri", "text", "storage"):
4266- setattr(subject, field, row["subj_" + field])
4267- subject.origin = row["subj_origin_uri"]
4268- if row["subj_current_uri"]:
4269- subject.current_uri = row["subj_current_uri"]
4270- for field in ("interpretation", "manifestation", "mimetype"):
4271- # Try to get subject attributes from row using the attributed field id
4272- # If attribute does not exist we break the attribute fetching and return
4273- # None instead of crashing
4274- try:
4275- setattr(subject, field,
4276- getattr(self, "_" + field).value(row["subj_" + field]))
4277- except KeyError, e:
4278- log.error("Event %i broken: Table %s has no id %i" \
4279- %(row["id"], field, row["subj_" + field]))
4280- return None
4281- return subject
4282-
4283- def get_events(self, ids, sender=None):
4284- """
4285- Look up a list of events.
4286- """
4287-
4288- t = time.time()
4289-
4290- if not ids:
4291- return []
4292-
4293- # Split ids into cached and uncached
4294- uncached_ids = array("i")
4295- cached_ids = array("i")
4296-
4297- # If the batch holds more than MAX_CACHE_BATCH_SIZE ids, ignore the cache
4298- use_cache = True
4299- if len(ids) > MAX_CACHE_BATCH_SIZE:
4300- use_cache = False
4301- if not use_cache:
4302- uncached_ids = ids
4303- else:
4304- for id in ids:
4305- if id in self._event_cache:
4306- cached_ids.append(id)
4307- else:
4308- uncached_ids.append(id)
4309-
4310- id_hash = defaultdict(lambda: array("i"))
4311- for n, id in enumerate(ids):
4312- # the same id can be at multiple places (LP: #673916)
4313- # cache all of them
4314- id_hash[id].append(n)
4315-
4316- # If we are not able to get an event by the given id
4317- # append None instead of raising an Error. The client
4318- # might simply have requested an event that has been
4319- # deleted
4320- events = {}
4321- sorted_events = [None]*len(ids)
4322-
4323- for id in cached_ids:
4324- event = self._event_cache[id]
4325- if event:
4326- if event is not None:
4327- for n in id_hash[event.id]:
4328- # insert the event into all necessary spots (LP: #673916)
4329- sorted_events[n] = event
4330-
4331- # Get uncached events
4332- rows = self._cursor.execute("""
4333- SELECT * FROM event_view
4334- WHERE id IN (%s)
4335- """ % ",".join("%d" % _id for _id in uncached_ids))
4336-
4337- time_get_uncached = time.time() - t
4338- t = time.time()
4339-
4340- t_get_event = 0
4341- t_get_subject = 0
4342- t_apply_get_hooks = 0
4343-
4344- row_counter = 0
4345- for row in rows:
4346- row_counter += 1
4347- # Assumption: all rows of the same event for its different
4348- # subjects are in consecutive order.
4349- t_get_event -= time.time()
4350- event = self._get_event_from_row(row)
4351- t_get_event += time.time()
4352-
4353- if event:
4354- # Check for existing event.id in event to attach
4355- # other subjects to it
4356- if event.id not in events:
4357- events[event.id] = event
4358- else:
4359- event = events[event.id]
4360-
4361- t_get_subject -= time.time()
4362- subject = self._get_subject_from_row(row)
4363- t_get_subject += time.time()
4364- # Check if subject has a proper value. If it is None then something went
4365- # wrong while trying to fetch the subject from the row. So instead
4366- # of failing and raising an error, we silently skip the subject.
4367- if subject:
4368- event.append_subject(subject)
4369- if use_cache and not event.payload:
4370- self._event_cache[event.id] = event
4371- if event is not None:
4372- for n in id_hash[event.id]:
4373- # insert the event into all necessary spots (LP: #673916)
4374- sorted_events[n] = event
4375- # Avoid caching events with payloads to keep the cache size (in MB)
4376- # at a decent level
4377-
4378-
4379- log.debug("Got %d raw events in %fs" % (row_counter, time_get_uncached))
4380- log.debug("Got %d events in %fs" % (len(sorted_events), time.time()-t))
4381- log.debug(" Where time spent in _get_event_from_row in %fs" % (t_get_event))
4382- log.debug(" Where time spent in _get_subject_from_row in %fs" % (t_get_subject))
4383- log.debug(" Where time spent in apply_get_hooks in %fs" % (t_apply_get_hooks))
4384- return sorted_events
4385-
4386- def _find_events(self, return_mode, time_range, event_templates,
4387- storage_state, max_events, order, sender=None):
4388- """
4389- Accepts 'event_templates' as either a real list of Events or as
4390- a list of tuples (event_data, subject_data) as we do in the
4391- DBus API.
4392-
4393- Return modes:
4394- - 0: IDs.
4395- - 1: Events.
4396- """
4397- t = time.time()
4398-
4399- where = self._build_sql_event_filter(time_range, event_templates,
4400- storage_state)
4401-
4402- if not where.may_have_results():
4403- return []
4404-
4405- if return_mode == 0:
4406- sql = "SELECT DISTINCT id FROM event_view"
4407- elif return_mode == 1:
4408- sql = "SELECT id FROM event_view"
4409- else:
4410- raise NotImplementedError, "Unsupported return_mode."
4411-
4412- wheresql = " WHERE %s" % where.sql if where else ""
4413-
4414- def group_and_sort(field, wheresql, time_asc=False, count_asc=None,
4415- aggregation_type='max'):
4416-
4417- args = {
4418- 'field': field,
4419- 'aggregation_type': aggregation_type,
4420- 'where_sql': wheresql,
4421- 'time_sorting': 'ASC' if time_asc else 'DESC',
4422- 'aggregation_sql': '',
4423- 'order_sql': '',
4424- }
4425-
4426- if count_asc is not None:
4427- args['aggregation_sql'] = ', COUNT(%s) AS num_events' % \
4428- field
4429- args['order_sql'] = 'num_events %s,' % \
4430- ('ASC' if count_asc else 'DESC')
4431-
4432- return """
4433- NATURAL JOIN (
4434- SELECT %(field)s,
4435- %(aggregation_type)s(timestamp) AS timestamp
4436- %(aggregation_sql)s
4437- FROM event_view %(where_sql)s
4438- GROUP BY %(field)s)
4439- GROUP BY %(field)s
4440- ORDER BY %(order_sql)s timestamp %(time_sorting)s
4441- """ % args
4442-
4443- if order == ResultType.MostRecentEvents:
4444- sql += wheresql + " ORDER BY timestamp DESC"
4445- elif order == ResultType.LeastRecentEvents:
4446- sql += wheresql + " ORDER BY timestamp ASC"
4447- elif order == ResultType.MostRecentEventOrigin:
4448- sql += group_and_sort("origin", wheresql, time_asc=False)
4449- elif order == ResultType.LeastRecentEventOrigin:
4450- sql += group_and_sort("origin", wheresql, time_asc=True)
4451- elif order == ResultType.MostPopularEventOrigin:
4452- sql += group_and_sort("origin", wheresql, time_asc=False,
4453- count_asc=False)
4454- elif order == ResultType.LeastPopularEventOrigin:
4455- sql += group_and_sort("origin", wheresql, time_asc=True,
4456- count_asc=True)
4457- elif order == ResultType.MostRecentSubjects:
4458- # Remember, event.subj_id identifies the subject URI
4459- sql += group_and_sort("subj_id", wheresql, time_asc=False)
4460- elif order == ResultType.LeastRecentSubjects:
4461- sql += group_and_sort("subj_id", wheresql, time_asc=True)
4462- elif order == ResultType.MostPopularSubjects:
4463- sql += group_and_sort("subj_id", wheresql, time_asc=False,
4464- count_asc=False)
4465- elif order == ResultType.LeastPopularSubjects:
4466- sql += group_and_sort("subj_id", wheresql, time_asc=True,
4467- count_asc=True)
4468- elif order == ResultType.MostRecentCurrentUri:
4469- sql += group_and_sort("subj_id_current", wheresql, time_asc=False)
4470- elif order == ResultType.LeastRecentCurrentUri:
4471- sql += group_and_sort("subj_id_current", wheresql, time_asc=True)
4472- elif order == ResultType.MostPopularCurrentUri:
4473- sql += group_and_sort("subj_id_current", wheresql, time_asc=False,
4474- count_asc=False)
4475- elif order == ResultType.LeastPopularCurrentUri:
4476- sql += group_and_sort("subj_id_current", wheresql, time_asc=True,
4477- count_asc=True)
4478- elif order == ResultType.MostRecentActor:
4479- sql += group_and_sort("actor", wheresql, time_asc=False)
4480- elif order == ResultType.LeastRecentActor:
4481- sql += group_and_sort("actor", wheresql, time_asc=True)
4482- elif order == ResultType.MostPopularActor:
4483- sql += group_and_sort("actor", wheresql, time_asc=False,
4484- count_asc=False)
4485- elif order == ResultType.LeastPopularActor:
4486- sql += group_and_sort("actor", wheresql, time_asc=True,
4487- count_asc=True)
4488- elif order == ResultType.OldestActor:
4489- sql += group_and_sort("actor", wheresql, time_asc=True,
4490- aggregation_type="min")
4491- elif order == ResultType.MostRecentOrigin:
4492- sql += group_and_sort("subj_origin", wheresql, time_asc=False)
4493- elif order == ResultType.LeastRecentOrigin:
4494- sql += group_and_sort("subj_origin", wheresql, time_asc=True)
4495- elif order == ResultType.MostPopularOrigin:
4496- sql += group_and_sort("subj_origin", wheresql, time_asc=False,
4497- count_asc=False)
4498- elif order == ResultType.LeastPopularOrigin:
4499- sql += group_and_sort("subj_origin", wheresql, time_asc=True,
4500- count_asc=True)
4501- elif order == ResultType.MostRecentSubjectInterpretation:
4502- sql += group_and_sort("subj_interpretation", wheresql,
4503- time_asc=False)
4504- elif order == ResultType.LeastRecentSubjectInterpretation:
4505- sql += group_and_sort("subj_interpretation", wheresql,
4506- time_asc=True)
4507- elif order == ResultType.MostPopularSubjectInterpretation:
4508- sql += group_and_sort("subj_interpretation", wheresql,
4509- time_asc=False, count_asc=False)
4510- elif order == ResultType.LeastPopularSubjectInterpretation:
4511- sql += group_and_sort("subj_interpretation", wheresql,
4512- time_asc=True, count_asc=True)
4513- elif order == ResultType.MostRecentMimeType:
4514- sql += group_and_sort("subj_mimetype", wheresql, time_asc=False)
4515- elif order == ResultType.LeastRecentMimeType:
4516- sql += group_and_sort("subj_mimetype", wheresql, time_asc=True)
4517- elif order == ResultType.MostPopularMimeType:
4518- sql += group_and_sort("subj_mimetype", wheresql, time_asc=False,
4519- count_asc=False)
4520- elif order == ResultType.LeastPopularMimeType:
4521- sql += group_and_sort("subj_mimetype", wheresql, time_asc=True,
4522- count_asc=True)
4523-
4524- if max_events > 0:
4525- sql += " LIMIT %d" % max_events
4526- result = array("i", self._cursor.execute(sql, where.arguments).fetch(0))
4527-
4528- if return_mode == 0:
4529- log.debug("Found %d event IDs in %fs" % (len(result), time.time()- t))
4530- elif return_mode == 1:
4531- log.debug("Found %d events in %fs" % (len(result), time.time()- t))
4532- result = self.get_events(ids=result, sender=sender)
4533- else:
4534- raise Exception("%d" % return_mode)
4535-
4536- return result
4537-
4538- @staticmethod
4539- def _build_templates(templates):
4540- for event_template in templates:
4541- event_data = event_template[0]
4542- for subject in (event_template[1] or (Subject(),)):
4543- yield Event((event_data, [], None)), Subject(subject)
4544-
4545- def _build_sql_from_event_templates(self, templates):
4546-
4547- where_or = WhereClause(WhereClause.OR)
4548-
4549- for template in templates:
4550- event_template = Event((template[0], [], None))
4551- if template[1]:
4552- subject_templates = [Subject(data) for data in template[1]]
4553- else:
4554- subject_templates = None
4555-
4556- subwhere = WhereClause(WhereClause.AND)
4557-
4558- if event_template.id:
4559- subwhere.add("id = ?", event_template.id)
4560-
4561- try:
4562- value, negation, wildcard = parse_operators(Event, Event.Interpretation, event_template.interpretation)
4563- # Expand event interpretation children
4564- event_interp_where = WhereClause(WhereClause.OR, negation)
4565- for child_interp in (Symbol.find_child_uris_extended(value)):
4566- if child_interp:
4567- event_interp_where.add_text_condition("interpretation",
4568- child_interp, like=wildcard, cache=self._interpretation)
4569- if event_interp_where:
4570- subwhere.extend(event_interp_where)
4571-
4572- value, negation, wildcard = parse_operators(Event, Event.Manifestation, event_template.manifestation)
4573- # Expand event manifestation children
4574- event_manif_where = WhereClause(WhereClause.OR, negation)
4575- for child_manif in (Symbol.find_child_uris_extended(value)):
4576- if child_manif:
4577- event_manif_where.add_text_condition("manifestation",
4578- child_manif, like=wildcard, cache=self._manifestation)
4579- if event_manif_where:
4580- subwhere.extend(event_manif_where)
4581-
4582- value, negation, wildcard = parse_operators(Event, Event.Actor, event_template.actor)
4583- if value:
4584- subwhere.add_text_condition("actor", value, wildcard, negation, cache=self._actor)
4585-
4586- value, negation, wildcard = parse_operators(Event, Event.Origin, event_template.origin)
4587- if value:
4588- subwhere.add_text_condition("origin", value, wildcard, negation)
4589-
4590- if subject_templates is not None:
4591- for subject_template in subject_templates:
4592- value, negation, wildcard = parse_operators(Subject, Subject.Interpretation, subject_template.interpretation)
4593- # Expand subject interpretation children
4594- su_interp_where = WhereClause(WhereClause.OR, negation)
4595- for child_interp in (Symbol.find_child_uris_extended(value)):
4596- if child_interp:
4597- su_interp_where.add_text_condition("subj_interpretation",
4598- child_interp, like=wildcard, cache=self._interpretation)
4599- if su_interp_where:
4600- subwhere.extend(su_interp_where)
4601-
4602- value, negation, wildcard = parse_operators(Subject, Subject.Manifestation, subject_template.manifestation)
4603- # Expand subject manifestation children
4604- su_manif_where = WhereClause(WhereClause.OR, negation)
4605- for child_manif in (Symbol.find_child_uris_extended(value)):
4606- if child_manif:
4607- su_manif_where.add_text_condition("subj_manifestation",
4608- child_manif, like=wildcard, cache=self._manifestation)
4609- if su_manif_where:
4610- subwhere.extend(su_manif_where)
4611-
4612- # FIXME: Expand mime children as well.
4613- # Right now we only do exact matching for mimetypes
4614- # thekorn: this will be fixed when wildcards are supported
4615- value, negation, wildcard = parse_operators(Subject, Subject.Mimetype, subject_template.mimetype)
4616- if value:
4617- subwhere.add_text_condition("subj_mimetype",
4618- value, wildcard, negation, cache=self._mimetype)
4619-
4620- for key in ("uri", "origin", "text"):
4621- value = getattr(subject_template, key)
4622- if value:
4623- value, negation, wildcard = parse_operators(Subject, getattr(Subject, key.title()), value)
4624- subwhere.add_text_condition("subj_%s" % key, value, wildcard, negation)
4625-
4626- if subject_template.current_uri:
4627- value, negation, wildcard = parse_operators(Subject,
4628- Subject.CurrentUri, subject_template.current_uri)
4629- subwhere.add_text_condition("subj_current_uri", value, wildcard, negation)
4630-
4631- if subject_template.storage:
4632- subwhere.add_text_condition("subj_storage", subject_template.storage)
4633-
4634- except KeyError, e:
4635- # Value not in DB
4636- log.debug("Unknown entity in query: %s" % e)
4637- where_or.register_no_result()
4638- continue
4639- where_or.extend(subwhere)
4640- return where_or
4641-
4642- def _build_sql_event_filter(self, time_range, templates, storage_state):
4643-
4644- where = WhereClause(WhereClause.AND)
4645-
4646- # thekorn: we are using the unary operator here to tell sql to not use
4647- # the index on the timestamp column at the first place. This `fix` for
4648- # (LP: #672965) is based on some benchmarks, which suggest a performance
4649- # win, but we might not oversee all implications.
4650- # (see http://www.sqlite.org/optoverview.html section 6.0)
4651- min_time, max_time = time_range
4652- if min_time != 0:
4653- where.add("+timestamp >= ?", min_time)
4654- if max_time != sys.maxint:
4655- where.add("+timestamp <= ?", max_time)
4656-
4657- if storage_state in (StorageState.Available, StorageState.NotAvailable):
4658- where.add("(subj_storage_state = ? OR subj_storage_state IS NULL)",
4659- storage_state)
4660- elif storage_state != StorageState.Any:
4661- raise ValueError, "Unknown storage state '%d'" % storage_state
4662-
4663- where.extend(self._build_sql_from_event_templates(templates))
4664-
4665- return where
4666-
4667-if __name__ == "__main__":
4668- mainloop = gobject.MainLoop(is_running=True)
4669- search_engine = SearchEngineExtension()
4670- ZG_CLIENT._iface.connect_exit(lambda: mainloop.quit ())
4671- mainloop.run()
4672-
4673
4674=== removed file 'extensions/fts-python/lrucache.py'
4675--- extensions/fts-python/lrucache.py 2011-10-10 14:07:42 +0000
4676+++ extensions/fts-python/lrucache.py 1970-01-01 00:00:00 +0000
4677@@ -1,125 +0,0 @@
4678-# -.- coding: utf-8 -.-
4679-
4680-# lrucache.py
4681-#
4682-# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
4683-# Copyright © 2009 Markus Korn <thekorn@gmx.de>
4684-# Copyright © 2011 Seif Lotfy <seif@lotfy.com>
4685-#
4686-# This program is free software: you can redistribute it and/or modify
4687-# it under the terms of the GNU Lesser General Public License as published by
4688-# the Free Software Foundation, either version 2.1 of the License, or
4689-# (at your option) any later version.
4690-#
4691-# This program is distributed in the hope that it will be useful,
4692-# but WITHOUT ANY WARRANTY; without even the implied warranty of
4693-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4694-# GNU Lesser General Public License for more details.
4695-#
4696-# You should have received a copy of the GNU Lesser General Public License
4697-# along with this program. If not, see <http://www.gnu.org/licenses/>.
4698-
4699-class LRUCache:
4700- """
4701- A simple LRUCache implementation backed by a linked list and a dict.
4702- It can be accessed and updated just like a dict. To check if an element
4703- exists in the cache the following type of statements can be used:
4704- if "foo" in cache
4705- """
4706-
4707- class _Item:
4708- """
4709- A container for each item in LRUCache which knows about the
4710- item's position and relations
4711- """
4712- def __init__(self, item_key, item_value):
4713- self.value = item_value
4714- self.key = item_key
4715- self.next = None
4716- self.prev = None
4717-
4718- def __init__(self, max_size):
4719- """
4720- The size of the cache (in number of cached items) is guaranteed to
4721- never exceed 'size'
4722- """
4723- self._max_size = max_size
4724- self.clear()
4725-
4726-
4727- def clear(self):
4728- self._list_end = None # The newest item
4729- self._list_start = None # Oldest item
4730- self._map = {}
4731-
4732- def __len__(self):
4733- return len(self._map)
4734-
4735- def __contains__(self, key):
4736- return key in self._map
4737-
4738- def __delitem__(self, key):
4739- item = self._map[key]
4740- if item.prev:
4741- item.prev.next = item.next
4742- else:
4743- # we are deleting the first item, so we need a new first one
4744- self._list_start = item.next
4745- if item.next:
4746- item.next.prev = item.prev
4747- else:
4748- # we are deleting the last item, get a new last one
4749- self._list_end = item.prev
4750- del self._map[key], item
4751-
4752- def __setitem__(self, key, value):
4753- if key in self._map:
4754- item = self._map[key]
4755- item.value = value
4756- self._move_item_to_end(item)
4757- else:
4758- new = LRUCache._Item(key, value)
4759- self._append_to_list(new)
4760-
4761- if len(self._map) > self._max_size :
4762- # Remove eldest entry from list
4763- self.remove_eldest_item()
4764-
4765- def __getitem__(self, key):
4766- item = self._map[key]
4767- self._move_item_to_end(item)
4768- return item.value
4769-
4770- def __iter__(self):
4771- """
4772- Iteration is in order from eldest to newest,
4773- and returns (key,value) tuples
4774- """
4775- iter = self._list_start
4776- while iter != None:
4777- yield (iter.key, iter.value)
4778- iter = iter.next
4779-
4780- def _move_item_to_end(self, item):
4781- del self[item.key]
4782- self._append_to_list(item)
4783-
4784- def _append_to_list(self, item):
4785- self._map[item.key] = item
4786- if not self._list_start:
4787- self._list_start = item
4788- if self._list_end:
4789- self._list_end.next = item
4790- item.prev = self._list_end
4791- item.next = None
4792- self._list_end = item
4793-
4794- def remove_eldest_item(self):
4795- if self._list_start == self._list_end:
4796- self._list_start = None
4797- self._list_end = None
4798- return
4799- old = self._list_start
4800- old.next.prev = None
4801- self._list_start = old.next
4802- del self[old.key], old
4803
4804=== removed file 'extensions/fts-python/org.gnome.zeitgeist.fts.service.in'
4805--- extensions/fts-python/org.gnome.zeitgeist.fts.service.in 2011-10-10 18:51:40 +0000
4806+++ extensions/fts-python/org.gnome.zeitgeist.fts.service.in 1970-01-01 00:00:00 +0000
4807@@ -1,3 +0,0 @@
4808-[D-BUS Service]
4809-Name=org.gnome.zeitgeist.SimpleIndexer
4810-Exec=@pkgdatadir@/fts-python/fts.py
4811
4812=== removed file 'extensions/fts-python/sql.py'
4813--- extensions/fts-python/sql.py 2012-01-20 14:01:36 +0000
4814+++ extensions/fts-python/sql.py 1970-01-01 00:00:00 +0000
4815@@ -1,301 +0,0 @@
4816-# -.- coding: utf-8 -.-
4817-
4818-# Zeitgeist
4819-#
4820-# Copyright © 2009-2010 Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com>
4821-# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
4822-# Copyright © 2009-2011 Markus Korn <thekorn@gmx.net>
4823-# Copyright © 2009 Seif Lotfy <seif@lotfy.com>
4824-# Copyright © 2011 J.P. Lacerda <jpaflacerda@gmail.com>
4825-# Copyright © 2011 Collabora Ltd.
4826-# By Siegfried-Angel Gevatter Pujals <rainct@ubuntu.com>
4827-#
4828-# This program is free software: you can redistribute it and/or modify
4829-# it under the terms of the GNU Lesser General Public License as published by
4830-# the Free Software Foundation, either version 2.1 of the License, or
4831-# (at your option) any later version.
4832-#
4833-# This program is distributed in the hope that it will be useful,
4834-# but WITHOUT ANY WARRANTY; without even the implied warranty of
4835-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4836-# GNU Lesser General Public License for more details.
4837-#
4838-# You should have received a copy of the GNU Lesser General Public License
4839-# along with this program. If not, see <http://www.gnu.org/licenses/>.
4840-
4841-import sqlite3
4842-import logging
4843-import time
4844-import os
4845-import shutil
4846-
4847-from constants import constants
4848-
4849-log = logging.getLogger("siis.zeitgeist.sql")
4850-
4851-TABLE_MAP = {
4852- "origin": "uri",
4853- "subj_mimetype": "mimetype",
4854- "subj_origin": "uri",
4855- "subj_uri": "uri",
4856- "subj_current_uri": "uri",
4857-}
4858-
4859-def explain_query(cursor, statement, arguments=()):
4860- plan = ""
4861- for r in cursor.execute("EXPLAIN QUERY PLAN "+statement, arguments).fetchall():
4862- plan += str(list(r)) + "\n"
4863- log.debug("Got query:\nQUERY:\n%s (%s)\nPLAN:\n%s" % (statement, arguments, plan))
4864-
4865-class UnicodeCursor(sqlite3.Cursor):
4866-
4867- debug_explain = os.getenv("ZEITGEIST_DEBUG_QUERY_PLANS")
4868-
4869- @staticmethod
4870- def fix_unicode(obj):
4871- if isinstance(obj, (int, long)):
4872- # thekorn: as long as we are using the unary operator for timestamp
4873- # related queries we have to make sure that integers are not
4874- # converted to strings, same applies for long numbers.
4875- return obj
4876- if isinstance(obj, str):
4877- obj = obj.decode("UTF-8")
4878- # seif: Python’s default encoding is ASCII, so whenever a character with
4879- # an ASCII value > 127 is in the input data, you’ll get a UnicodeDecodeError
4880- # because that character can’t be handled by the ASCII encoding.
4881- try:
4882- obj = unicode(obj)
4883- except UnicodeDecodeError, ex:
4884- pass
4885- return obj
4886-
4887- def execute(self, statement, parameters=()):
4888- parameters = [self.fix_unicode(p) for p in parameters]
4889- if UnicodeCursor.debug_explain:
4890- explain_query(super(UnicodeCursor, self), statement, parameters)
4891- return super(UnicodeCursor, self).execute(statement, parameters)
4892-
4893- def fetch(self, index=None):
4894- if index is not None:
4895- for row in self:
4896- yield row[index]
4897- else:
4898- for row in self:
4899- yield row
4900-
4901-def _get_schema_version (cursor, schema_name):
4902- """
4903- Returns the schema version for schema_name or returns 0 in case
4904- the schema doesn't exist.
4905- """
4906- try:
4907- schema_version_result = cursor.execute("""
4908- SELECT version FROM schema_version WHERE schema=?
4909- """, (schema_name,))
4910- result = schema_version_result.fetchone()
4911- return result[0] if result else 0
4912- except sqlite3.OperationalError, e:
4913- # The schema isn't there...
4914- log.debug ("Schema '%s' not found: %s" % (schema_name, e))
4915- return 0
4916-
4917-def _connect_to_db(file_path):
4918- conn = sqlite3.connect(file_path)
4919- conn.row_factory = sqlite3.Row
4920- cursor = conn.cursor(UnicodeCursor)
4921- return cursor
4922-
4923-_cursor = None
4924-def get_default_cursor():
4925- global _cursor
4926- if not _cursor:
4927- dbfile = constants.DATABASE_FILE
4928- start = time.time()
4929- log.info("Using database: %s" % dbfile)
4930- new_database = not os.path.exists(dbfile)
4931- _cursor = _connect_to_db(dbfile)
4932- core_schema_version = _get_schema_version(_cursor, constants.CORE_SCHEMA)
4933- if core_schema_version < constants.CORE_SCHEMA_VERSION:
4934- log.exception(
4935- "Database '%s' is on version %s, but %s is required" % \
4936- (constants.CORE_SCHEMA, core_schema_version,
4937- constants.CORE_SCHEMA_VERSION))
4938- raise SystemExit(27)
4939- return _cursor
4940-def unset_cursor():
4941- global _cursor
4942- _cursor = None
4943-
4944-class TableLookup(dict):
4945-
4946- # We are not using an LRUCache as pressumably there won't be thousands
4947- # of manifestations/interpretations/mimetypes/actors on most
4948- # installations, so we can save us the overhead of tracking their usage.
4949-
4950- def __init__(self, cursor, table):
4951-
4952- self._cursor = cursor
4953- self._table = table
4954-
4955- for row in cursor.execute("SELECT id, value FROM %s" % table):
4956- self[row["value"]] = row["id"]
4957-
4958- self._inv_dict = dict((value, key) for key, value in self.iteritems())
4959-
4960- def __getitem__(self, name):
4961- # Use this for inserting new properties into the database
4962- if name in self:
4963- return super(TableLookup, self).__getitem__(name)
4964- id = self._cursor.execute("SELECT id FROM %s WHERE value=?"
4965- % self._table, (name,)).fetchone()[0]
4966- # If we are here it's a newly inserted value, insert it into cache
4967- self[name] = id
4968- self._inv_dict[id] = name
4969- return id
4970-
4971- def value(self, id):
4972- # When we fetch an event, it either was already in the database
4973- # at the time Zeitgeist started or it was inserted later -using
4974- # Zeitgeist-, so here we always have the data in memory already.
4975- return self._inv_dict[id]
4976-
4977- def id(self, name):
4978- # Use this when fetching values which are supposed to be in the
4979- # database already. Eg., in find_eventids.
4980- return super(TableLookup, self).__getitem__(name)
4981-
4982- def remove_id(self, id):
4983- value = self.value(id)
4984- del self._inv_dict[id]
4985- del self[value]
4986-
4987-def get_right_boundary(text):
4988- """ returns the smallest string which is greater than `text` """
4989- if not text:
4990- # if the search prefix is empty we query for the whole range
4991- # of 'utf-8 'unicode chars
4992- return unichr(0x10ffff)
4993- if isinstance(text, str):
4994- # we need to make sure the text is decoded as 'utf-8' unicode
4995- text = unicode(text, "UTF-8")
4996- charpoint = ord(text[-1])
4997- if charpoint == 0x10ffff:
4998- # if the last character is the biggest possible char we need to
4999- # look at the second last
5000- return get_right_boundary(text[:-1])
The diff has been truncated for viewing.
