Merge lp:~kamstrup/zeitgeist/query-expansion into lp:zeitgeist/0.1

Proposed by Mikkel Kamstrup Erlandsen
Status: Merged
Merge reported by: Mikkel Kamstrup Erlandsen
Merged at revision: not available
Proposed branch: lp:~kamstrup/zeitgeist/query-expansion
Merge into: lp:zeitgeist/0.1
Diff against target: 323 lines (+218/-15)
4 files modified
_zeitgeist/engine/main.py (+47/-11)
test/datamodel-test.py (+52/-0)
test/test-sql.py (+51/-0)
zeitgeist/datamodel.py (+68/-4)
To merge this branch: bzr merge lp:~kamstrup/zeitgeist/query-expansion
Reviewer       Review Type    Date Requested    Status
Markus Korn    -              -                 Needs Fixing
Review via email: mp+25000@code.launchpad.net

Description of the change

Huzzah! Smackeroo! I have query expansion fully working now, with all unit tests passing, both on the SQL level and on our template matching level.

So what does "query expansion" mean? Consider a query for subjects with the interpretation nfo:Media. Without expansion, that would only match stuff that has been explicitly identified as nfo:Media (which is not much, since we can usually identify whether stuff is Audio, Image, or Video data).

With query expansion we'll also match any children of nfo:Media, i.e. nfo:Image, nfo:Audio, and nfo:Video, and recursively the children of these, like nfo:RasterImage and nfo:VectorImage.
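
To make the expansion concrete, here is a minimal sketch over a toy ontology. The `CHILDREN` table and the `expand` helper are hypothetical names used only for illustration; the branch itself does this in `Symbol.find_child_uris_extended`, which likewise returns the children followed by the URI itself and falls back to a one-element list for unknown URIs.

```python
# Toy ontology: parent URI -> direct children.
# This is an illustrative subset, not the real NFO ontology data.
CHILDREN = {
    "nfo:Media": ["nfo:Image", "nfo:Audio", "nfo:Video"],
    "nfo:Image": ["nfo:RasterImage", "nfo:VectorImage"],
}


def expand(uri, children=CHILDREN):
    """Return all descendants of `uri`, followed by `uri` itself.

    An unknown URI simply expands to a list containing only itself.
    """
    result = []
    for child in children.get(uri, []):
        # Recurse so that grandchildren (and deeper) are included too.
        result.extend(expand(child, children))
    result.append(uri)
    return result


if __name__ == "__main__":
    print(expand("nfo:Media"))
    print(expand("foo://bar"))
```

A query for nfo:Media would then be matched against every URI in the expanded list rather than against nfo:Media alone.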

The implementation is really simple: we expand the tree of children and compile one big OR query covering everything.
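
A rough sketch of how such an OR clause could be assembled is given here. The `build_or_clause` function and the `id_for_uri` mapping are assumptions standing in for the engine's `WhereClause(WhereClause.OR)` and its lookup tables. Note that this sketch silently skips URIs without a database id, which is a simplification and not necessarily what the branch does.

```python
def build_or_clause(column, uris, id_for_uri):
    """Build a parenthesised OR clause matching `column` against every URI.

    Returns a (sql, arguments) pair using "?" placeholders, or None when
    none of the URIs has a known id. `id_for_uri` is a hypothetical dict
    mapping URI strings to integer row ids.
    """
    # Keep only URIs that are present in the lookup table (sketch-only choice).
    ids = [id_for_uri[uri] for uri in uris if uri in id_for_uri]
    if not ids:
        return None
    # One "column = ?" condition per id, joined with OR.
    conditions = " OR ".join("%s = ?" % column for _ in ids)
    return "(" + conditions + ")", ids


if __name__ == "__main__":
    lookup = {"nfo:Image": 1, "nfo:RasterImage": 2, "nfo:VectorImage": 3}
    uris = ["nfo:RasterImage", "nfo:VectorImage", "nfo:Image"]
    print(build_or_clause("subj_interpretation", uris, lookup))
```

The resulting fragment can be nested inside a larger AND clause, which is the shape exercised by `testNested` in test/test-sql.py.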

Markus Korn (thekorn) wrote :

Hey Mikkel,
thank you for your work, it is working fine for me.
Feel free to merge this branch into lp:zeitgeist once you have thought about my three comments ;)

Markus

> === modified file '_zeitgeist/engine/main.py'
> --- _zeitgeist/engine/main.py   2010-05-03 16:32:00 +0000
> +++ _zeitgeist/engine/main.py   2010-05-12 19:32:33 +0000
> @@ -32,7 +32,7 @@
>  from collections import defaultdict
>
>  from zeitgeist.datamodel import Event as OrigEvent, StorageState, TimeRange, \
> -       ResultType, get_timestamp_for_now, Interpretation
> +       ResultType, get_timestamp_for_now, Interpretation, Symbol
>  from _zeitgeist.engine.datamodel import Event, Subject
>  from _zeitgeist.engine.extension import ExtensionsCollection, load_class
>  from _zeitgeist.engine import constants
> @@ -163,16 +163,51 @@
>                for (event_template, subject_template) in self._build_templates(templates):
>                        subwhere = WhereClause(WhereClause.AND)
>                        try:
> -                               for key in ("interpretation", "manifestation", "actor"):
> -                                       value = getattr(event_template, key)
> -                                       if value:
> -                                               subwhere.add("%s = ?" % key,
> -                                                       getattr(self, "_" + key).id(value))
> -                               for key in ("interpretation", "manifestation", "mimetype"):
> -                                       value = getattr(subject_template, key)
> -                                       if value:
> -                                               subwhere.add("subj_%s = ?" % key,
> -                                                       getattr(self, "_" + key).id(value))
> +                               # Expand event interpretation children
> +                               event_interp_where = WhereClause(WhereClause.OR)
> +                               for child_interp in (Symbol.find_child_uris_extended(event_template.interpretation)):
> +                                       if child_interp:
> +                                               event_interp_where.add("interpretation = ?",
> +                                                                      self._interpretation.id(child_interp))
> +                               if event_interp_where:
> +                                       subwhere.extend(event_interp_where)
> +
> +                               # Expand event manifestation children
> +                               event_manif_where = WhereClause(WhereClause.OR)
> +                               for child_manif in (Symbol.find_child_uris_extended(event_template.manifestation)):
> +                                       if child_manif:
> +                                               event_manif_where.add("manifestation = ?",
> +                                                                     self._manifestation.id(child_manif))
> +                               if event_manif_where:
> +                                       subwhere.extend(event_manif_where)
> +
> +                               # Expand subjec...

Markus Korn (thekorn) :
review: Needs Fixing

Preview Diff

=== modified file '_zeitgeist/engine/main.py'
--- _zeitgeist/engine/main.py	2010-05-01 22:18:55 +0000
+++ _zeitgeist/engine/main.py	2010-05-10 14:47:20 +0000
@@ -32,7 +32,7 @@
 from collections import defaultdict
 
 from zeitgeist.datamodel import Event as OrigEvent, StorageState, TimeRange, \
-	ResultType, get_timestamp_for_now, Interpretation
+	ResultType, get_timestamp_for_now, Interpretation, Symbol
 from _zeitgeist.engine.datamodel import Event, Subject
 from _zeitgeist.engine.extension import ExtensionsCollection, load_class
 from _zeitgeist.engine import constants
@@ -163,16 +163,51 @@
 		for (event_template, subject_template) in self._build_templates(templates):
 			subwhere = WhereClause(WhereClause.AND)
 			try:
-				for key in ("interpretation", "manifestation", "actor"):
-					value = getattr(event_template, key)
-					if value:
-						subwhere.add("%s = ?" % key,
-							getattr(self, "_" + key).id(value))
-				for key in ("interpretation", "manifestation", "mimetype"):
-					value = getattr(subject_template, key)
-					if value:
-						subwhere.add("subj_%s = ?" % key,
-							getattr(self, "_" + key).id(value))
+				# Expand event interpretation children
+				event_interp_where = WhereClause(WhereClause.OR)
+				for child_interp in (Symbol.find_child_uris_extended(event_template.interpretation)):
+					if child_interp:
+						event_interp_where.add("interpretation = ?",
+							self._interpretation.id(child_interp))
+				if event_interp_where:
+					subwhere.extend(event_interp_where)
+
+				# Expand event manifestation children
+				event_manif_where = WhereClause(WhereClause.OR)
+				for child_manif in (Symbol.find_child_uris_extended(event_template.manifestation)):
+					if child_manif:
+						event_manif_where.add("manifestation = ?",
+							self._manifestation.id(child_manif))
+				if event_manif_where:
+					subwhere.extend(event_manif_where)
+
+				# Expand subject interpretation children
+				su_interp_where = WhereClause(WhereClause.OR)
+				for child_interp in (Symbol.find_child_uris_extended(subject_template.interpretation)):
+					if child_interp:
+						su_interp_where.add("subj_interpretation = ?",
+							self._interpretation.id(child_interp))
+				if su_interp_where:
+					subwhere.extend(su_interp_where)
+
+				# Expand subject manifestation children
+				su_manif_where = WhereClause(WhereClause.OR)
+				for child_manif in (Symbol.find_child_uris_extended(subject_template.manifestation)):
+					if child_manif:
+						su_manif_where.add("subj_manifestation = ?",
+							self._manifestation.id(child_manif))
+				if su_manif_where:
+					subwhere.extend(su_manif_where)
+
+				# FIXME: Expand mime children as well.
+				# Right now we only do exact matching for mimetypes
+				if subject_template.mimetype:
+					subwhere.add("subj_mimetype = ?",
+						self._mimetype.id(subject_tempalte.mimetype))
+
+				if event_template.actor:
+					subwhere.add("actor = ?",
+						self._actor.id(event_template.actor))
 			except KeyError:
 				# Value not in DB
 				where_or.register_no_result()
@@ -183,6 +218,7 @@
 					subwhere.add("subj_%s = ?" % key, value)
 			where_or.extend(subwhere)
 
+		print "SQL: ", where_or.sql, where_or.arguments
 		return where_or
 
 	def _build_sql_event_filter(self, time_range, templates, storage_state):

=== modified file 'test/datamodel-test.py'
--- test/datamodel-test.py	2010-04-26 19:42:07 +0000
+++ test/datamodel-test.py	2010-05-10 14:47:20 +0000
@@ -51,6 +51,47 @@
 		self.assertTrue(f.display_name != None)
 		self.assertTrue(f.doc != None)
 
+class RelationshipTest (unittest.TestCase):
+	"""
+	Tests for parent/child relationships in the loaded ontologies
+	"""
+
+	def testDirectParents (self):
+		"""
+		Tests relationship tracking for immediate parents
+		"""
+		self.assertTrue(Interpretation.AUDIO.is_a(Interpretation.MEDIA))
+
+	def testSecondLevelParents (self):
+		"""
+		Tests relationship tracking for second level parents
+		"""
+		self.assertTrue(Interpretation.VECTOR_IMAGE.is_a(Interpretation.MEDIA))
+		self.assertTrue(Interpretation.VECTOR_IMAGE.is_a(Interpretation.IMAGE))
+
+	def testRootParents (self):
+		"""
+		Tests relationship tracking for root nodes, ie Interpretation
+		and Manifestation
+		"""
+		self.assertTrue(Interpretation.VECTOR_IMAGE.is_a(Interpretation))
+		self.assertTrue(Manifestation.FILE_DATA_OBJECT.is_a(Manifestation))
+		self.assertTrue(Manifestation.USER_ACTIVITY.is_a(Manifestation))
+
+	def testReflecsive (self):
+		"""
+		Assert that a symbol is a child of itself
+		"""
+		self.assertTrue(Manifestation.USER_ACTIVITY.is_a(Manifestation.USER_ACTIVITY))
+
+	def testFindExtendedChildren (self):
+		self.assertEquals(["foo://bar"], Symbol.find_child_uris_extended("foo://bar"))
+		self.assertEquals(["http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Icon",
+			"http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#VectorImage",
+			"http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Cursor",
+			"http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#RasterImage",
+			"http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Image"],
+			Symbol.find_child_uris_extended("http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Image"))
 
 class EventTest (unittest.TestCase):
 	def setUp(self):
@@ -116,6 +157,17 @@
 		e.manifestation="ILLEGAL SNAFU"
 		self.assertFalse(e.matches_template(template))
 
+	def testTemplateParentMatching(self):
+		template = Event.new_for_values(
+			manifestation=Manifestation.EVENT_MANIFESTATION,
+			subject_interpretation=Interpretation)
+
+		e = Event.new_for_values(
+			manifestation=Manifestation.USER_ACTIVITY,
+			subject_interpretation=Interpretation.TEXT_DOCUMENT,
+			subject_text="Foo")
+		self.assertTrue(e.matches_template(template))
+
 	def testTemplateFiltering(self):
 		template = Event.new_for_values(interpretation="stfu:OpenEvent")
 		events = parse_events("test/data/five_events.js")

=== added file 'test/test-sql.py'
--- test/test-sql.py	1970-01-01 00:00:00 +0000
+++ test/test-sql.py	2010-05-10 14:47:20 +0000
@@ -0,0 +1,51 @@
+#! /usr/bin/python
+# -.- coding: utf-8 -.-
+
+# Zeitgeist
+#
+# Copyright © 2010 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+import sys, os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+import unittest
+from _zeitgeist.engine.sql import *
+
+class SQLTest (unittest.TestCase):
+
+	def testFlat (self):
+		where = WhereClause(WhereClause.AND)
+		where.add ("foo = %s", 10)
+		where.add ("bar = %s", 27)
+		self.assertEquals(where.sql % tuple(where.arguments),
+			"(foo = 10 AND bar = 27)")
+
+	def testNested (self):
+		where = WhereClause(WhereClause.AND)
+		where.add ("foo = %s", 10)
+
+		subwhere = WhereClause(WhereClause.OR)
+		subwhere.add ("subfoo = %s", 68)
+		subwhere.add ("subbar = %s", 69)
+		where.extend(subwhere)
+		where.add ("bar = %s", 11)
+
+		self.assertEquals(where.sql % tuple(where.arguments),
+			"(foo = 10 AND (subfoo = 68 OR subbar = 69) AND bar = 11)")
+
+if __name__ == "__main__":
+	unittest.main()

=== modified file 'zeitgeist/datamodel.py'
--- zeitgeist/datamodel.py	2010-04-29 08:28:44 +0000
+++ zeitgeist/datamodel.py	2010-05-10 14:47:20 +0000
@@ -185,6 +185,22 @@
 		dikt[self.name] = self
 		for child in self._children.itervalues():
 			child._visit(dikt)
+
+	@staticmethod
+	def find_child_uris_extended (uri):
+		"""
+		Creates a list of all known child URIs of `uri`, including
+		`uri` itself in the list. Hence the "extended". If `uri`
+		is unknown a list containing only `uri` is returned.
+		"""
+		try:
+			symbol = _SYMBOLS_BY_URI[uri]
+			children = [child.uri for child in symbol.get_all_children()]
+			children.append(uri)
+			return children
+		except KeyError, e:
+			return [uri]
+
 
 	@property
 	def uri(self):
@@ -236,7 +252,51 @@
 		Returns a list of immediate parent symbols
 		"""
 		return frozenset(self._parents.itervalues())
 
+	def is_a (self, parent):
+		"""
+		Returns True if this symbol is a child of `parent`.
+		"""
+		if not isinstance (parent, Symbol):
+			try:
+				parent = _SYMBOLS_BY_URI[parent]
+			except KeyError, e:
+				# Parent is not a known URI
+				print 11111111111, self.uri, parent
+				return self.uri == parent
+
+		# Invariant: parent is a Symbol
+		if self.uri == parent.uri : return True
+
+		parent._ensure_all_children()
+
+		# FIXME: We should really check that child.uri is in there,
+		# but that is not fast with the current code layout
+		return self.name in parent._all_children
+
+	@staticmethod
+	def uri_is_a (child, parent):
+		"""
+		Returns True if `child` is a child of `parent`. Both `child`
+		and `parent` arguments must be any combination of
+		:class:`Symbol` and/or string.
+		"""
+		if isinstance (child, basestring):
+			try:
+				child = _SYMBOLS_BY_URI[child]
+			except KeyError, e:
+				# Child is not a know URI
+				if isinstance (parent, basestring):
+					return child == parent
+				elif isinstance (parent, Symbol):
+					return child == parent.uri
+				else:
+					return False
+
+		if not isinstance (child, Symbol):
+			raise ValueError("Child argument must be a Symbol or string. Got %s" % type(child))
+
+		return child.is_a(parent)
 
 class TimeRange(list):
 	"""
@@ -463,11 +523,13 @@
 		"""
 		Return True if this Subject matches *subject_template*. Empty
 		fields in the template are treated as wildcards.
+		Interpretations and manifestations are also matched if they are
+		children of the types specified in `subject_template`.
 
 		See also :meth:`Event.matches_template`
 		"""
 		for m in Subject.Fields:
-			if subject_template[m] and subject_template[m] != self[m] :
+			if subject_template[m] and not Symbol.uri_is_a (self[m], subject_template[m]):
 				return False
 		return True
 
@@ -693,7 +755,9 @@
 		"""
 		Return True if this event matches *event_template*. The
 		matching is done where unset fields in the template is
-		interpreted as wild cards. If the template has more than one
+		interpreted as wild cards. Interpretations and manifestations
+		are also matched if they are children of the types specified
+		in `event_template`. If the template has more than one
 		subject, this event matches if at least one of the subjects
 		on this event matches any single one of the subjects on the
 		template.
@@ -707,7 +771,7 @@
 		tdata = event_template[0]
 		for m in Event.Fields:
 			if m == Event.Timestamp : continue
-			if tdata[m] and tdata[m] != data[m] : return False
+			if tdata[m] and not Symbol.uri_is_a (data[m], tdata[m]) : return False
 
 		# If template has no subjects we have a match
 		if len(event_template[1]) == 0 : return True