Merge lp:~zeitgeist/zeitgeist/small-find-events-optimization into lp:zeitgeist/0.1

Proposed by Mikkel Kamstrup Erlandsen
Status: Merged
Approved by: Seif Lotfy
Approved revision: 1557
Merged at revision: 1562
Proposed branch: lp:~zeitgeist/zeitgeist/small-find-events-optimization
Merge into: lp:zeitgeist/0.1
Diff against target: 39 lines (+13/-8)
1 file modified
_zeitgeist/engine/main.py (+13/-8)
To merge this branch: bzr merge lp:~zeitgeist/zeitgeist/small-find-events-optimization
Reviewer Review Type Date Requested Status
Markus Korn Approve
Review via email: mp+34065@code.launchpad.net

Description of the change

I was seeing query response times of around 200ms when calling ZG from Unity. As this seems like a suspiciously long query time for my small database I went looking a bit into this.

First I noticed that the debug/profiling statements we had for this was not only imprecise but also directly hiding the truth. So I fixed that. The problem being that we never really logged the time it took to execute the query where we find, group, and sort the events, only the time it takes us to build the event objects after the query ...

Next thing I noticed was that we spend surprisingly long time in _find_events() when looking up full events compared to just looking up the ids. So I made that function always just find the ids and then do another SQL call to look up the full event structures. Surprisingly adding this extra SQL call shaves off roughly 50ms on the full query time.

(Note: This was done in my Canonical time - so if you accept this branch Zeitgeist will forever be "tainted" bwahaha! ;-P)

To post a comment you must log in.
Revision history for this message
Seif Lotfy (seif) wrote :
Download full text (3.6 KiB)

AWESOME...
Tainted how, you did not add your copyrights :P

On Mon, Aug 30, 2010 at 3:27 PM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Mikkel Kamstrup Erlandsen has proposed merging
> lp:~zeitgeist/zeitgeist/small-find-events-optimization into lp:zeitgeist.
>
> Requested reviews:
> Zeitgeist Framework Team (zeitgeist)
>
>
> I was seeing query response times of around 200ms when calling ZG from
> Unity. As this seems like a suspiciously long query time for my small
> database I went looking a bit into this.
>
> First I noticed that the debug/profiling statements we had for this was not
> only imprecise but also directly hiding the truth. So I fixed that. The
> problem being that we never really logged the time it took to execute the
> query where we find, group, and sort the events, only the time it takes us
> to build the event objects after the query ...
>
> Next thing I noticed was that we spend surprisingly long time in
> _find_events() when looking up full events compared to just looking up the
> ids. So I made that function always just find the ids and then do another
> SQL call to look up the full event structures. Surprisingly adding this
> extra SQL call shaves off roughly 50ms on the full query time.
>
> (Note: This was done in my Canonical time - so if you accept this branch
> Zeitgeist will forever be "tainted" bwahaha! ;-P)
>
> --
>
> https://code.launchpad.net/~zeitgeist/zeitgeist/small-find-events-optimization/+merge/34065<https://code.launchpad.net/%7Ezeitgeist/zeitgeist/small-find-events-optimization/+merge/34065>
> You are subscribed to branch lp:zeitgeist.
>
> === modified file '_zeitgeist/engine/main.py'
> --- _zeitgeist/engine/main.py 2010-08-02 10:13:12 +0000
> +++ _zeitgeist/engine/main.py 2010-08-30 13:27:45 +0000
> @@ -337,7 +337,7 @@
> if return_mode == 0:
> sql = "SELECT DISTINCT id FROM event_view"
> elif return_mode == 1:
> - sql = "SELECT * FROM event_view"
> + sql = "SELECT id FROM event_view"
> elif return_mode == 2:
> sql = "SELECT subj_uri, timestamp FROM event_view"
> else:
> @@ -374,14 +374,19 @@
>
> result = self._cursor.execute(sql,
> where.arguments).fetchall()
>
> - if return_mode == 1:
> - return self.get_events(rows=result, sender=sender)
> - if return_mode == 2:
> - return map(lambda row: (row[0], row[1]), result)
> - else: # return_mode == 0
> + if return_mode == 0:
> + log.debug("Found %d event IDs in %fs" %
> (len(result), time.time()- t))
> result = [row[0] for row in result]
> - log.debug("Fetched %d event IDs in %fs" %
> (len(result), time.time()- t))
> - return result
> + elif return_mode == 1:
> + log.debug("Found %d events in %fs" % (len(result),
> time.time()- t))
> + result = self.get_events(ids=[row[0] for row in
> result], sender=sender)
> + ...

Read more...

Revision history for this message
Markus Korn (thekorn) wrote :

[...]
> -               else: # return_mode == 0
> +               if return_mode == 0:
> +                       log.debug("Found %d event IDs in %fs" % (len(result), time.time()- t))
>                        result = [row[0] for row in result]
> -                       log.debug("Fetched %d event IDs in %fs" % (len(result), time.time()- t))
> -                       return result
> +               elif return_mode == 1:
> +                       log.debug("Found %d events in %fs" % (len(result), time.time()- t))

IMHO, this should be "Found %d event IDs(...]" too

> +                       result = self.get_events(ids=[row[0] for row in result], sender=sender)
> +               elif return_mode == 2:
> +                       log.debug("Found %d (uri,timestamp) tuples in %fs" % (len(result), time.time()- t))
> +                       result = map(lambda row: (row[0], row[1]), result)
> +               else:
> +                       raise Exception("%d" % return_mode)
> +
> +               return result
>
>        def find_eventids(self, *args):
>                return self._find_events(0, *args)
>
>
>

review: Needs Fixing
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Seif: Right - I didn't think the change was big enough for warranting copyright assignment, but you'll have @canonical.com in the commit logs ;-)

Markus: Right. Fixing now

1557. By Mikkel Kamstrup Erlandsen

Fix debug/profiling statement

Revision history for this message
Markus Korn (thekorn) wrote :

Approved after Mikkel changed the debug statement.
Thanks for your work, Mikkel.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '_zeitgeist/engine/main.py'
2--- _zeitgeist/engine/main.py 2010-08-02 10:13:12 +0000
3+++ _zeitgeist/engine/main.py 2010-08-30 18:44:40 +0000
4@@ -337,7 +337,7 @@
5 if return_mode == 0:
6 sql = "SELECT DISTINCT id FROM event_view"
7 elif return_mode == 1:
8- sql = "SELECT * FROM event_view"
9+ sql = "SELECT id FROM event_view"
10 elif return_mode == 2:
11 sql = "SELECT subj_uri, timestamp FROM event_view"
12 else:
13@@ -374,14 +374,19 @@
14
15 result = self._cursor.execute(sql, where.arguments).fetchall()
16
17- if return_mode == 1:
18- return self.get_events(rows=result, sender=sender)
19- if return_mode == 2:
20- return map(lambda row: (row[0], row[1]), result)
21- else: # return_mode == 0
22+ if return_mode == 0:
23+ log.debug("Found %d event IDs in %fs" % (len(result), time.time()- t))
24 result = [row[0] for row in result]
25- log.debug("Fetched %d event IDs in %fs" % (len(result), time.time()- t))
26- return result
27+ elif return_mode == 1:
28+ log.debug("Found %d events IDs in %fs" % (len(result), time.time()- t))
29+ result = self.get_events(ids=[row[0] for row in result], sender=sender)
30+ elif return_mode == 2:
31+ log.debug("Found %d (uri,timestamp) tuples in %fs" % (len(result), time.time()- t))
32+ result = map(lambda row: (row[0], row[1]), result)
33+ else:
34+ raise Exception("%d" % return_mode)
35+
36+ return result
37
38 def find_eventids(self, *args):
39 return self._find_events(0, *args)