Merge lp:~zeitgeist/zeitgeist/bluebird-no-distinct into lp:~zeitgeist/zeitgeist/bluebird

Proposed by Siegfried Gevatter
Status: Merged
Approved by: Michal Hruby
Approved revision: 356
Merged at revision: 358
Proposed branch: lp:~zeitgeist/zeitgeist/bluebird-no-distinct
Merge into: lp:~zeitgeist/zeitgeist/bluebird
Diff against target: 41 lines (+8/-7)
1 file modified
src/engine.vala (+8/-7)
To merge this branch: bzr merge lp:~zeitgeist/zeitgeist/bluebird-no-distinct
Reviewer Review Type Date Requested Status
Michal Hruby (community) Approve
Review via email: mp+87175@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Siegfried Gevatter (rainct) wrote :

From #sqlite:
-----------------------------------------------
<RainCT> Hi. I've got a question: do .step() calls (internally) just look for the next row, or does the first .step() call already process everything?
<RainCT> Particularly, I'm wondering if removing a LIMIT from a query and instead aborting the while(... stmt.step() ...) once I've got enough will have any bad side effect
<#sqlite_guy> RainCT: technically, each step is actually executing code on the internal VM
<#sqlite_guy> I would say that if it makes sense to put in a LIMIT, then do so
<RainCT> #sqlite_guy: the problem is that the limit depends on another condition (a DISTINCT), but having SQLite check that makes the query 100x slower
<#sqlite_guy> how messy is the distinct?
<#sqlite_guy> could you throw an appropriate index on?
<RainCT> and I can really easily check that condition in code and count the number of things I got myself
<#sqlite_guy> honestly, you're up in the realm of personal taste/whatever works
<RainCT> #sqlite_guy: it's basically an integer ID which may be the same for a few consecutive rows
<RainCT> #sqlite_guy: Okay, thanks :). On benchmarks it seems to work fine, but I just wanted to make sure I'm not missing something.
<#sqlite_guy> nah
<#sqlite_guy> you're not missing anything
<#sqlite_guy> just make sure to hit sqlite3_reset when you're done with it
<#sqlite_guy> or finalize, obviously
-----------------------------------------------

Revision history for this message
Michal Hruby (mhr3) wrote :

33 + event_ids += event_id;

We could be more clever here, this will cause 3-6 reallocations in common cases, perhaps the array should be preallocated to `int.min(max_events, 128)`? (for non-zero max_events).

355. By Siegfried Gevatter

benchmark-tools: fix symlink and usage example.

356. By Siegfried Gevatter

Using DISTINCT in FindEventIds has a huge performance penalty, so we
do the filtering and counting ourselves instead.

Revision history for this message
Michal Hruby (mhr3) wrote :

As the query plans show, without the DISTINCT sql doesn't need to build one temp b-tree, so it's slightly faster.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/engine.vala'
2--- src/engine.vala 2011-12-29 13:59:16 +0000
3+++ src/engine.vala 2011-12-31 14:59:23 +0000
4@@ -207,7 +207,7 @@
5 //if (!where.may_have_results ())
6 // return new uint32[0];
7
8- string sql = "SELECT DISTINCT id FROM event_view ";
9+ string sql = "SELECT id FROM event_view ";
10 string where_sql = "";
11 if (!where.is_empty ())
12 {
13@@ -321,9 +321,6 @@
14 throw new EngineError.INVALID_ARGUMENT (error_message);
15 }
16
17- if (max_events > 0)
18- sql += " LIMIT %u".printf (max_events);
19-
20 int rc;
21 Sqlite.Statement stmt;
22
23@@ -338,11 +335,15 @@
24
25 while ((rc = stmt.step()) == Sqlite.ROW)
26 {
27- var id = (uint32) uint64.parse(
28+ var event_id = (uint32) uint64.parse(
29 stmt.column_text (EventViewRows.ID));
30- event_ids += id;
31+ // Events are supposed to be contiguous in the database
32+ if (event_ids.length == 0 || event_ids[event_ids.length-1] != event_id) {
33+ event_ids += event_id;
34+ if (event_ids.length == max_events) break;
35+ }
36 }
37- if (rc != Sqlite.DONE)
38+ if (rc != Sqlite.DONE && rc != Sqlite.ROW)
39 {
40 string error_message = "Error in find_event_ids: %d, %s".printf (
41 rc, db.errmsg ());

Subscribers

People subscribed via source and target branches