Invalid cache access (was: Error when trying to fetch items)

Bug #598666 reported by Seif Lotfy
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Fix Released
Medium
Markus Korn

Bug Description

When I try to fetch all items in one query, I get:

Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 254, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 378, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 366, in _find_events
    return self.get_events(rows=result, sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 188, in get_events
    events[event.id].append_subject(self._get_subject_from_row(row))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 160, in _get_subject_from_row
    getattr(self, "_" + field).value(row["subj_" + field]))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/sql.py", line 422, in value
    return self._inv_dict[id]
KeyError: 138

This looks serious.
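For context, the `_inv_dict` lookup at the bottom of the traceback is part of an in-memory id-to-value cache (the `TableLookup` class mentioned later in this thread). A toy sketch of such a two-way lookup table (this is an illustration, not Zeitgeist's actual implementation) shows how a database row referencing an id that was never cached produces exactly this KeyError:

```python
class TableLookup:
    """Toy two-way cache mapping strings to integer ids (illustrative only)."""
    def __init__(self):
        self._dict = {}      # value -> id
        self._inv_dict = {}  # id -> value

    def id(self, value):
        # Assign a fresh id the first time a value is seen
        if value not in self._dict:
            new_id = len(self._dict) + 1
            self._dict[value] = new_id
            self._inv_dict[new_id] = value
        return self._dict[value]

    def value(self, id):
        # Raises KeyError when a database row references an id
        # that the cache knows nothing about -- the failure above
        return self._inv_dict[id]

cache = TableLookup()
cache.id("text/plain")
print(cache.value(1))    # text/plain
try:
    cache.value(138)     # id present in a DB row but never cached
except KeyError as e:
    print("KeyError:", e)
```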

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Yeah, I get those too from time to time. They are caused by some wrong entry in the database; however, I have no idea how those entries get there :/.

Revision history for this message
Seif Lotfy (seif) wrote :

It's not the first time this has happened to me...

Changed in zeitgeist:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Markus Korn (thekorn) wrote :

Ok, first of all: why did you set the importance to Critical? Just because it happens a few times for you? Critical with no milestone set means it will block all future releases of Zeitgeist, so if we set a task to Critical we should make sure that someone is actually working on the bug (i.e. the task has an assignee).

I think our main issue here is that we are not locking our engine, which can especially have effects on long-running queries (which seems to be the case here: getting all events). We have only one natural lock, the SQLite lock, which allows only one connection at a time. But while our engine is running pure Python code it is not locked, so it is possible that our cache is old/outdated before it has even been completely created or updated.
This could also explain the `wrong db entries`.

But I should say once again that this is not an important issue: a client should never request all events (or an unbounded number of events), simply because no client is able to display 5k events; 100 should be the maximum size. Why do you need *all* events?

Revision history for this message
Seif Lotfy (seif) wrote :

The problem is faulty inserts. These could affect a requested time range if the faulty insert falls within that range.

Seif Lotfy (seif)
Changed in zeitgeist:
importance: Critical → Low
status: Confirmed → Incomplete
Revision history for this message
Markus Korn (thekorn) wrote :

For the record: *PLEASE* add your activity.sqlite database or a sample event if this bug happens again.

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.6
Revision history for this message
Seif Lotfy (seif) wrote : Re: Error when trying items

So when I tried to get events of the last 3 days, I got this error:
---------------------------------------
Zeitgeist keyerror

seif@Wumbo:~/Projects/elmntry-sezen$ python src/main.py
Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 394, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 384, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 184, in get_events
    events[event.id].append_subject(self._get_subject_from_row(row))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 156, in _get_subject_from_row
    getattr(self, "_" + field).value(row["subj_" + field]))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/sql.py", line 449, in value
    return self._inv_dict[id]
KeyError: 36
----------------------------------------------
I looked into the DB and found, in the event table, 5 events with subj_interpretation 36, although no interpretation with that ID existed.

summary: - Error when trying to fetch all items in one query
+ Error when trying items
Revision history for this message
Seif Lotfy (seif) wrote :

In an attempt to fix this bug I couldn't find anything wrong with TableLookup, so I added an extra check in the _insert_event method. The branch linked here is an attempt to fix this bug. Sadly, I have no idea how to reproduce it.

Changed in zeitgeist:
assignee: nobody → Seif Lotfy (seif)
Seif Lotfy (seif)
summary: - Error when trying items
+ Error when trying to fetch items
Revision history for this message
Seif Lotfy (seif) wrote : Re: Error when trying to fetch items

I added a little script, tools/zeitgeist-integrity-checker.py, that allows us to check the DB for errors.
The solution I would suggest for now is:
1. Allow the query to continue and return the healthy events, ignoring the broken ones instead of just failing.
2. Look into how the broken events get in.
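The first suggestion can be sketched as a guard around the per-row decoding; the function and helper names here are hypothetical, not Zeitgeist's actual API:

```python
import logging

def get_events(rows, decode_row):
    """Decode rows into events, skipping broken ones instead of failing.

    `decode_row` may raise KeyError for rows that reference missing
    cache ids, as in the tracebacks above.
    """
    events = []
    for row in rows:
        try:
            events.append(decode_row(row))
        except KeyError as err:
            # Log and skip the broken event rather than aborting the query
            logging.error("Broken event row %r: missing cache id %s", row, err)
    return events

lookup = {1: "ok-event"}
rows = [1, 36, 1]  # 36 references a missing cache entry
print(get_events(rows, lambda r: lookup[r]))  # ['ok-event', 'ok-event']
```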

Revision history for this message
Manish Sinha (मनीष सिन्हा) (manishsinha) wrote :

I tried running it on my database and got this
http://paste.ubuntu.com/518089/

Markus Korn (thekorn)
Changed in zeitgeist:
milestone: 0.6 → none
Seif Lotfy (seif)
Changed in zeitgeist:
assignee: Seif Lotfy (seif) → Siegfried Gevatter (rainct)
importance: Low → High
status: Incomplete → Triaged
summary: - Error when trying to fetch items
+ Invalid cache access (was: Error when trying to fetch items)
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.7.0
Revision history for this message
Markus Korn (thekorn) wrote :

Seif sent me an activity.log which he managed to create by "[...] randomly clicking in GAJ and synapse [...]". This log is broken and results in the above-mentioned KeyError.

The db has a few broken entries: http://paste.ubuntu.com/553188/

Revision history for this message
Seif Lotfy (seif) wrote :

So I just added a little temporary fix that I would like to see in the next release.
Instead of Zeitgeist returning nothing when a requested time period contains a broken event, Zeitgeist can now ignore the broken event and return a list of healthy events. The broken events can be seen in the log under Error.

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → 0.8.0
Revision history for this message
Siegfried Gevatter (rainct) wrote :

The last release added a workaround for this. Since we currently aren't able to reproduce this issue, I'm just going to close this bug now. Please re-open it if you can provide a test case to reproduce the problem.

Changed in zeitgeist:
assignee: Siegfried Gevatter (rainct) → nobody
status: Triaged → Fix Released
Revision history for this message
Siegfried Gevatter (rainct) wrote :

I'm re-opening this since I've finally found the cause of this problem!

When events are deleted, interpretations/manifestations/mimetypes/etc. which are no longer used are automatically removed from the database. However, the cache isn't informed about this and until Zeitgeist is restarted it'll continue using the IDs of the deleted rows for new events.

(I'm setting the importance down since a workaround for the problem is already deployed).
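The failure mode described here can be reproduced in miniature with sqlite3 (table and column names are illustrative): the in-memory cache keeps handing out the id of a deleted row, so lookups through that id dangle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mimetype (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE)")
conn.execute("INSERT INTO mimetype (value) VALUES ('text/plain')")

# In-memory cache built at startup, mirroring the table
cache = {value: id for id, value in conn.execute("SELECT id, value FROM mimetype")}

# Deleting no-longer-used rows from the table does NOT update the cache...
conn.execute("DELETE FROM mimetype WHERE value = 'text/plain'")

# ...so the cache still returns the dead id, and a new event inserted
# with it references a mimetype row that no longer exists.
dead_id = cache["text/plain"]
row = conn.execute("SELECT value FROM mimetype WHERE id = ?", (dead_id,)).fetchone()
print(dead_id, row)  # 1 None -> dangling reference
```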

Changed in zeitgeist:
importance: High → Medium
status: Fix Released → Triaged
Revision history for this message
Siegfried Gevatter (rainct) wrote :

I've committed a test case reproducing this problem to trunk. I believe the best way to fix it would be using http://www.sqlite.org/c3ref/update_hook.html, but it looks like that functionality is not accessible from Python.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: [Zeitgeist] [Bug 598666] Re: Invalid cache access (was: Error when trying to fetch items)

I don't have any clever solution off the top of my head - I just want
to give Siegfried some props for the cool detective work! Nice job!

Now. To thinking.

Revision history for this message
Markus Korn (thekorn) wrote :

Oh yeah, of course. Indeed, good work Siegfried.
While reading the last few comments, one possible fix came to mind:
Let's maintain a (temporary) helper table called '_fix_cache' with (table_name VARCHAR, id INTEGER), and create a 'BEFORE DELETE' trigger on each cached table (interpretation, manifestation, mimetype, ...) which inserts the table name and the deleted id into _fix_cache whenever a row in the related table gets deleted. All we then have to do is run a 'SELECT * FROM _fix_cache' after deleting the events. If some rows in manifestation/interpretation/... got deleted while deleting events, we get back the ids of the cached values, which we can then delete from the caches.
I know that's not optimal, but it looks like the only solution to me. It should not be a *big* performance issue, since the number of rows in the helper table should be very low. I'll hack on a prototype tomorrow and see how it works.
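A sketch of this proposal, with the trigger and the fix-up loop driven from Python's sqlite3 module (the cache structures are simplified dicts here; this is an illustration, not the actual branch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mimetype (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
    CREATE TABLE _fix_cache (table_name VARCHAR, id INTEGER);

    -- Record every deleted row so the in-memory caches can be fixed up
    CREATE TRIGGER fix_cache_mimetype BEFORE DELETE ON mimetype
    BEGIN
        INSERT INTO _fix_cache (table_name, id) VALUES ('mimetype', OLD.id);
    END;
""")

conn.execute("INSERT INTO mimetype (value) VALUES ('text/plain')")
cache = {"text/plain": 1}      # value -> id
inv_cache = {1: "text/plain"}  # id -> value

conn.execute("DELETE FROM mimetype WHERE value = 'text/plain'")

# After deleting events, purge the stale cache entries and prune _fix_cache
for table_name, dead_id in conn.execute("SELECT table_name, id FROM _fix_cache"):
    del cache[inv_cache.pop(dead_id)]
conn.execute("DELETE FROM _fix_cache")

print(cache, inv_cache)  # {} {}
```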

Revision history for this message
Markus Korn (thekorn) wrote :

I've added a prototype of my idea as described in my last comment; if we want to do it this way, we'll have to bump the SQL schema version, I guess.
I'm not completely sure this is the way to go, but given that the number of entries in the _fix_cache table should stay *very* low over time, it should not be a big performance issue, and it works well.
Comments welcome.

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Hey Markus,

Nice to hear from you again.

What you're proposing sounds better than adding a C binding file for update_hook (unless we decide we want to start adding more C stuff, but I can't think of what that'd be right now). I guess it's the best we can do.

If we're going to do this it should be merged with lp:~zeitgeist/zeitgeist/dbschema4/. I'd like to finish the cache benchmarking first, though.

By the way:
> cursor.execute("CREATE TABLE IF NOT EXISTS _fix_cache (t VARCHAR, id INTRGER)"
You've got a typo in there (INTRGER -> INTEGER).

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.8.0 → 0.7.1
Revision history for this message
Siegfried Gevatter (rainct) wrote :

Since Mikkel hasn't commented yet, let's leave this for 0.8.

Changed in zeitgeist:
milestone: 0.7.1 → 0.8.0
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

I looked at Markus' branch and I think it makes sense, provided that we're still confident that the cache gives us a boost :-) And we're bumping the schema version anyway...

A quick pre-review of Markus' branch:

 - there seems to be a missing commit() after the DELETE statement?

 - _fix_cache should be pruned when we're done with cleaning the memory cache?

 - The INTRGER thing Siegfried noted

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Also!

_fix_cache can be a temp table or something, right? SQLite has something akin to a memory-backed table that is lost when the process dies. We might as well use that?
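For reference, SQLite's TEMP tables behave exactly like that: they are private to the connection, vanish when it closes, and never touch the persistent schema, so no schema bump would be visible on disk. A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE _fix_cache (table_name VARCHAR, id INTEGER)")
conn.execute("INSERT INTO _fix_cache VALUES ('mimetype', 1)")
print(conn.execute("SELECT COUNT(*) FROM _fix_cache").fetchone()[0])  # 1

# A TEMP table is private to this connection and is never written into
# the main database schema (it lives in sqlite_temp_master instead).
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE name = '_fix_cache'")]
print(names)  # [] -- not in the persistent schema
```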

Revision history for this message
Markus Korn (thekorn) wrote :

Thanks for the reviews; I addressed them in my last commits to this branch. I still need to test whether the TEMP triggers work properly.

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: nobody → Markus Korn (thekorn)
status: Triaged → Fix Committed
Seif Lotfy (seif)
Changed in zeitgeist:
status: Fix Committed → Fix Released