Invalid cache access (was: Error when trying to fetch items)

Bug #598666 reported by Seif Lotfy
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Fix Released
Medium
Markus Korn

Bug Description

When I try to fetch all items in one query, I get:

Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 254, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 378, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 366, in _find_events
    return self.get_events(rows=result, sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 188, in get_events
    events[event.id].append_subject(self._get_subject_from_row(row))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 160, in _get_subject_from_row
    getattr(self, "_" + field).value(row["subj_" + field]))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/sql.py", line 422, in value
    return self._inv_dict[id]
KeyError: 138

This looks serious.
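For context, the `_inv_dict` lookup at the bottom of the traceback is part of an in-memory id-to-value cache (the `TableLookup` class mentioned later in this thread). A toy sketch of such a two-way lookup table (this is an illustration, not Zeitgeist's actual implementation) shows how a database row referencing an id that was never cached produces exactly this KeyError:

```python
class TableLookup:
    """Toy two-way cache mapping strings to integer ids (illustrative only)."""
    def __init__(self):
        self._dict = {}      # value -> id
        self._inv_dict = {}  # id -> value

    def id(self, value):
        # Assign a fresh id the first time a value is seen
        if value not in self._dict:
            new_id = len(self._dict) + 1
            self._dict[value] = new_id
            self._inv_dict[new_id] = value
        return self._dict[value]

    def value(self, id):
        # Raises KeyError when a database row references an id
        # that the cache knows nothing about -- the failure above
        return self._inv_dict[id]

cache = TableLookup()
cache.id("text/plain")
print(cache.value(1))    # text/plain
try:
    cache.value(138)     # id present in a DB row but never cached
except KeyError as e:
    print("KeyError:", e)
```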

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Yeah, I get those too from time to time. They are caused by some wrong entry in the database; however, I have no idea how those entries get there :/.

Revision history for this message
Seif Lotfy (seif) wrote :

It's not the first time this has happened to me...

Changed in zeitgeist:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Markus Korn (thekorn) wrote :

Ok, first of all: why did you set the importance to Critical? Just because it happens a few times for you? Critical with no milestone set means it will block all future releases of Zeitgeist, so if we set a task to Critical we should make sure that someone is actually working on the bug (i.e. the task has an assignee).

I think our main issue here is that we are not locking our engine, which can especially have effects on long-running queries (which seems to be the case here: getting all events). We have only one natural lock, the SQLite lock, which allows only one connection at a time. But while our engine is running pure Python code it is not locked, so it is possible that our cache is old/outdated before it has even been completely created or updated.
This could also explain the `wrong db entries`.

But I should say once again that this is not an important issue: a client should never request all events (or an unbounded number of events), simply because no client is able to display 5k events; 100 should be the maximum size. Why do you need *all* events?

Revision history for this message
Seif Lotfy (seif) wrote :

The problem is faulty inserts. These could affect a requested time range if the faulty insert falls within that range.

Seif Lotfy (seif)
Changed in zeitgeist:
importance: Critical → Low
status: Confirmed → Incomplete
Revision history for this message
Markus Korn (thekorn) wrote :

For the record: *PLEASE* add your activity.sqlite database or a sample event if this bug happens again.

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.6
Revision history for this message
Seif Lotfy (seif) wrote : Re: Error when trying items

So when I tried to get events of the last 3 days, I got this error:
---------------------------------------
Zeitgeist keyerror

seif@Wumbo:~/Projects/elmntry-sezen$ python src/main.py
Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 394, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 384, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 184, in get_events
    events[event.id].append_subject(self._get_subject_from_row(row))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 156, in _get_subject_from_row
    getattr(self, "_" + field).value(row["subj_" + field]))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/sql.py", line 449, in value
    return self._inv_dict[id]
KeyError: 36
----------------------------------------------
I looked into the DB and found, in the event table, 5 events with subj_interpretation 36, although no interpretation with that ID existed.

summary: - Error when trying to fetch all items in one query
+ Error when trying items
Revision history for this message
Seif Lotfy (seif) wrote :

In an attempt to fix this bug I couldn't find anything wrong with TableLookup, so I added an extra check in the _insert_event method. The branch linked here is an attempt to fix this bug. Sadly, I have no idea how to reproduce it.

Changed in zeitgeist:
assignee: nobody → Seif Lotfy (seif)
Seif Lotfy (seif)
summary: - Error when trying items
+ Error when trying to fetch items
Revision history for this message
Seif Lotfy (seif) wrote : Re: Error when trying to fetch items

I added a little script, tools/zeitgeist-integrity-checker.py, that allows us to check the DB for errors.
The solution I would suggest for now is:
1. Allow the query to continue and return the healthy events, ignoring the broken ones instead of just failing.
2. Look into how the broken events get in.
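The first suggestion can be sketched as a guard around the per-row decoding; the function and helper names here are hypothetical, not Zeitgeist's actual API:

```python
import logging

def get_events(rows, decode_row):
    """Decode rows into events, skipping broken ones instead of failing.

    `decode_row` may raise KeyError for rows that reference missing
    cache ids, as in the tracebacks above.
    """
    events = []
    for row in rows:
        try:
            events.append(decode_row(row))
        except KeyError as err:
            # Log and skip the broken event rather than aborting the query
            logging.error("Broken event row %r: missing cache id %s", row, err)
    return events

lookup = {1: "ok-event"}
rows = [1, 36, 1]  # 36 references a missing cache entry
print(get_events(rows, lambda r: lookup[r]))  # ['ok-event', 'ok-event']
```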

Revision history for this message
Manish Sinha (मनीष सिन्हा) (manishsinha) wrote :

I tried running it on my database and got this
http://paste.ubuntu.com/518089/

Markus Korn (thekorn)
Changed in zeitgeist:
milestone: 0.6 → none
Seif Lotfy (seif)
Changed in zeitgeist:
assignee: Seif Lotfy (seif) → Siegfried Gevatter (rainct)
importance: Low → High
status: Incomplete → Triaged
summary: - Error when trying to fetch items
+ Invalid cache access (was: Error when trying to fetch items)
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.7.0
Revision history for this message
Markus Korn (thekorn) wrote :

Seif sent me an activity.log which he managed to create by "[...] randomly clicking in GAJ and synapse [...]". This log is broken and results in the above-mentioned KeyError.

The db has a few broken entries: http://paste.ubuntu.com/553188/

Revision history for this message
Seif Lotfy (seif) wrote :

So I just added a little temporary fix that I would like to see in the next release.
Instead of Zeitgeist returning nothing when a requested time period contains a broken event, Zeitgeist can now ignore the broken event and return a list of healthy events. The broken events can be seen in the log under Error.

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → 0.8.0
Revision history for this message
Siegfried Gevatter (rainct) wrote :

The last release added a workaround for this. Since we currently aren't able to reproduce this issue, I'm just going to close this bug now. Please re-open it if you can provide a test case to reproduce the problem.

Changed in zeitgeist:
assignee: Siegfried Gevatter (rainct) → nobody
status: Triaged → Fix Released
Revision history for this message
Siegfried Gevatter (rainct) wrote :

I'm re-opening this since I've finally found the cause of this problem!

When events are deleted, interpretations/manifestations/mimetypes/etc. which are no longer used are automatically removed from the database. However, the cache isn't informed about this and until Zeitgeist is restarted it'll continue using the IDs of the deleted rows for new events.

(I'm setting the importance down since a workaround for the problem is already deployed).
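The failure mode described here can be reproduced in miniature with sqlite3 (table and column names are illustrative): the in-memory cache keeps handing out the id of a deleted row, so lookups through that id dangle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mimetype (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE)")
conn.execute("INSERT INTO mimetype (value) VALUES ('text/plain')")

# In-memory cache built at startup, mirroring the table
cache = {value: id for id, value in conn.execute("SELECT id, value FROM mimetype")}

# Deleting no-longer-used rows from the table does NOT update the cache...
conn.execute("DELETE FROM mimetype WHERE value = 'text/plain'")

# ...so the cache still returns the dead id, and a new event inserted
# with it references a mimetype row that no longer exists.
dead_id = cache["text/plain"]
row = conn.execute("SELECT value FROM mimetype WHERE id = ?", (dead_id,)).fetchone()
print(dead_id, row)  # 1 None -> dangling reference
```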

Changed in zeitgeist:
importance: High → Medium
status: Fix Released → Triaged
Revision history for this message
Siegfried Gevatter (rainct) wrote :

I've committed a test case reproducing this problem to trunk. I believe the best way to fix it would be using http://www.sqlite.org/c3ref/update_hook.html, but it looks like that functionality is not accessible from Python.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: [Zeitgeist] [Bug 598666] Re: Invalid cache access (was: Error when trying to fetch items)

I don't have any clever solution off the top of my head - I just want
to give Siegfried some props for the cool detective work! Nice job!

Now. To thinking.

Revision history for this message
Markus Korn (thekorn) wrote :

Oh yeah, of course. Indeed, good work Siegfried.
While reading the last few comments, one possible fix came to mind:
Let's maintain a (temporary) helper table called '_fix_cache' with (table_name VARCHAR, id INTEGER), and create a 'BEFORE DELETE' trigger on each cached table (interpretation, manifestation, mimetype, ...) which inserts the table name and the deleted id into _fix_cache whenever a row in the related table gets deleted. All we then have to do is run a 'SELECT * FROM _fix_cache' after deleting the events. If some rows in manifestation/interpretation/... got deleted while deleting events, we get back the ids of the cached values, which we can then delete from the caches.
I know that's not optimal, but it looks like the only solution to me. It should not be a *big* performance issue, since the number of rows in the helper table should be very low. I'll hack on a prototype tomorrow and see how it works.
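A sketch of this proposal, with the trigger and the fix-up loop driven from Python's sqlite3 module (the cache structures are simplified dicts here; this is an illustration, not the actual branch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mimetype (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
    CREATE TABLE _fix_cache (table_name VARCHAR, id INTEGER);

    -- Record every deleted row so the in-memory caches can be fixed up
    CREATE TRIGGER fix_cache_mimetype BEFORE DELETE ON mimetype
    BEGIN
        INSERT INTO _fix_cache (table_name, id) VALUES ('mimetype', OLD.id);
    END;
""")

conn.execute("INSERT INTO mimetype (value) VALUES ('text/plain')")
cache = {"text/plain": 1}      # value -> id
inv_cache = {1: "text/plain"}  # id -> value

conn.execute("DELETE FROM mimetype WHERE value = 'text/plain'")

# After deleting events, purge the stale cache entries and prune _fix_cache
for table_name, dead_id in conn.execute("SELECT table_name, id FROM _fix_cache"):
    del cache[inv_cache.pop(dead_id)]
conn.execute("DELETE FROM _fix_cache")

print(cache, inv_cache)  # {} {}
```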

Revision history for this message
Markus Korn (thekorn) wrote :

I've added a prototype of my idea as described in my last comment; if we want to do it this way, we'll have to bump the SQL schema version, I guess.
I'm not completely sure this is the way to go, but given that the number of entries in the _fix_cache table should stay *very* low over time, it should not be a big performance issue, and it works well.
Comments welcome.

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Hey Markus,

Nice to hear from you again.

What you're proposing sounds better than adding a C binding file for update_hook (unless we decide we want to start adding more C stuff, but I can't think of what that'd be right now). I guess it's the best we can do.

If we're going to do this it should be merged with lp:~zeitgeist/zeitgeist/dbschema4/. I'd like to finish the cache benchmarking first, though.

By the way:
> cursor.execute("CREATE TABLE IF NOT EXISTS _fix_cache (t VARCHAR, id INTRGER)"
You've got a typo in there (INTRGER -> INTEGER).

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.8.0 → 0.7.1
Revision history for this message
Siegfried Gevatter (rainct) wrote :

Since Mikkel hasn't commented yet, let's leave this for 0.8.

Changed in zeitgeist:
milestone: 0.7.1 → 0.8.0
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

I looked at Markus' branch and I think it makes sense, provided that we're still confident that the cache gives us a boost :-) And we're bumping the schema version anyway...

A quick pre-review of Markus' branch:

 - there seems to be a missing commit() after the DELETE statement?

 - _fix_cache should be pruned when we're done with cleaning the memory cache?

 - The INTRGER thing Siegfried noted

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Also!

_fix_cache can be a temp table or something, right? SQLite has something akin to a memory-backed table that is lost when the process dies. We might as well use that?
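For reference, SQLite's TEMP tables behave exactly like that: they are private to the connection, vanish when it closes, and never touch the persistent schema, so no schema bump would be visible on disk. A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE _fix_cache (table_name VARCHAR, id INTEGER)")
conn.execute("INSERT INTO _fix_cache VALUES ('mimetype', 1)")
print(conn.execute("SELECT COUNT(*) FROM _fix_cache").fetchone()[0])  # 1

# A TEMP table is private to this connection and is never written into
# the main database schema (it lives in sqlite_temp_master instead).
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE name = '_fix_cache'")]
print(names)  # [] -- not in the persistent schema
```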

Revision history for this message
Markus Korn (thekorn) wrote :

Thanks for the reviews; I addressed them in my last commits to this branch. I still need to test whether the TEMP triggers work properly.

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: nobody → Markus Korn (thekorn)
status: Triaged → Fix Committed
Seif Lotfy (seif)
Changed in zeitgeist:
status: Fix Committed → Fix Released