zeitgeist fails to run if its database structure is not complete

Bug #660307 reported by Rocko
This bug affects 6 people
Affects               Status        Importance  Assigned to    Milestone
Zeitgeist Framework   Fix Released  Low         J.P. Lacerda
zeitgeist (Ubuntu)    Fix Released  Low         Seif Lotfy

Bug Description

Binary package hint: zeitgeist

If zeitgeist's database (~/.local/share/zeitgeist/activity.sqlite) is incomplete, e.g. missing the events table, zeitgeist fails to run. And because the GUI does not report that zeitgeist failed to run, applications that rely on zeitgeist simply stop working without giving any relevant reason.

I ran into this problem after upgrading an installation from Ubuntu 10.04 to 10.10. After the upgrade, the dockbarx applet failed to run. The error message from gnome-panel just said it had failed to run, and .xsession-errors said the child process did not report any specific error. Running in debug mode (i.e. with the command "dockbarx-factory.py run-in-window") gave:

ERROR:dbus.proxies:Introspect error on :1.134:/org/gnome/zeitgeist/log/activity: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message bus)
DEBUG:dbus.proxies:Executing introspect queue due to error
Traceback (most recent call last):
  File "/usr/bin/dockbarx_factory.py", line 26, in <module>
    import dockbarx.dockbar
...
  File "/usr/lib/pymodules/python2.6/dbus/proxies.py", line 140, in __call__
    **keywords)
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 620, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.134 was not provided by any .service files

The error appeared to be a dbus error, but in fact was a problem with zeitgeist, which was failing to run because its database apparently was corrupted during the upgrade. I fixed the problem (eventually) by deleting the zeitgeist database file and restarting the zeitgeist-daemon manually.

What I would expect to happen is:

1) The GUI should report that zeitgeist has failed to run.

2) Better yet, zeitgeist could create the necessary tables if its database is invalid, or perhaps back up the old database and create a new one so it can run properly.

It would of course be nice if dockbarx reported better error information, but since there are other applications that depend on zeitgeist, it would be good if zeitgeist could recover from this situation.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: zeitgeist 0.5.2-0ubuntu1
ProcVersionSignature: Ubuntu 2.6.35-22.34-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic i686
Architecture: i386
Date: Thu Oct 14 11:52:41 2010
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Alpha i386 (20100602.2)
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_AU.UTF-8
 SHELL=/bin/bash
SourcePackage: zeitgeist

Rocko (rockorequin) wrote :
Seif Lotfy (seif) wrote :

The problem is more difficult than that. If a table is not there it gets created with the next starting of zeitgeist. So there should be another explanation why the update failed... We need more info about the issue... I will mark it as incomplete until we can actually reproduce this bug.

Changed in zeitgeist:
status: New → Incomplete
Rocko (rockorequin) wrote :

No, I do not believe that is correct. The table is *not* created with the next starting of zeitgeist - see note 2 in the original description:

2) Better yet, zeitgeist could create the necessary tables if its database is invalid, or perhaps back up the old database and create a new one so it can run properly.

Seif Lotfy (seif) wrote : Re: [Bug 660307] Re: zeitgeist fails to run if its database structure is not complete

I found an issue...
The tables are created on each startup if they don't exist.
However, if a table exists but is damaged or incomplete, it won't be overwritten.
We can fix that by getting the columns of each table and replacing the table if it is incomplete.
Does that sound sane?
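A minimal Python sketch of the check Seif proposes, assuming the startup code creates tables with `CREATE TABLE IF NOT EXISTS` (consistent with his description, but not confirmed in this thread). It shows that such a statement is a no-op on an existing but incomplete table, and uses SQLite's `PRAGMA table_info` to list the missing columns. The `event` column names are hypothetical.

```python
import sqlite3


def missing_columns(conn, table, expected):
    """Return the expected columns that `table` lacks (all of them if it is absent)."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [col for col in expected if col not in present]


# Hypothetical column list for an "event" table.
EVENT_COLUMNS = ["id", "timestamp", "actor"]

conn = sqlite3.connect(":memory:")
# Simulate a table left incomplete by an interrupted run.
conn.execute("CREATE TABLE event (id INTEGER)")
# Re-running the normal creation statement leaves the existing table untouched.
conn.execute("CREATE TABLE IF NOT EXISTS event (id INTEGER, timestamp INTEGER, actor INTEGER)")
print(missing_columns(conn, "event", EVENT_COLUMNS))  # ['timestamp', 'actor']
```

The table name is interpolated into the PRAGMA because SQLite cannot bind identifiers as parameters, so only trusted, hard-coded names should be passed in.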

Rocko (rockorequin) wrote :

I guess so... but in the particular case I found, it wouldn't create the tables at all. The activity.sqlite file that wouldn't work at all had only two tables, uri and schema_version:

sqlite> .schema uri
CREATE TABLE uri
   (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE UNIQUE INDEX uri_value ON uri(value);

sqlite> .schema schema_version
CREATE TABLE schema_version
   (schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT);

uri was empty and schema_version contained just "core|1".

However, this looks like the correct format for uri and schema_version according to my current (working) activity.sqlite file.
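To check whether the SQL above is enough to recreate the broken file, a sketch like this rebuilds Rocko's two tables and the `core|1` row in an in-memory SQLite database and lists which tables are missing. `EXPECTED_TABLES` is a hypothetical subset of a complete database, not Zeitgeist's actual table list. Note that `schema_version` already claims core version 1 even though there is no `event` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema exactly as reported from the broken activity.sqlite.
conn.executescript("""
    CREATE TABLE uri
        (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
    CREATE UNIQUE INDEX uri_value ON uri(value);
    CREATE TABLE schema_version
        (schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT);
    INSERT INTO schema_version VALUES ('core', 1);
""")

# Hypothetical subset of the tables a complete database should contain.
EXPECTED_TABLES = {"uri", "schema_version", "event"}

tables = {name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
version = conn.execute(
    "SELECT version FROM schema_version WHERE schema = 'core'").fetchone()[0]
print(version, sorted(EXPECTED_TABLES - tables))  # 1 ['event']
```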

Seif Lotfy (seif) wrote :

Do you mind attaching the broken activity.sqlite file for us to test with?

Rocko (rockorequin) wrote :

Sorry, the file doesn't exist any more, it was on another user's PC. Does creating a file with the SQL above help in recreating it?

Seif Lotfy (seif) wrote :

I managed to replicate this by manually creating an invalid structure. Thank you for the bug report. We are on it.

Changed in zeitgeist:
status: Incomplete → Confirmed
assignee: nobody → Seif Lotfy (seif)
milestone: none → 0.7.0
status: Confirmed → Triaged
importance: Undecided → Low
Changed in zeitgeist (Ubuntu):
assignee: nobody → Seif Lotfy (seif)
status: New → Confirmed
Seif Lotfy (seif) wrote :

To replicate this:
1) Back up the activity.sqlite DB
2) Restart the daemon
3) While it's printing "DEBUG:zeitgeist.sql:Updating sql schema", kill the daemon

Seif Lotfy (seif) wrote :

http://paste.ubuntu.com/530852/ is a start of the fix

Seif Lotfy (seif) wrote :

So my take on this would be to check every table upon startup for validity.
This requires, however, that we know which schema the DB is currently on and what its structure should look like.
This means we need to store the table structures and compare them on startup.
We will be confronted with two scenarios:
1) We are on schema 0 or 1 and a new zeitgeist with schema version 2 is installed: we will have to check for 0 or 1 first before trying to upgrade to schema 2.
2) Our current DB is on schema 2 but is corrupt: we will need to rebuild it.
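Seif's two scenarios can be sketched as a startup classifier. This is a hypothetical illustration: `SCHEMA_TABLES` maps each core schema version to an invented set of table names, and the check only compares table names from `sqlite_master`, not full column definitions.

```python
import sqlite3

# Hypothetical: tables each core schema version is expected to contain.
SCHEMA_TABLES = {
    0: {"uri"},
    1: {"uri", "schema_version", "event"},
    2: {"uri", "schema_version", "event", "extensions_conf"},
}
CURRENT_VERSION = 2


def core_version(conn, tables):
    """Read the recorded core version; treat a DB without schema_version as version 0."""
    if "schema_version" not in tables:
        return 0
    row = conn.execute("SELECT version FROM schema_version WHERE schema = 'core'").fetchone()
    return row[0] if row else 0


def plan_startup(conn):
    """Return 'ok', 'upgrade' (scenario 1) or 'rebuild' (scenario 2)."""
    tables = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    version = core_version(conn, tables)
    if version not in SCHEMA_TABLES or not SCHEMA_TABLES[version] <= tables:
        return "rebuild"  # the structure does not match the version it claims
    return "ok" if version == CURRENT_VERSION else "upgrade"


# Rocko's database: claims core version 1 but has no event table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uri (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
    CREATE TABLE schema_version (schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT);
    INSERT INTO schema_version VALUES ('core', 1);
""")
print(plan_startup(conn))  # rebuild
```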

Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: [Zeitgeist] [Bug 660307] Re: zeitgeist fails to run if its database structure is not complete

However noble it may be, I don't think we stand a realistic chance of
implementing a stable "repair" function if the DB corrupts at an
undefined point in the upgrade process. There are just *way* too many
variables if we have, e.g., 4 different DB schemas that can all intermix
and corrupt in different ways.

I'd rather we simply made the N -> N+1 transition more reliable. A few ideas:

 a) Implement a validation step after an upgrade has completed, and
take appropriate measures if it fails (e.g. by using c) below)
 b) Back up the DB before starting the upgrade. I recommend that we
back up to a gzipped ttl file because that can be used in a streaming
(aka appending) way
 c) Implement "recover from backup" (see point b))

These are all relatively simple measures that can be properly unit
tested and are much more limited in the ways they can fail.
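Point a) could look roughly like the following: a post-upgrade validation that runs SQLite's `PRAGMA integrity_check`, confirms that a hypothetical set of required tables exists, and checks the recorded core version. The function and table names are illustrative assumptions. `integrity_check` can be slow on a large database, which is one reason to run it only after an upgrade rather than on every startup.

```python
import sqlite3

# Hypothetical subset of tables a valid database must contain.
REQUIRED_TABLES = {"uri", "schema_version", "event"}


def validate_after_upgrade(conn, expected_version):
    """Return True if the upgraded DB passes integrity, table and version checks."""
    # integrity_check yields a single row 'ok' when no problems are found.
    if conn.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
        return False
    tables = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if not REQUIRED_TABLES <= tables:
        return False
    row = conn.execute("SELECT version FROM schema_version WHERE schema = 'core'").fetchone()
    return row is not None and row[0] == expected_version


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uri (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
    CREATE TABLE schema_version (schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT);
    INSERT INTO schema_version VALUES ('core', 2);
""")
print(validate_after_upgrade(conn, 2))  # False: the event table is missing
```

If this returns False, point c) (recover from backup) would take over.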

Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Oh, and one point I forgot to add in my previous comment: I don't
want to validate the DB on startup. That's just way too expensive,
and was in fact one of the primary points when I implemented DB
versioning.

One thing we could do to easily, and almost freely, detect when we are
killed during an upgrade would be to set the core schema version to -1
just before we start the upgrade, and then set it to the correct
version if the upgrade completes correctly. Then, if we start up and
see DB schema version -1, we know that we have to recover from backup.
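Mikkel's -1 marker, sketched in Python with the standard `sqlite3` module. The helper names are hypothetical, and a custom exception stands in for the daemon being killed mid-upgrade (a real kill raises nothing; the point is only that the committed -1 is still there on the next start).

```python
import sqlite3

UPGRADING = -1  # sentinel version written before an upgrade starts


class SimulatedKill(Exception):
    """Stand-in for the daemon being killed; a real kill raises nothing."""


def set_core_version(conn, version):
    with conn:  # commit immediately, before any upgrade step runs
        conn.execute("INSERT OR REPLACE INTO schema_version VALUES ('core', ?)", (version,))


def core_version(conn):
    row = conn.execute("SELECT version FROM schema_version WHERE schema = 'core'").fetchone()
    return row[0] if row else None


def run_upgrade(conn, target, steps):
    set_core_version(conn, UPGRADING)
    for step in steps:
        step(conn)
    set_core_version(conn, target)  # only reached if every step completed


def killed_during_upgrade(conn):
    return core_version(conn) == UPGRADING


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_version "
             "(schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT)")
set_core_version(conn, 1)


def interrupted_step(c):
    raise SimulatedKill()


try:
    run_upgrade(conn, 2, [interrupted_step])
except SimulatedKill:
    pass
print(killed_during_upgrade(conn))  # True: the next startup should recover from backup
```

The check costs a single indexed lookup, which is why it fits the "no expensive validation on startup" constraint.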

Seif Lotfy (seif) wrote :

OK, I think we can start by always backing up while doing an upgrade from n -> n+1.
I don't know what a gzipped ttl file is, so why not just copy activity.sqlite to another file, back_activity.sqlite?
After the upgrade is done, we need to check whether the DB is properly built. If not, we try to recover from the backup DB and upgrade again.
I also like the idea of setting the schema version to -1 to detect a kill.
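For the backup Seif describes, one option (swapped in here; the thread does not specify it) is SQLite's online backup API, exposed in Python 3.7+ as `sqlite3.Connection.backup`, rather than a raw file copy, which could capture an inconsistent file if the daemon is writing at that moment. The helper name and temporary paths are illustrative.

```python
import os
import sqlite3
import tempfile


def copy_database(src_path, dst_path):
    """Copy one SQLite file over another with the online backup API (Python 3.7+)."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # replaces the destination's contents entirely
    finally:
        dst.close()
        src.close()


workdir = tempfile.mkdtemp()
db = os.path.join(workdir, "activity.sqlite")
bak = os.path.join(workdir, "back_activity.sqlite")

conn = sqlite3.connect(db)
with conn:
    conn.execute("CREATE TABLE event (id INTEGER)")
    conn.execute("INSERT INTO event VALUES (42)")
conn.close()

copy_database(db, bak)  # 1) back up before the n -> n+1 upgrade

conn = sqlite3.connect(db)
with conn:
    conn.execute("DROP TABLE event")  # 2) stand-in for an upgrade that broke the DB
conn.close()

copy_database(bak, db)  # 3) validation failed: recover from the backup
conn = sqlite3.connect(db)
print(conn.execute("SELECT id FROM event").fetchall())  # [(42,)]
```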

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → 0.8.0
Changed in zeitgeist:
milestone: 0.8.0 → none
J.P. Lacerda (jplacerda) wrote :

This has been (mostly) fixed.
I think that there are some possible improvements regarding schema upgrading / database creation.

As Seif said in #9, to replicate this:
1) Backup activity.sqlite
2) Restart the daemon
3) While "Updating sql schema" kill the daemon

The problem, however, is that this bug does not manifest in the way described above. What we are doing in the above replication is creating a new database (this is checked by new_database = not os.path.exists(file_path)) and interrupting the process halfway through. If we then apply _do_schema_upgrade to the corrupted database, it will fail, because some of the column names it adds already exist. In short, a corrupted upgrade != a corrupted database creation, and we *cannot* mix the two together. A quick fix is to apply _do_schema_upgrade in both cases and take a duplicate column name error as a symptom of a corrupted database creation: this allows us to delete the database, so that it is re-created the next time zeitgeist-daemon runs.
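J.P.'s quick fix can be sketched as follows. `do_schema_upgrade` is a hypothetical stand-in for Zeitgeist's `_do_schema_upgrade`, and the `origin` column is invented. SQLite reports re-adding an existing column as an `OperationalError` whose message contains "duplicate column name", which the sketch uses to tell an interrupted creation apart from other failures.

```python
import sqlite3


def do_schema_upgrade(conn):
    # Hypothetical upgrade step that adds a column introduced in the new schema.
    conn.execute("ALTER TABLE event ADD COLUMN origin INTEGER")


def is_interrupted_creation(error):
    """Heuristic: the upgrade only re-adds a column if a new-style DB was half built."""
    return (isinstance(error, sqlite3.OperationalError)
            and "duplicate column name" in str(error))


def upgrade_or_flag(conn):
    try:
        do_schema_upgrade(conn)
        return "upgraded"
    except sqlite3.OperationalError as error:
        if is_interrupted_creation(error):
            return "delete database; recreate on next start"
        raise  # any other failure is not covered by this heuristic


# A creation run that was killed after "event" already had its new-style columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER, origin INTEGER)")
print(upgrade_or_flag(conn))  # delete database; recreate on next start
```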

I will be including some test cases soon :)

J.P. Lacerda (jplacerda) wrote :

Things have changed slightly:

I implemented Mikkel's suggestion of setting the version to -1 just before an upgrade, and back to its correct value afterwards. This is only useful if an upgrade is killed: the upgrade can also fail due to a raised OperationalError. If that is the case (regardless of whether or not the corruption comes from a bad database creation), we allow the code to fall through to the statements in create_db, which safely restores the database.
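Putting the pieces of this comment together, a hedged sketch: snapshot the database, set the core version to -1, run the upgrade steps, and restore the snapshot if any step raises `OperationalError`. An in-memory snapshot stands in for the on-disk backup, and the function names are hypothetical; this is not Zeitgeist's actual `create_db` code.

```python
import sqlite3


def set_core_version(conn, version):
    with conn:
        conn.execute("INSERT OR REPLACE INTO schema_version VALUES ('core', ?)", (version,))


def core_version(conn):
    return conn.execute("SELECT version FROM schema_version WHERE schema = 'core'").fetchone()[0]


def safe_upgrade(conn, target, steps):
    """Back up, mark version -1, run steps; restore the backup on OperationalError."""
    snapshot = sqlite3.connect(":memory:")  # stand-in for the on-disk backup
    try:
        conn.backup(snapshot)
        set_core_version(conn, -1)
        try:
            for step in steps:
                step(conn)
        except sqlite3.OperationalError:
            conn.rollback()        # discard any half-applied transaction
            snapshot.backup(conn)  # restore the pre-upgrade database
            return False
        set_core_version(conn, target)
        return True
    finally:
        snapshot.close()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schema_version (schema VARCHAR PRIMARY KEY ON CONFLICT REPLACE, version INT);
    INSERT INTO schema_version VALUES ('core', 1);
    CREATE TABLE event (id INTEGER);
""")


def failing_step(c):
    c.execute("ALTER TABLE event ADD COLUMN id INTEGER")  # duplicate column: OperationalError


print(safe_upgrade(conn, 2, [failing_step]), core_version(conn))  # False 1
```

Restoring the snapshot also undoes the -1 marker, so a failed upgrade leaves the database at its previous version rather than in the "killed" state.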

Changed in zeitgeist:
status: Triaged → Fix Committed
Changed in zeitgeist (Ubuntu):
importance: Undecided → Low
Changed in zeitgeist:
assignee: Seif Lotfy (seif) → J.P. Lacerda (jplacerda)
Changed in zeitgeist:
milestone: none → 0.8.1
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zeitgeist - 0.8.1-1ubuntu1

---------------
zeitgeist (0.8.1-1ubuntu1) oneiric; urgency=low

  * Merge with debian, remaining change:
    - Build-dep on raptor2-utils instead of raptor-utils

zeitgeist (0.8.1-1) unstable; urgency=low

  * New upstream release. Some of the changes are:
     - Database backup before attempting schema upgrades (LP: #660307).
     - Ensure the engine doesn't attempt to close twice in a row (LP: #793714).
     - Improve the Python API's behavior when Zeitgeist is restarted.
     - Added support for registering custom Event and Subject subclasses with
       ZeitgeistClient (LP: #799199), and added some new API methods.
  * debian/control:
     - Fix typo in the description (hold -> held). Thanks to Travis Reddell.
 -- Didier Roche <email address hidden> Thu, 07 Jul 2011 18:49:45 +0200

Changed in zeitgeist (Ubuntu):
status: Confirmed → Fix Released
Changed in zeitgeist:
status: Fix Committed → Fix Released