Comment 23 for bug 131983

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Responding to Jamie McCracken:

Point 4: yes, it's correct that SQLite writes a journal file and two calls to fsync() to maintain ACID properties. In fact, it's necessary just to make sure the database doesn't get corrupted. It serves the same purpose as ext3's journal does for the filesystem itself.

Each fsync() to the disk is a huge latency overhead, and a huge effect on other processes doing any I/O (reads or writes) when there's a constant trickly of database writes to the disk.

To avoid those, the application (i.e. Tracker), not the database, must be changed to aggregate many writes into a single transaction. That's what I mean by not doing as many commits; I mean batching writes into larger transactions. The fsync()s are only done per transaction, not per write.

For some applications you can't batch writes like that, but for Tracker you obviously could.

Point 6: The rollback journal must be in a place where it survives crashes, otherwise it is completely ineffective and the database file can become corrupt. It should be on the same filesystem as the database file. When it's in /tmp, it doesn't protect against crashes and power failures.

To be precise: If trackerd is killed and restarted, the rollback journal in /tmp will be effective because it's still there. After a reboot or power failure, if /tmp is in RAM (as it is nowadays), then the rollback journal will be gone, so the database can be corrupted by rebooting or power failures. Even "clean" reboots, if they kill a running trackerd and it's not able to commit it's last transaction in time, or if trackerd has been manually killed earlier, and then a clean reboot. If /tmp is on disk, a reboot might still clean it, so the journal might as well be in RAM.

In other words, the journal needs to be in the same directory as the database in practice, otherwise it doesn't effectively prevent the database from low level corruption. That's similar to filesystem corruption: garbage structures, not just wrong data in the database.

Of course the journal isn't perfect for the reasons given on that SQLite page: Linux doesn't always do the right thing with fsync(), and neither do some IDE drives. But on most systems with 2.6 kernels it should work.