Merge into 0.8-python : add_cache_for_get_events : Code : Zeitgeist Framework

Reviewer	Review Type	Date Requested	Status
Mikkel Kamstrup Erlandsen		2010-12-01	Needs Information on 2010-12-12
Review via email: mp+42327@code.launchpad.net

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-01:

#

So we go from 18ms to 2ms. That's a speedup of a factor of 9, quite nice :-)

As discussed on IRC we need to limit the cache somehow of course. The old LRU cache comes to mind. Also, we need to be relatively cautious here running the danger of ballooning our memory consumption.

I'd like to see some more elaborate speed tests (Markus' multicolor diagram ftw!) as well using a bounded cache implementation of some sort. Also an assessment of the memory impact of this.

And please remember to revive the unit tests for the LRU cache if we do choose to bring it back :-)

I really like where this is going :-)

review: Needs Fixing

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-01

1644. By Seif Lotfy on 2010-12-01: add rewritten LRU cache with test cases

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-01:

#

I rewrote the LRU cache since the old one gave me around 0.6 ms
and the new one 0.3 ms

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-01:

#

Something is fishy in LRUCache.__getitem__ and __setitem__. You do a linear search for the id in self._id_seq by using the 'in' operator, and another linear search by using 'remove' afaik. So they become O(2N) in the worst case which is not a very desirable property for a cache ;-)

Unless of course the deque is indexed internally - but that would surprise me. In this case my comments are moot.

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-01:

#

I am not using deque :P
But I can push it

On Wed, Dec 1, 2010 at 4:46 PM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Something is fishy in LRUCache.__getitem__ and __setitem__. You do a linear
> search for the id in self._id_seq by using the 'in' operator, and another
> linear search by using 'remove' afaik. So they become O(2N) in the worst
> case which is not a very desirable property for a cache ;-)
>
> Unless of course the deque is indexed internally - but that would surprise
> me. In this case my comments are moot.
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-01:

#

scratch that i did not understand your comment properly...
I reduced the "in" statement by just using self._map.has_key(key)

On Wed, Dec 1, 2010 at 4:48 PM, Seif Lotfy <email address hidden> wrote:

> I am not using deque :P
> But I can push it
>
>
> On Wed, Dec 1, 2010 at 4:46 PM, Mikkel Kamstrup Erlandsen <
> <email address hidden>> wrote:
>
>> Something is fishy in LRUCache.__getitem__ and __setitem__. You do a
>> linear search for the id in self._id_seq by using the 'in' operator, and
>> another linear search by using 'remove' afaik. So they become O(2N) in the
>> worst case which is not a very desirable property for a cache ;-)
>>
>> Unless of course the deque is indexed internally - but that would surprise
>> me. In this case my comments are moot.
>> --
>>
>> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
>> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>>
>
>
>
> --
> This is me doing some advertisement for my blog http://seilo.geekyogre.com
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-01

1645. By Seif Lotfy on 2010-12-01: reduced the 'in' statement by replacing it with self._map.has_key(key)
1646. By Seif Lotfy on 2010-12-01: made use of deque
1647. By Seif Lotfy on 2010-12-01: no need to use deque so replace popleft with del at position 0

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-07:

#

Guys I reworked it.
If I recall properly I think Markus said its ok like this, since its not a linear search. I am not sure but please reconsider since it really brings a big improvement to the table.

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-07:

#

OK I don't think Markus said its not a linear search.
I also asked in the python channel and

----
<seiflotfy> is it a linear search to find the item i want to remove
<papna> seiflotfy: Yes.
<seiflotfy> whoa
<papna> seiflotfy: Also, don't use list.remove
<papna> seiflotfy: Decribe your data.
<seiflotfy> i am implementing an lru cache
<seiflotfy> basically a list of ids in a list
<seiflotfy> then i remove a key when its used
<seiflotfy> and append it to the end of the list
<nosklo> seiflotfy: dict
<seiflotfy> yeah
<seiflotfy> i will sill have to go through the dict
<seiflotfy> ot change the value of each key
<seiflotfy> which is still a linear seach AFAIK
<papna> seiflotfy: Yeah, a list isn't appropriate for that.
* zobbo (~ian@88.211.55.77) has joined #python
<papna> seiflotfy: One approach is to keep a linked list and a dict.
<seiflotfy> papna, i did
<seiflotfy> but the list approach was faster
<papna> seiflotfy:
http://bitbucket.org/mikegraham/remember/src/af922eac912c/remember/dicts.py#cl-138
<seiflotfy> papna, i can show u 2 code smaples
<papna> seiflotfy: For realistic n, sure.
-----
I implemented his proposal. And in real life it my list solution is 2x
faster at mimimum.

On Tue, Dec 7, 2010 at 3:37 PM, Seif Lotfy <email address hidden> wrote:

> Guys I reworked it.
> If I recall properly I think Markus said its ok like this, since its not a
> linear search. I am not sure but please reconsider since it really brings a
> big improvement to the table.
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-07:

#

In lrucache.py:
__delitem__ must also update id_seq

Also can LRUCache maybe log a warning if you create it with maxsize > 1000 or something. This implementation scales very badly, although as you say, it may well be faster for small caches.

Lastly, have you actually benchmarked this with 1000 items in the cache? Because my guess says that that should be around the tipping point where the linear searches starts to make an impact...

Maybe reduce the cache size to 500 just to feel a little safer?

review: Needs Fixing

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-07

1648. By Seif Lotfy on 2010-12-07: added old lru cache

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-07:

#

I did some testing an with a cache of 700 events my solution starts becoming
slower thus I removed it for a more consitent lru cache (the old one). This
removes 50% off the get_events

On Tue, Dec 7, 2010 at 8:01 PM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Review: Needs Fixing
> In lrucache.py:
> __delitem__ must also update id_seq
>
> Also can LRUCache maybe log a warning if you create it with maxsize > 1000
> or something. This implementation scales very badly, although as you say, it
> may well be faster for small caches.
>
> Lastly, have you actually benchmarked this with 1000 items in the cache?
> Because my guess says that that should be around the tipping point where the
> linear searches starts to make an impact...
>
> Maybe reduce the cache size to 500 just to feel a little safer?
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-08:

#

I don't think we want to cache events at all.
So far we have a simple policy, we are only caching objects we completely understand, where we know their size and how big (memory wise) they are (e.g mimetypes, interpretation, manifestation). We are not caching uris (because there can be an indefinite number of them), payload (because payload is unique for each event, and has an undefined max size), and so on.
For me there are three reasons why I don't want to cache events:
* we can't control the in-memory size of this cache, something like LRUCache(50) does not mean it's using less memory than LRUCache(500), simply because if only one event with a 5 GB (*) payload is in one cache the memory usage of the zg daemon will be HUGE.
* by adding an event cache we are indirectly caching objects like payload, uri, etc. we don't want to cache, for above mentioned reasons.
* I don't think we are able to find a performant Cache implementation for all possible clients. Just imaging we have two clients accessing the daemon, one which only gets a few events at a time (e.g. unity) and a one which is frequently getting *all* events (something like the GAJ histogram [not sure if this is still the case, but you should get my point ;)]) While any cache would do a good job for unity, only one GetEvents in GAJ will destroy/shuffle the complete cache in a way that the performance win for unity gets destroyed.

So I suggest two alternative solutions:
* encourage clients to maintain their own caches, and use FindEventIds() + GetEvents[(all ids whih are not in client's cache)] as much as possible. If this turns out to have performance issues, we should improve that
* if you really want a distributed cache, write an event-cache-daemon which only has one public API method "GetEvents" and which works like a cached proxy, similar to memcached

(*) random high number

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-08:

#

It is easy to control the size of the cache if we don't cache events with a
payload.
Usually most applications are interested in the latest events. Ofcourse AJ
breaks that but still, u might be running 3 or 4 apps that are usually
interested in the latest activities.
AFAIK zeitgeist does not really free uri memory at all. So a cache is not an
issue here.

On Wed, Dec 8, 2010 at 9:24 AM, Markus Korn <email address hidden> wrote:

> I don't think we want to cache events at all.
> So far we have a simple policy, we are only caching objects we completely
> understand, where we know their size and how big (memory wise) they are (e.g
> mimetypes, interpretation, manifestation). We are not caching uris (because
> there can be an indefinite number of them), payload (because payload is
> unique for each event, and has an undefined max size), and so on.
> For me there are three reasons why I don't want to cache events:
> * we can't control the in-memory size of this cache, something like
> LRUCache(50) does not mean it's using less memory than LRUCache(500), simply
> because if only one event with a 5 GB (*) payload is in one cache the memory
> usage of the zg daemon will be HUGE.
> * by adding an event cache we are indirectly caching objects like payload,
> uri, etc. we don't want to cache, for above mentioned reasons.
> * I don't think we are able to find a performant Cache implementation for
> all possible clients. Just imaging we have two clients accessing the daemon,
> one which only gets a few events at a time (e.g. unity) and a one which is
> frequently getting *all* events (something like the GAJ histogram [not sure
> if this is still the case, but you should get my point ;)]) While any cache
> would do a good job for unity, only one GetEvents in GAJ will
> destroy/shuffle the complete cache in a way that the performance win for
> unity gets destroyed.
>
> So I suggest two alternative solutions:
> * encourage clients to maintain their own caches, and use FindEventIds() +
> GetEvents[(all ids whih are not in client's cache)] as much as possible. If
> this turns out to have performance issues, we should improve that
> * if you really want a distributed cache, write an event-cache-daemon
> which only has one public API method "GetEvents" and which works like a
> cached proxy, similar to memcached
>
>
> (*) random high number
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

It is easy to control the size of the cache if we don't cache events with a
payload.
Usually most applications are interested in the latest events. Ofcourse AJ
breaks that but still, u might be running 3 or 4 apps that are usually
interested in the latest activities.
AFAIK zeitgeist does not really free uri memory at all. So a cache is not an
issue here.

On Wed, Dec 8, 2010 at 9:24 AM, Markus Korn <thekorn@gmx.de> wrote:

> I don't think we want to cache events at all.
> So far we have a simple policy, we are only caching objects we completely
> understand, where we know their size and how big (memory wise) they are (e.g
> mimetypes, interpretation, manifestation). We are not caching uris (because
> there can be an indefinite number of them), payload (because payload is
> unique for each event, and has an undefined max size), and so on.
> For me there are three reasons why I don't want to cache events:
>  * we can't control the in-memory size of this cache, something like
> LRUCache(50) does not mean it's using less memory than LRUCache(500), simply
> because if only one event with a 5 GB (*) payload is in one cache the memory
> usage of the zg daemon will be HUGE.
>  * by adding an event cache we are indirectly caching objects like payload,
> uri, etc. we don't want to cache, for above mentioned reasons.
>  * I don't think we are able to find a performant Cache implementation for
> all possible clients. Just imaging we have two clients accessing the daemon,
> one which only gets a few events at a time (e.g. unity) and a one which is
> frequently getting *all* events (something like the GAJ histogram [not sure
> if this is still the case, but you should get my point ;)]) While any cache
> would do a good job for unity, only one GetEvents in GAJ will
> destroy/shuffle the complete cache in a way that the performance win for
> unity gets destroyed.
>
> So I suggest two alternative solutions:
>  * encourage clients to maintain their own caches, and use FindEventIds() +
> GetEvents[(all ids whih are not in client's cache)] as much as possible. If
> this turns out to have performance issues, we should improve that
>  * if you really want a distributed cache, write an event-cache-daemon
> which only has one public API method "GetEvents" and which works like a
> cached proxy, similar to memcached
>
>
> (*) random high number
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

-- 
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-08:

#

A quick & dirty graph comparing lp:zeitgeist with this branch is at [0]

I've benchmarked engine.get_events() on an activity log with 50k events for different batch sizes (means length of arguments to get_events())

I'm running each query two times, the first time the cache is cold (marked by "_C") and the second time the cache is HOT ("_H")

Results:
* there is no difference between each run for trunk, which is expected
* cold cache (getting an event the first time) is always slower in this branch
* running the queries with hot cache is fast for small batch sizes
* for huge batch sizes no caching is always faster

[0] https://dl.dropbox.com/u/174479/cache_benchmarks.svg

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-08:

#

Awesome
But what is a batch?

On Wed, Dec 8, 2010 at 12:33 PM, Markus Korn <email address hidden> wrote:

> A quick & dirty graph comparing lp:zeitgeist with this branch is at [0]
>
> I've benchmarked engine.get_events() on an activity log with 50k events for
> different batch sizes (means length of arguments to get_events())
>
> I'm running each query two times, the first time the cache is cold (marked
> by "_C") and the second time the cache is HOT ("_H")
>
> Results:
> * there is no difference between each run for trunk, which is expected
> * cold cache (getting an event the first time) is always slower in this
> branch
> * running the queries with hot cache is fast for small batch sizes
> * for huge batch sizes no caching is always faster
>
> [0] https://dl.dropbox.com/u/174479/cache_benchmarks.svg
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-08:

#

batchsize is number of ids in the get_events() argument,
so,
get_events([1, 2]) -> batchsize == 2
get_events(range(80)) -> batchsize == 80

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-08:

#

Thanks for the detailed test. For that I have a solution that I know some of
you might disagree with but I would like to put it out for discussion:
We can create a filter in the cache module to not accept events with
payloads != null
Also we can avoid jumping to look into the cache for batchsizes that exceed
"n" ids.
This is just brainstorming here, because I see a small benefit of LRU cache
for small batches.

On Wed, Dec 8, 2010 at 12:46 PM, Markus Korn <email address hidden> wrote:

> batchsize is number of ids in the get_events() argument,
> so,
> get_events([1, 2]) -> batchsize == 2
> get_events(range(80)) -> batchsize == 80
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-08:

#

I ran some test with bigger cache size (10k) and the change was noticeable that the cache gave a big improvement when Hot.
So I ran some tests and observed that if "cache size < batch size" slowdowns might occur.
My solution for this is to ignore "caching and using cache" for batch_sizes that exceed the cache_size. 1 big batch event can just destroy the cache. Thus using something like batch_size_tolerance = cache_size/2 would allow us to guarantee that one call won't mess up the cache and lead to slowdowns.

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-08:

#

Just updated the merge request with some batch_size_tolerance modification

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-08

1649. By Seif Lotfy on 2010-12-08: add batch_tolerance support

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-10:

#

Here is a csv file based on markus script (for some reason plotting doesnt work for me)
You can see the resuls are pretty good
basically unitl batch size 1000 the hot cache is always faster, then it is just like the noncached version in trunk

---
;50;500;950;1000;1050;2000;5000;10000
zeitgeist_C;0,002746;0,024526;0,046756;0,048572;0,051662;0,098936;0,245838;0,518534
zeitgeist_H;0,0023;0,023295;0,045963;0,04813;0,052496;0,097886;0,252786;0,493975
add_cache_for_get_events_C;0,003201;0,029415;0,054442;0,060075;0,05168;0,098671;0,24538;0,509424
add_cache_for_get_events_H;0,000868;0,00736;0,014856;0,013968;0,051253;0,09653;0,252904;0,485511
---

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-10:

#

I think Markus has a very good observation. If we are serving clients that request 10k events and other clients that request 10-100 events then It's really hard to get a good caching scheme. The big requests will just always blow the cache (and a cache size of 10k is a no-go by me).

Adding some logic on top to control when and how the cache is applied is a logical next step. It does however set off my internal alarm system. Unless very carefully tested and maintained it might very well end up slowing us down and giving clients unreliable response times because the cache logic is too hard to predict.

My gut tells me that if we too much logic for cache management then we are doing it wrong. It should be simple.

Regarding the cache size, I'm really only worried about the payloads. If we have a cache size of 500 and URIs have a typical length of 100 letters then we'll use 50.000 bytes, ~50kb, for the URIs. So wrt the payloads let's assume that someone stored 100k XML blobs with all events. That gives us 500*100k ~= 50mb. That is too much. Either not caching events with payloads, or fetching payloads lazily could fix this though.

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-10:

#

Btw, there is another way to blow up the cache: having events with thousands of subjects, so we should limit the number of subjects for cached events too.
Also, please do not hardcode the limits
([...]34 + if len(ids) > 1000:[...])
please use a modul-wide variable, or maybe eent set the limits in the config object.

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-10:

#

Guys I added a plot of the current state of my "logic" for the cache
http://dl.dropbox.com/u/7162902/cache_plot.svg

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-10

1650. By Seif Lotfy on 2010-12-10: added ZEITGEIST_CACHE_SIZE env variable and a check on payloads

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-10:

#

Can you please add a code comment explaining why we don't cache events with payloads, it would seem arbitrary to the uninitiated.

Other than that it looks good to me. If Markus approves feel free to merge.

review: Approve

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-10:

#

Oh, btw. I couldn't see from the diff and I don't have time to check the full code: Is the cache also used in FindEvents()? That would be where we'd get the biggest win I think.

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-10:

#

It is not used in find_events yet but I have a hint of a solution to make use of it :)

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-11

1651. By Seif Lotfy on 2010-12-10: added comment on why we dont cache events with payloads
added to lrucache.py to Makefile.am
1652. By Seif Lotfy on 2010-12-10: fix statement in get_events where looking if the event has a payload
1653. By Seif Lotfy on 2010-12-11: replaced lrucache

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2010-12-12:

#

I see you replaced the LRUCache implementation? Is this your own implementation, it doesn't look like your code style. I'm worried about licensing issues because you removed the LGPL header.

That said it makes a lot of sense to leverage the native collections module - so in principle I'm +1 for this approach.

review: Needs Information

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-12:

#

Hey,
Ofcourse its not my code :P
http://pypi.python.org/pypi/lrucache/0.2
<http://pypi.python.org/pypi/lrucache/0.2>AFAIK its AFL 1.2 License. I did
not see it till now so this might be an issue.
http://en.wikipedia.org/wiki/Academic_Free_License

On Sun, Dec 12, 2010 at 8:04 AM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Review: Needs Information
> I see you replaced the LRUCache implementation? Is this your own
> implementation, it doesn't look like your code style. I'm worried about
> licensing issues because you removed the LGPL header.
>
> That said it makes a lot of sense to leverage the native collections module
> - so in principle I'm +1 for this approach.
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-12:

#

@Seif, what's the reason for using another cache implementation?

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-12:

#

@Markus
1) The previous LRU has something broken in _move_item_to_end for some
reason. (*)
2) It's faster

(*) This is the Error I get. It can be replicated by opening and closing GAJ
several times.
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 576, in
msg_reply_handler
    reply_handler(*message.get_args_list(**get_args_opts))
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 1303, in do_set
    self.set_items(objs)
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 1310, in set_items
    box = CategoryBox(None, items, True, itemoff=4)
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 315, in __init__
    + " " +
str((time.localtime(int(event_structs[0].event.timestamp)/1000).tm_hour)/8)
+ " " + str(category)
  File "/home/seif/Projects/gnome-activity-journal/src/store.py", line 108,
in __get__
    value = self.method(instance)
  File "/home/seif/Projects/gnome-activity-journal/src/store.py", line 125,
in event
    events = INTERFACE.GetEvents([self.id])
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 88, in _ProxyMethod
    return self._disconnection_safe(lambda:
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 68, in _disconnection_safe
    return meth()
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 89, in <lambda>
    getattr(self.__iface, name)(*args, **kwargs))
  File "/usr/lib/pymodules/python2.6/dbus/proxies.py", line 140, in __call__
    **keywords)
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 620, in
call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Python.AttributeError:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in
_message_cb
    retval = candidate_method(self, *args, **keywords)
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/engine/remote.py",
line 86, in GetEvents
    sender=sender))
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/engine/main.py",
line 202, in get_events
    event = self._event_cache[id]
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 93, in __getitem__
    self._move_item_to_end(item)
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 115, in _move_item_to_end
    if item == self._list_start:
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 38, in __cmp__
    return cmp(self.id, item.id)
AttributeError: 'NoneType' object has no attribute 'id'

@Markus
1) The previous LRU has something broken in _move_item_to_end for some
reason. (*)
2) It's faster

(*) This is the Error I get. It can be replicated by opening and closing GAJ
several times.
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 576, in
msg_reply_handler
    reply_handler(*message.get_args_list(**get_args_opts))
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 1303, in do_set
    self.set_items(objs)
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 1310, in set_items
    box = CategoryBox(None, items, True, itemoff=4)
  File "/home/seif/Projects/gnome-activity-journal/src/activity_widgets.py",
line 315, in __init__
    + " " +
str((time.localtime(int(event_structs[0].event.timestamp)/1000).tm_hour)/8)
+ " " + str(category)
  File "/home/seif/Projects/gnome-activity-journal/src/store.py", line 108,
in __get__
    value = self.method(instance)
  File "/home/seif/Projects/gnome-activity-journal/src/store.py", line 125,
in event
    events = INTERFACE.GetEvents([self.id])
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 88, in _ProxyMethod
    return self._disconnection_safe(lambda:
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 68, in _disconnection_safe
    return meth()
  File
"/home/seif/Projects/gnome-activity-journal/../zeitgeist/zeitgeist/client.py",
line 89, in <lambda>
    getattr(self.__iface, name)(*args, **kwargs))
  File "/usr/lib/pymodules/python2.6/dbus/proxies.py", line 140, in __call__
    **keywords)
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 620, in
call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Python.AttributeError:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in
_message_cb
    retval = candidate_method(self, *args, **keywords)
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/engine/remote.py",
line 86, in GetEvents
    sender=sender))
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/engine/main.py",
line 202, in get_events
    event = self._event_cache[id]
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 93, in __getitem__
    self._move_item_to_end(item)
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 115, in _move_item_to_end
    if item == self._list_start:
  File
"/home/seif/Projects/add_cache_for_get_events/zeitgeist/../_zeitgeist/lrucache.py",
line 38, in __cmp__
    return cmp(self.id, item.id)
AttributeError: 'NoneType' object has no attribute 'id'

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-12:

#

Seif, 1) is not unfixable ;) and I also don't see the new cache implementation as a significat speed improvement, compared to the previous version.

markus@thekorn ~/devel/zeitgeist/cache_benchmarks % cat result_small.csv
;1;2;5;10;50
../trunk_C;0,000442;0,0005326;0,0007516;0,0010958;0,0037418
../trunk_H;0,0001498;0,0002262;0,0004252;0,0007444;0,0033072
../add_cache_for_get_events_C;0,0004612;0,0005672;0,000823;0,0012218;0,0044662
../add_cache_for_get_events_H;0,0002604;0,0002824;0,0003426;0,0004424;0,0012568
/tmp/new_lru_C;0,000497;0,000592;0,00085;0,0012678;0,0045354
/tmp/new_lru_H;0,000262;0,0002828;0,0003394;0,0004176;0,0011164
markus@thekorn ~/devel/zeitgeist/cache_benchmarks % cat result_big.csv
;50;100;500;950;1000;1050;2000;3000
../trunk_C;0,0037996;0,0072462;0,0350788;0,0668402;0,0698766;0,0738612;0,1410332;0,213176
../trunk_H;0,0033512;0,006886;0,033931;0,0664788;0,0694176;0,0737804;0,1390564;0,2099196
../add_cache_for_get_events_C;0,0044968;0,0088392;0,0424282;0,0799834;0,085211;0,073756;0,142111;0,2141532
../add_cache_for_get_events_H;0,0012622;0,00226;0,0104036;0,0211476;0,0208178;0,0736068;0,1399124;0,210276
/tmp/new_lru_C;0,004724;0,0092024;0,043668;0,0835366;0,088505;0,0732062;0,141802;0,2119048
/tmp/new_lru_H;0,0011144;0,0019276;0,0087936;0,016713;0,0173434;0,0728648;0,1392448;0,222787
markus@thekorn ~/devel/zeitgeist/cache_benchmarks %

(the one in /tmp is this branch as it is right now, add_* is this branch at rev1649)

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-12:

#

Well I tried to fix it and just checking if "None" does not seem like a
solution to me. I couldn't recreate it over test cases. So I tried over
practical usage.
Unless you can fix it I think it won't hurt to use the new LRU code

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-12

1654. By Seif Lotfy on 2010-12-12: revert cache implementation for fixing

Revision history for this message

Markus Korn (thekorn) wrote on 2010-12-12:

#

[0] should fixes the issue reported by Seif in one of his comments, and adds a series of tests for it.

[0] http://paste.ubuntu.com/542815/

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-12:

#

OK applied the patch. Thank you Markus

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2010-12-12

1655. By Seif Lotfy on 2010-12-12: fix problems encountered in LRU Cache and added test cases
1656. By Seif Lotfy on 2010-12-12: updated from trunk

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-15:

#

@Markus:
the lrucache dumped out

Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 461, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 451, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 241, in get_events
    self._event_cache[event.id] = event
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 97, in __setitem__
    old = self.remove_eldest_item()
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 146, in remove_eldest_item
    del self._map[old.key]
KeyError: 20320

Trying to replicate it

Revision history for this message

Seif Lotfy (seif) wrote on 2010-12-16:

#

And I just got it again

Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 461, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 451, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 241, in get_events
    self._event_cache[event.id] = event
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 97, in __setitem__
    old = self.remove_eldest_item()
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 146, in remove_eldest_item
    del self._map[old.key]
KeyError: 19187

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2011-01-04

1657. By Seif Lotfy on 2010-12-15: merge with trunk
1658. By Seif Lotfy on 2011-01-04: fix key errors
1659. By Seif Lotfy on 2011-01-04: remove prints

Revision history for this message

Seif Lotfy (seif) wrote on 2011-01-07:

#

And again

Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/remote.py", line 254, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 461, in find_events
    return self._find_events(1, *args)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 451, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/local/share/zeitgeist/_zeitgeist/engine/main.py", line 241, in get_events
    self._event_cache[event.id] = event
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 97, in __setitem__
    old = self.remove_eldest_item()
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 146, in remove_eldest_item
    del self[old.key]
  File "/usr/local/share/zeitgeist/_zeitgeist/lrucache.py", line 71, in __delitem__
    item = self._map[key]
KeyError: 23795

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2011-01-07

1660. By Seif Lotfy on 2011-01-07: replace our lru implementation with a more stable one

Revision history for this message

Seif Lotfy (seif) wrote on 2011-01-07:

#

I think we should go with the online LRU implementation since it does not crash here and I cant find the issue with ours

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2011-01-07:

#

Seif; I don't think you got the lrucache.py sources from that link because the contained link to the source is a dead and I know this package from history. It's based on a heap (not a double linked list or deque) and is dog slow.

Also the AFL is incompatible with LGPL... http://www.gnu.org/licenses/license-list.html

Revision history for this message

Seif Lotfy (seif) wrote on 2011-01-07:

#

scratch that this is the wrong link

Revision history for this message

Markus Korn (thekorn) wrote on 2011-01-07:

#

Omg, not again, please. If you want to use this lru cache implementation, please create a debian package of this module, get it into debian/ubuntu and add this package as zg dependency. but please don't copy their code into ours (btw, I would consider it as rude if someone copies my code without leaving a comment pointing to the source of the code)

Seif, I'm sure we can fix our implmentation, *if* we know what's going wrong. And from what I see you where not able to provide this information. We know there is a KeyError in __delitem__ at some point. And we know that this is reproducable by you, that's all. Can you please give us more information on the state of the cache when it crashes? has the cache reached its max size? has the event with this id ever been in the cache?

Side note: please *never* merge this branch into lp:zeitgeist, it contains too much trash in the revision history, if you think you found the best solution for such cache please prepare a new clean branch and propose this branch for merge.

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2011-01-09

1661. By Seif Lotfy on 2011-01-09: repairt cache

Revision history for this message

Markus Korn (thekorn) wrote on 2011-01-10:

#

Okidoki, seif. Can you please summarize and comment on the last change you did (against the last revision with this cache implementation). AFAICS you removed _current_id and reformated the docstrings, is this it?
Which bug in the old code were you actually fixing? is this fixing the KeyError you got several times?
Can you please write a test for the case this revision is fixing?
Thanks

Revision history for this message

Seif Lotfy (seif) wrote on 2011-01-10:

#

On Mon, Jan 10, 2011 at 10:36 AM, Markus Korn <email address hidden> wrote:

> Okidoki, seif. Can you please summarize and comment on the last change you
> did (against the last revision with this cache implementation). AFAICS you
> removed _current_id and reformated the docstrings, is this it?
> Which bug in the old code were you actually fixing? is this fixing the
> KeyError you got several times?
> Can you please write a test for the case this revision is fixing?
> Thanks
> --
>
> https://code.launchpad.net/~seif/zeitgeist/add_cache_for_get_events/+merge/42327
> You are the owner of lp:~seif/zeitgeist/add_cache_for_get_events.
>

Yes, you used an incremental counter "current_id" that was redundant since
the keys for the cache could have been done over the "event id". I have been
running the branch now since I committed it in a terminal and noKeyError. I
will retrace the issue now again since it is in the current_id. But I think
this is it.

Revision history for this message

Markus Korn (thekorn) wrote on 2011-01-10:

#

I've seen discussions on memory usage in irc today (seif complaining about this cache using >30 MB writable memory), and I think I found a bug:

20:39 < thekorn> seiflotfy: I think I know what's making the cache size bigger
                 than max_size
20:39 < thekorn> seiflotfy: can you please remove old.next = None in line
                 125 of lrucache.py
20:40 < thekorn> and check writable memory again
20:41 < thekorn> if I get 4000 events with a cache size of 2000 I've 14MB
                 writable memory
20:41 < thekorn> which is far away from your 30MB

lp:~seif/zeitgeist/add_cache_for_get_events updated on 2011-01-11

1662. By Seif Lotfy on 2011-01-11: removed old.next to fix the memory leak

Revision history for this message

Seif Lotfy (seif) wrote on 2011-01-12:

#

I am moving the work to a new clean branch lp:~seif/zeitgeist/fix-686732
Please lets continue there

Zeitgeist Framework

Merge lp:~seif/zeitgeist/add_cache_for_get_events into lp:zeitgeist/0.1

Commit message

Description of the change

Unmerged revisions

Preview Diff

Subscribers

 === modified file '_zeitgeist/Makefile.am'
 --- _zeitgeist/Makefile.am	2010-10-13 15:33:17 +0000
 +++ _zeitgeist/Makefile.am	2011-01-11 08:49:14 +0000
@@ -4,4 +4,5 @@
  app_PYTHON = \
  	__init__.py \
--	singleton.py
++	singleton.py \
++	lrucache.py
 === modified file '_zeitgeist/engine/__init__.py'
 --- _zeitgeist/engine/__init__.py	2010-12-04 17:49:33 +0000
 +++ _zeitgeist/engine/__init__.py	2011-01-11 08:49:14 +0000
@@ -60,5 +60,12 @@
  	CORE_SCHEMA_VERSION = 3
  	USER_EXTENSION_PATH = os.path.join(DATA_PATH, "extensions")
++
++	__default_cache_size = 2000
++	if "ZEITGEIST_CACHE_SIZE" in os.environ:
++		CACHE_SIZE = int(os.environ["ZEITGEIST_CACHE_SIZE"])
++	else:
++		CACHE_SIZE = __default_cache_size
++	log.debug("Cache size = %i" %CACHE_SIZE)
  constants = _Constants()
 === modified file '_zeitgeist/engine/main.py'
 --- _zeitgeist/engine/main.py	2010-12-28 22:32:47 +0000
 +++ _zeitgeist/engine/main.py	2011-01-11 08:49:14 +0000
@@ -34,6 +34,7 @@
  from _zeitgeist.engine import constants
  from _zeitgeist.engine.sql import get_default_cursor, unset_cursor, \
  	TableLookup, WhereClause
++from _zeitgeist.lrucache import LRUCache
  log = logging.getLogger("zeitgeist.engine")
@@ -118,6 +119,7 @@
  		self._manifestation = TableLookup(cursor, "manifestation")
  		self._mimetype = TableLookup(cursor, "mimetype")
  		self._actor = TableLookup(cursor, "actor")
++		self._event_cache = LRUCache(constants.CACHE_SIZE)
  	@property
  	def extensions(self):
@@ -166,10 +168,22 @@
  		if not ids:
  			return []
--		rows = self._cursor.execute("""
--			SELECT * FROM event_view
--			WHERE id IN (%s)
--			""" % ",".join("%d" % id for id in ids)).fetchall()
++		# Split ids into cached and uncached
++		uncached_ids = []
++		cached_ids = []
++
++		# Id ids batch greater than CACHE_SIZE ids ignore cache
++		use_cache = True
++		if len(ids) > constants.CACHE_SIZE/2:
++			use_cache = False
++		if not use_cache:
++			uncached_ids = ids
++		else:
++			for id in ids:
++				if id in self._event_cache:
++					cached_ids.append(id)
++				else:
++					uncached_ids.append(id)
  		id_hash = defaultdict(list)
  		for n, id in enumerate(ids):
@@ -183,10 +197,36 @@
  		# deleted
  		events = {}
  		sorted_events = [None]*len(ids)
++
++		for id in cached_ids:
++			event = self._event_cache[id]
++			if event:
++				event = self.extensions.apply_get_hooks(event, sender)
++				if event is not None:
++					for n in id_hash[event.id]:
++						# insert the event into all necessary spots (LP: #673916)
++						sorted_events[n] = event
++
++		# Get uncached events
++		rows = self._cursor.execute("""
++			SELECT * FROM event_view
++			WHERE id IN (%s)
++			""" % ",".join("%d" % id for id in uncached_ids)).fetchall()
++
++		log.debug("Got %d raw events in %fs" % (len(rows), time.time()-t))
++		t = time.time()
++
++		t_get_event = 0
++		t_get_subject = 0
++		t_apply_get_hooks = 0
++
  		for row in rows:
  			# Assumption: all rows of a same event for its different
  			# subjects are in consecutive order.
++			t_get_event -= time.time()
  			event = self._get_event_from_row(row)
++			t_get_event += time.time()
++
  			if event:
  				# Check for existing event.id in event to attach
  				# other subjects to it
@@ -194,14 +234,28 @@
  					events[event.id] = event
  				else:
  					event = events[event.id]
++
++				# Avoid caching events with payloads to have keep the cache MB size
++				# at a decent level
++				if use_cache and not event.payload:
++					self._event_cache[event.id] = event
++
++				t_get_subject -= time.time()
  				event.append_subject(self._get_subject_from_row(row))
++				t_get_subject += time.time()
++
++				t_apply_get_hooks -= time.time()
  				event = self.extensions.apply_get_hooks(event, sender)
++				t_apply_get_hooks += time.time()
  				if event is not None:
  					for n in id_hash[event.id]:
  						# insert the event into all necessary spots (LP: #673916)
  						sorted_events[n] = event
--
++
  		log.debug("Got %d events in %fs" % (len(sorted_events), time.time()-t))
++		log.debug("    Where time spent in _get_event_from_row in %fs" % (t_get_event))
++		log.debug("    Where time spent in _get_subject_from_row in %fs" % (t_get_subject))
++		log.debug("    Where time spent in apply_get_hooks in %fs" % (t_apply_get_hooks))
  		return sorted_events
  	@staticmethod
@@ -604,6 +658,11 @@
  		""" % ",".join(str(int(_id)) for _id in ids))
  		timestamps = self._cursor.fetchone()
++		# Remove events from cache
++		for id in ids:
++			if id in self._event_cache:
++				del self._event_cache[id]
++
  		# Make sure that we actually found some events with these ids...
  		# We can't do all(timestamps) here because the timestamps may be 0
  		if timestamps and timestamps[0] is not None and timestamps[1] is not None:
@@ -613,7 +672,6 @@
  			log.debug("Deleted %s" % map(int, ids))
  			self.extensions.apply_post_delete(ids, sender)
--
  			return timestamps
  		else:
  			log.debug("Tried to delete non-existing event(s): %s" % map(int, ids))
 === added file '_zeitgeist/lrucache.py'
 --- _zeitgeist/lrucache.py	1970-01-01 00:00:00 +0000
 +++ _zeitgeist/lrucache.py	2011-01-11 08:49:14 +0000
@@ -0,0 +1,125 @@
++# -.- coding: utf-8 -.-
++
++# lrucache.py
++#
++# Copyright © 2009 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@gmail.com>
++# Copyright © 2009 Markus Korn <thekorn@gmx.de>
++# Copyright © 2011 Seif Lotfy <seif@lotfy.com>
++#
++# This program is free software: you can redistribute it and/or modify
++# it under the terms of the GNU Lesser General Public License as published by
++# the Free Software Foundation, either version 3 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU Lesser General Public License for more details.
++#
++# You should have received a copy of the GNU Lesser General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++class LRUCache:
++	"""
++	A simple LRUCache implementation backed by a linked list and a dict.
++	It can be accessed and updated just like a dict. To check if an element
++	exists in the cache the following type of statements can be used:
++		if "foo" in cache
++	"""
++
++	class _Item:
++		"""
++		A container for each item in LRUCache which knows about the
++		item's position and relations
++		"""
++		def __init__(self, item_key, item_value):
++			self.value = item_value
++			self.key = item_key
++			self.next = None
++			self.prev = None
++
++	def __init__(self, max_size):
++		"""
++		The size of the cache (in number of cached items) is guaranteed to
++		never exceed 'size'
++		"""
++		self._max_size = max_size
++		self.clear()
++
++
++	def clear(self):
++		self._list_end = None # The newest item
++		self._list_start = None # Oldest item
++		self._map = {}
++
++	def __len__(self):
++		return len(self._map)
++
++	def __contains__(self, key):
++		return key in self._map
++
++	def __delitem__(self, key):
++		item = self._map[key]
++		if item.prev:
++			item.prev.next = item.next
++		else:
++			# we are deleting the first item, so we need a new first one
++			self._list_start = item.next
++		if item.next:
++			item.next.prev = item.prev
++		else:
++			# we are deleting the last item, get a new last one
++			self._list_end = item.prev
++		del self._map[key], item
++
++	def __setitem__(self, key, value):
++		if key in self._map:
++			item = self._map[key]
++			item.value = value
++			self._move_item_to_end(item)
++		else:
++			new = LRUCache._Item(key, value)
++			self._append_to_list(new)
++
++			if len(self._map) > self._max_size :
++				# Remove eldest entry from list
++				self.remove_eldest_item()
++
++	def __getitem__(self, key):
++		item = self._map[key]
++		self._move_item_to_end(item)
++		return item.value
++
++	def __iter__(self):
++		"""
++		Iteration is in order from eldest to newest,
++		and returns (key,value) tuples
++		"""
++		iter = self._list_start
++		while iter != None:
++			yield (iter.key, iter.value)
++			iter = iter.next
++
++	def _move_item_to_end(self, item):
++		del self[item.key]
++		self._append_to_list(item)
++
++	def _append_to_list(self, item):
++		self._map[item.key] = item
++		if not self._list_start:
++			self._list_start = item
++		if self._list_end:
++			self._list_end.next = item
++			item.prev = self._list_end
++			item.next = None
++		self._list_end = item
++
++	def remove_eldest_item(self):
++		if self._list_start == self._list_end:
++			self._list_start = None
++			self._list_end = None
++			return
++		old = self._list_start
++		old.next.prev = None
++		self._list_start = old.next
++		del self[old.key], old
 \ No newline at end of file
 === added file 'test/lrucache_test.py'
 --- test/lrucache_test.py	1970-01-01 00:00:00 +0000
 +++ test/lrucache_test.py	2011-01-11 08:49:14 +0000
@@ -0,0 +1,98 @@
++#!/usr/bin/python
++
++# Update python path to use local zeitgeist module
++import sys
++import os
++sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
++
++from _zeitgeist.lrucache import LRUCache
++
++import unittest
++
++class LRUCacheTest(unittest.TestCase):
++
++	def testPutGetOne(self):
++		"""Test that we can cache and then retrieve one single item"""
++		cache = LRUCache(10)
++		cache["foo"] = "bar"
++		self.assertEquals("bar", cache["foo"])
++		self.assertRaises(KeyError, lambda : cache["nosuchelement"])
++
++	def testPutGetTwo(self):
++		"""Test that we can cache and then retrieve two items"""
++		cache = LRUCache(10)
++		cache["foo1"] = "bar1"
++		cache["foo2"] = "bar2"
++		self.assertEquals("bar1", cache["foo1"])
++		self.assertEquals("bar2", cache["foo2"])
++		self.assertRaises(KeyError, lambda : cache["nosuchelement"])
++
++	def testExceedMaxSize(self):
++		"""Test that we can restrict the cache size to one element, and that
++		   this one element is the latest one we've added"""
++		cache = LRUCache(1)
++		cache["foo1"] = "bar1"
++		cache["foo2"] = "bar2"
++		self.assertRaises(KeyError, lambda : cache["foo1"])
++		self.assertEquals("bar2", cache["foo2"])
++		self.assertEquals(1, len(cache))
++
++	def testInKeyword(self):
++		"""Make sure we can do 'if "foo" in cache' type of statements"""
++		cache = LRUCache(5)
++		cache["foo1"] = "bar1"
++		self.assertFalse("bork" in cache)
++		self.assertTrue("foo1" in cache)
++
++	def testIteration(self):
++		"""Make sure that iteration is in the correct order; oldest to newest"""
++		cache = LRUCache(4)
++		cache["foo1"] = "bar1"
++		cache["foo2"] = "bar2"
++		cache["foo3"] = "bar3"
++		cache["foo4"] = "bar4"
++		cache["foo1"] = "bar1" # "foo1" should now be newest
++
++		l = []
++		for key_val in cache : l.append(key_val)
++		self.assertEquals([("foo2", "bar2"),
++		                   ("foo3", "bar3"),
++		                   ("foo4", "bar4"),
++		                   ("foo1", "bar1")], l)
++
++	def testDeleteItem(self):
++		cache = LRUCache(4)
++		cache["foo1"] = "bar1"
++		cache["foo2"] = "bar2"
++		cache["foo3"] = "bar3"
++		cache["foo4"] = "bar4"
++		self.assertTrue("foo2" in cache)
++		del cache["foo2"] # delete item in the middle of the cache
++		self.assertTrue("foo2" not in cache)
++		self.assertEquals(
++			[("foo1", "bar1"), ("foo3", "bar3"), ("foo4", "bar4")],
++			list(cache)
++		)
++		self.assertTrue("foo1" in cache)
++		del cache["foo1"] # delete first item
++		self.assertTrue("foo1" not in cache)
++		self.assertEquals(
++			[("foo3", "bar3"), ("foo4", "bar4")],
++			list(cache)
++		)
++		self.assertTrue("foo4" in cache)
++		del cache["foo4"] # delete last item
++		self.assertTrue("foo4" not in cache)
++		self.assertEquals(
++			[("foo3", "bar3")],
++			list(cache)
++		)
++		del cache["foo3"]
++		self.assertTrue("foo3" not in cache)
++		self.assertEquals([], list(cache))
++		self.assertTrue(cache._list_start is None)
++		self.assertTrue(cache._list_end is None)
++
++
++if __name__ == '__main__':
++	unittest.main()
 \ No newline at end of file
 === modified file 'zeitgeist/client.py'
 --- zeitgeist/client.py	2010-12-26 09:18:11 +0000
 +++ zeitgeist/client.py	2011-01-11 08:49:14 +0000
@@ -919,4 +919,4 @@
  			normal_reply_handler(normal_reply_data)
  _FIND_EVENTS_FOR_TEMPLATES_ARGS = inspect.getargspec(
--	ZeitgeistClient.find_events_for_templates)[0]
++	ZeitgeistClient.find_events_for_templates)[0]
 \ No newline at end of file