Merge lp:~jtatum/ubuntu-accomplishments-daemon/lp-caching-part-1 into lp:ubuntu-accomplishments-daemon

Proposed by James Tatum
Status: Rejected
Rejected by: Jono Bacon
Proposed branch: lp:~jtatum/ubuntu-accomplishments-daemon/lp-caching-part-1
Merge into: lp:ubuntu-accomplishments-daemon
Diff against target: 126 lines (+117/-0)
2 files modified
accomplishments/lpdata.py (+35/-0)
accomplishments/util/cacheddata.py (+82/-0)
To merge this branch: bzr merge lp:~jtatum/ubuntu-accomplishments-daemon/lp-caching-part-1
Reviewer Review Type Date Requested Status
Jono Bacon Disapprove
Rafał Cieślak Needs Information
Review via email: mp+105419@code.launchpad.net

Description of the change

This is part one of changes for simplifying LaunchPad (and other) related accomplishments and caching LaunchPad output so we don't beat it up so much.

The basic premise here is to create an object that a) makes it possible to easily query team membership and other launchpad data and b) store the output locally rather than constantly asking LP each script.

This enables code that executes pretty much instantly on cache hits and looks like:

>>> import accomplishments.lpdata
Daemon seems to be run not installed, branch base path used: /mnt/hgfs/jtatum/Projects/ubuntu-accomplishments-daemon
>>> me = accomplishments.lpdata.LPData.fetch('<email address hidden>')
>>> 'locoteams' in me.super_teams
True
>>> 'motu' in me.super_teams
False
>>> 'ubuntu-california' in me.direct_teams
True

To post a comment you must log in.
41. By James Tatum

New editor was set to use tabs in some places. Hopefully fixed

Revision history for this message
James Tatum (jtatum) wrote :

I put this in the daemon code on the assumption that it would be useful to multiple consumers, but if we think that only ubuntu-community-accomplishments will need it then it probably belongs there.

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

Thanks for your work, James. I really like the idea of simplifying scripts, it seems you have done a pretty good job here.
However, as you have mentioned, it seems that these scripts will make sense only for the ubuntu-community collection. Could this code be somehow bundled with the accomplishments scripts within that collection, to avoid adding to the daemon pieces of code that are meant to be useful only for a particular collection? If yes, then it would make much more sense if this was included in that package instead of the daemon.

There is one more thing that worries me - the cache is not updated. Imagine such situation:
1. User A is not a Ubuntu Member.
2. The cache is created for him, it remembers that is is not a Member.
3. He reads the instructions on how to become a Member, and soon accomplishes this trophy.
4. User A clicks 'check-accomplishment', or waits up to 15 minutes for the scripts to be run.
5. The daemon uses scripts to check if Ubuntu Member has been accomplished.
6. The scripts ask the cache if he is a member of 'ubuntu-members'
7. Obviously, the cache denies that, and so the user is unable to gain this trophy (and similarly a lot of others).

Revision history for this message
James Tatum (jtatum) wrote :

Hi Rafal,

Actually, the cache is only valid for one hour. So in about an hour, the trophy will be awarded. The timeout value can be changed in the library. I think once an hour is a good amount of time for beating up the launchpad servers to request data that doesn't change that often. Even a very active user might only change teams what.. 50 times a year? Actually, I worry that even one hour is too much if this becomes distributed with Ubuntu - the Launchpad servers could become inundated if they start getting a few hundred thousand of these requests per hour.

Also, I don't think this will be just for ubuntu-community-accomplishments. Any accomplishment set that uses Launchpad data should be using this cache. As an example, the Italian Loco's accomplishments mentioned at http://www.jonobacon.org/2012/04/04/quick-ubuntu-accomplishments-update/ should almost certainly be updated to use cached data.

But, that said, I don't know about putting it here. Where are the dbus client calls going for the new API?

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

James,
Sorry for not replying, Launchpad had never notified me about the new comment :(

I must have not noticed the cache validity check, it is indeed valid for one hour, which makes sense. And in case we changed out mind this can be fine-tuned later.

I am still convinced that this enhancements should be shipped with the accomplishments collection rather than the core system. We aim to make the platform as little Ubuntu-oriented as possible, so that even a paragliding club or ninjas community might create collections that fully satisfy their needs. This means that such launchpad-related calls shall be avoided in the daemon itself. You are right that other accomplishments collections might want to use the vary same cache - so I imagine ubuntu-it-community might want to include the same .py module(s) that ubuntu-community-accomplishments uses for caching, and both might work on the very same cache file.

Does that make sense?

review: Needs Information
Revision history for this message
James Tatum (jtatum) wrote :

If both are to use the same file, they really need to use the same source. The source is essentially implementing a schema in the cache, so if ubuntu-it-community and ubuntu-community-accomplishments are using different versions of the cache schema, they will definitely conflict in unpleasant ways.

If you are adamant about this not belonging here, then we need a new project that stores things common to any accomplishments that are launchpad based.

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

This is a valid point, and I have to agree that particular collections shouldn't work on the same cache file. I guess the most elegant solution then would be to simply use separate files for each collection. I don't really think we need another project for that, this really could be done on scripts' side. The scripts are responsible for fetching the data, so ideally they would care about their cache too. I imagine such caching mechanism might be even merged into a single file (or a few) that would be shipped alongside our current scripts and included/imported by them, so that it would be easily re-usable in other collections.

As a side note, since we now have a lot of AU accomplishments, they would make a great use of their own cache, for they all download the very same file.

Revision history for this message
James Tatum (jtatum) wrote :

To the contrary - I think they all should use the same cache file! :) This code and code like it should be a common library. Why have n separate caches for the same data?

Revision history for this message
Rafał Cieślak (rafalcieslak256) wrote :

That would be possible only in an ideal situation, but as you've pointed out yourself, the cache schema may differ, thus it's the safest solution is to keep them separate.

After all, this is not a big deal. I do not expect anyone will ever have installed more than 3 accomplishment collections that use launchpad API. Very few, only Ubuntu-related collections will make use of it.

I am still convinced that the simplest and easiest to manage solution is to implement the caching algorithm on scripts side. The daemon should at most point the scripts to their collection's cache directory, and should not contain any code that works in favor of ubuntu-affiliated scripts. All that can be very simply done as a single .py file shipped within accomplishments collection, which may result in having more than one cache for the same thing, but it will very rarely be a significant case, and even then will never cause problems.

Revision history for this message
James Tatum (jtatum) wrote :

I guess I just don't understand the resistance to it. Don't repeat yourself :) If the code has a bug or needs an update, it has to be updated.. everywhere. The only mitigating argument is that this is already the case for the individual accomplishment scripts - and I think that should be changed too :)

I am with you about the daemon not being the right place for this type of code. So, let's spin up a new package that all Launchpad related accomplishments can depend on. Any suggestions on the name?

Revision history for this message
Jono Bacon (jonobacon) wrote :

Rafal has landed caching support in UCA which I believe is based on your work, so I believe this MP is no longer need. Thanks, James!

review: Disapprove
Revision history for this message
Jono Bacon (jonobacon) :
review: Disapprove

Unmerged revisions

41. By James Tatum

New editor was set to use tabs in some places. Hopefully fixed

40. By James Tatum

Removing some extra whitespace

39. By James Tatum

First part of LaunchPad data caching mechanism and accomplishment simplification

38. By Launchpad Translations on behalf of jonobacon

Launchpad automatic translations update.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'accomplishments/lpdata.py'
2--- accomplishments/lpdata.py 1970-01-01 00:00:00 +0000
3+++ accomplishments/lpdata.py 2012-05-10 23:04:19 +0000
4@@ -0,0 +1,35 @@
5+from launchpadlib.launchpad import Launchpad
6+
7+from util.cacheddata import CachedData
8+
9+
10+class LPData(CachedData):
11+ """Force launchpadlib to evaluate some of the lazy data it knows about a
12+ user and store it in member variables.
13+ """
14+ # If you add any fields to the LPData object, increment this number to
15+ # ensure the cache is invalidated on update
16+ VERSION = 1
17+
18+ def __init__(self):
19+ super(LPData, self).__init__()
20+ # This is a list of all teams the user is a direct or indirect member
21+ self.super_teams = []
22+ # Store a list of teams user is a direct member of
23+ self.direct_teams = []
24+ self.name = None
25+
26+ @classmethod
27+ def populate(cls, email):
28+ """Return a new LPData object populated from the key, which is the
29+ user's email address.
30+ """
31+ data = LPData()
32+ lp = Launchpad.login_anonymously('ubuntu-community accomplishments',
33+ 'production')
34+ user = lp.people.getByEmail(email=email)
35+ data.name = user.name
36+ data.key = email
37+ data.super_teams = [i.name for i in user.super_teams]
38+ data.direct_teams = [i.team.name for i in user.memberships_details]
39+ return data
40
41=== added file 'accomplishments/util/cacheddata.py'
42--- accomplishments/util/cacheddata.py 1970-01-01 00:00:00 +0000
43+++ accomplishments/util/cacheddata.py 2012-05-10 23:04:19 +0000
44@@ -0,0 +1,82 @@
45+try:
46+ import cPickle as pickle
47+except ImportError:
48+ import pickle
49+import os
50+import os.path
51+import time
52+
53+from launchpadlib.launchpad import Launchpad
54+
55+
56+CACHE_LIFESPAN = 60*60
57+
58+
59+class CachedData(object):
60+ """CachedData is a parent object which allows inheriting objects to perform
61+ some costly fetch operation and cache the output.
62+
63+ Child objects should create a populate(key) method. This method will
64+ perform the costly retrieval and store the data in member variables.
65+ Calling the fetch(key) classmethod will check the cache for this data
66+ and return it if the following conditions are met:
67+
68+ 1) The cache file exists
69+ 2) The cache is not stale (see the CACHE_LIFESPAN module global)
70+ 3) The key in the cache matches the requested key
71+ 4) The version of data in the cache is equal to the defined class
72+ VERSION
73+
74+ If a condition is not met, the populate(key) method is called to fetch
75+ the data and it is stored in the cache.
76+ """
77+ def __init__(self):
78+ self.key = None
79+ self.version = self.VERSION
80+
81+ def __repr__(self):
82+ cls = self.__class__
83+ return '<%s.%s object - %r>' % (cls.__module__,
84+ cls.__name__,
85+ self.name)
86+
87+ @classmethod
88+ def fetch(cls, key):
89+ """Fetch data from the cache or from a costly source.
90+
91+ This will call the populate(key) method on derived classes. That
92+ method should return a populated object of the derived class.
93+ This method will then store the object in the cache for later
94+ retrieval.
95+ """
96+ # basedir spec says to check $XDG_CACHE_HOME for the location of
97+ # the user cache dir first. Failing that, it's just ~/.cache.
98+ try:
99+ cache_dir = os.environ['XDG_CACHE_HOME']
100+ except KeyError:
101+ cache_dir = '~/.cache'
102+
103+ cache_dir = os.path.expanduser(cache_dir)
104+ cache_dir = os.path.join(cache_dir, 'accomplishments')
105+ if not os.path.exists(cache_dir):
106+ os.makedirs(cache_dir)
107+
108+ cache_file = os.path.join(cache_dir, cls.__name__.lower())
109+
110+ if os.path.exists(cache_file):
111+ mtime = os.path.getmtime(cache_file)
112+ if abs(time.time() - mtime) < CACHE_LIFESPAN:
113+ with open(cache_file, 'rb') as input:
114+ obj = pickle.load(input)
115+ if cls.VERSION == obj.version:
116+ if obj.key == key:
117+ # Cache hit. All conditions met.
118+ return obj
119+
120+ # Cache miss. Call populate()
121+ obj = cls.populate(key)
122+
123+ with open(cache_file, 'wb') as output:
124+ pickle.dump(obj, output)
125+
126+ return obj

Subscribers

People subscribed via source and target branches