Merge lp:~robru/friends/since_id into lp:friends

Proposed by Robert Bruce Park
Status: Merged
Approved by: Ken VanDine
Approved revision: 158
Merged at revision: 160
Proposed branch: lp:~robru/friends/since_id
Merge into: lp:friends
Diff against target: 505 lines (+250/-34)
6 files modified
friends/protocols/twitter.py (+48/-25)
friends/tests/test_cache.py (+69/-0)
friends/tests/test_identica.py (+12/-3)
friends/tests/test_twitter.py (+44/-5)
friends/utils/base.py (+1/-1)
friends/utils/cache.py (+76/-0)
To merge this branch: bzr merge lp:~robru/friends/since_id
Reviewer Review Type Date Requested Status
Ken VanDine Approve
PS Jenkins bot (community) continuous-integration Approve
Review via email: mp+152315@code.launchpad.net

Commit message

    Start using since_id= on Twitter API requests. (LP: #1152417)

    This was accomplished by implementing two new classes, which also
    ended up simplifying some of the RateLimiter code as a side effect.

    The first new class is JsonCache, a subclass of dict that populates
    its initial state by reading a JSON text file at a configurable
    location, and adds a "write()" method that dumps the JSON back out
    to the same location. This class is a generalization of what we were
    already doing inside the RateLimiter, so it should not be considered
    a "new feature" if we butt heads with today's feature freeze.
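
    A minimal usage sketch of that API (the cache name 'example' here is
    hypothetical; the real instances appear in the diff below):

        from friends.utils.cache import JsonCache

        # Backed by GLib's user cache dir, typically
        # ~/.cache/friends/example.json; any existing contents are
        # loaded on instantiation, and '{}' is written on first run.
        cache = JsonCache('example')

        # Every __setitem__ immediately serializes the dict to disk.
        cache['hello'] = 'world'

        # Bulk updates avoid repeated writes; persist once at the end.
        cache.update(dict(a=1, b=2))
        cache.write()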

    The second new class, TweetIdCache, is a subclass of JsonCache that
    enforces two rules (sketched below):

    A) Keys may not contain slashes. This prevents the cache from being
    polluted with every search term ever queried or every message ever
    replied to; only the "main" streams, such as "messages", "mentions"
    and "private", are recorded. Those stream names are not hardcoded,
    so the cache remains flexible enough to adapt to new streams in the
    future.

    B) Values must be ints (tweet_ids), and an existing value can only
    be replaced by a larger one. This lets us throw every observed
    tweet_id at the cache and have it record only the largest (newest)
    one per stream.
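
    Concretely, the intended behaviour looks like this (a sketch; the
    stream keys shown are illustrative, but TweetIdCache and the
    'twitter_ids' cache name both appear in the diff below):

        from friends.protocols.twitter import TweetIdCache

        ids = TweetIdCache('twitter_ids')

        ids['messages'] = 100        # recorded: newest id for the stream
        ids['messages'] = 50         # ignored: smaller than stored value
        ids['search/foo'] = 999      # ignored: keys with '/' are dropped

        assert ids.get('messages') == 100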

    The end result is that we now have two new files,
    ~/.cache/friends/twitter_ids.json and
    ~/.cache/friends/identica_ids.json, which track the newest tweet_id
    we have ever seen for each of the streams that we publish to. These
    values are then consulted to form the since_id= argument to several
    of Twitter's API endpoints, which solves bug #1152417.
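
    For example, once the 'messages' stream has a cached tweet_id, the
    home timeline URL picks up a since_id parameter (a sketch using the
    value exercised in the test suite below; 'protocol' stands in for a
    Twitter instance):

        url = ('https://api.twitter.com/1.1/statuses/'
               'home_timeline.json?count=50')
        url = protocol._append_since(url)  # stream defaults to 'messages'
        # -> '...home_timeline.json?count=50&since_id=240558470661799936'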

    As an added bonus, this also greatly reduces our network usage:
    because we no longer redownload duplicate messages over and over,
    Twitter now returns an empty list of tweets when there is nothing
    new, rather than a large list of stale ones.

    This commit includes full test coverage for all new code.

PS Jenkins bot (ps-jenkins) wrote:

PASSED: Continuous integration, rev:158
http://jenkins.qa.ubuntu.com/job/friends-ci/3/
Executed test runs:
    SUCCESS: http://jenkins.qa.ubuntu.com/job/friends-raring-amd64-ci/3//console

Click here to trigger a rebuild:
http://jenkins.qa.ubuntu.com/job/friends-ci/3//rebuild/?

review: Approve (continuous-integration)
Ken VanDine (ken-vandine) wrote:

Looks great, works well and I love the tests!

review: Approve

Preview Diff

1=== modified file 'friends/protocols/twitter.py'
2--- friends/protocols/twitter.py 2013-03-01 19:52:25 +0000
3+++ friends/protocols/twitter.py 2013-03-08 02:36:21 +0000
4@@ -22,10 +22,7 @@
5 ]
6
7
8-import os
9 import time
10-import json
11-import errno
12 import logging
13
14 from urllib.parse import quote
15@@ -33,14 +30,14 @@
16
17 from friends.utils.avatar import Avatar
18 from friends.utils.base import Base, feature
19+from friends.utils.cache import JsonCache
20+from friends.utils.model import Model
21 from friends.utils.http import BaseRateLimiter, Downloader
22 from friends.utils.time import parsetime, iso8601utc
23 from friends.errors import FriendsError
24
25
26 TWITTER_ADDRESS_BOOK = 'friends-twitter-contacts'
27-TWITTER_RATELIMITER_CACHE = os.path.join(
28- GLib.get_user_cache_dir(), 'friends', 'twitter.rates')
29
30
31 log = logging.getLogger(__name__)
32@@ -72,6 +69,8 @@
33 def __init__(self, account):
34 super().__init__(account)
35 self._rate_limiter = RateLimiter()
36+ # Can be 'twitter_ids' or 'identica_ids'
37+ self._tweet_ids = TweetIdCache(self.__class__.__name__.lower() + '_ids')
38
39 def _whoami(self, authdata):
40 """Identify the authenticating user."""
41@@ -103,6 +102,13 @@
42 log.info('Ignoring tweet with no id_str value')
43 return
44
45+ # We need to record tweet_ids for use with since_id. Note that
46+ # _tweet_ids is a special dict subclass that only accepts
47+ # tweet_ids that are larger than the existing value, so at any
48+ # given time it will map the stream to the largest (most
49+ # recent) tweet_id we've seen for that stream.
50+ self._tweet_ids[stream] = tweet_id
51+
52 # 'user' for tweets, 'sender' for direct messages.
53 user = tweet.get('user', {}) or tweet.get('sender', {})
54 screen_name = user.get('screen_name', '')
55@@ -129,12 +135,20 @@
56 )
57 return permalink
58
59+ def _append_since(self, url, stream='messages'):
60+ since = self._tweet_ids.get(stream)
61+ if since is not None:
62+ return '{}&since_id={}'.format(url, since)
63+ return url
64+
65 # https://dev.twitter.com/docs/api/1.1/get/statuses/home_timeline
66 @feature
67 def home(self):
68 """Gather the user's home timeline."""
69- url = self._timeline.format(
70- 'home') + '?count={}'.format(self._DOWNLOAD_LIMIT)
71+ url = '{}?count={}'.format(
72+ self._timeline.format('home'),
73+ self._DOWNLOAD_LIMIT)
74+ url = self._append_since(url)
75 for tweet in self._get_url(url):
76 self._publish_tweet(tweet)
77 return self._get_n_rows()
78@@ -143,7 +157,10 @@
79 @feature
80 def mentions(self):
81 """Gather the tweets that mention us."""
82- url = self._mentions_timeline
83+ url = '{}?count={}'.format(
84+ self._mentions_timeline,
85+ self._DOWNLOAD_LIMIT)
86+ url = self._append_since(url, 'mentions')
87 for tweet in self._get_url(url):
88 self._publish_tweet(tweet, stream='mentions')
89 return self._get_n_rows()
90@@ -185,11 +202,17 @@
91 @feature
92 def private(self):
93 """Gather the direct messages sent to/from us."""
94- url = self._api_base.format(endpoint='direct_messages')
95+ url = '{}?count={}'.format(
96+ self._api_base.format(endpoint='direct_messages'),
97+ self._DOWNLOAD_LIMIT)
98+ url = self._append_since(url, 'private')
99 for tweet in self._get_url(url):
100 self._publish_tweet(tweet, stream='private')
101
102- url = self._api_base.format(endpoint='direct_messages/sent')
103+ url = '{}?count={}'.format(
104+ self._api_base.format(endpoint='direct_messages/sent'),
105+ self._DOWNLOAD_LIMIT)
106+ url = self._append_since(url, 'private')
107 for tweet in self._get_url(url):
108 self._publish_tweet(tweet, stream='private')
109 return self._get_n_rows()
110@@ -374,35 +397,36 @@
111 return self._delete_service_contacts(source)
112
113
114+class TweetIdCache(JsonCache):
115+ """Persist most-recent tweet_ids as JSON."""
116+
117+ def __setitem__(self, key, value):
118+ if key.find('/') >= 0:
119+ # Don't flood the cache with irrelevant "reply_to/..." and
120+ # "search/..." streams, we only need the main streams.
121+ return
122+ value = int(value)
123+ if value > self.get(key, 0):
124+ JsonCache.__setitem__(self, key, value)
125+
126+
127 class RateLimiter(BaseRateLimiter):
128 """Twitter rate limiter."""
129
130 def __init__(self):
131- try:
132- with open(TWITTER_RATELIMITER_CACHE, 'r') as cache:
133- self._limits = json.loads(cache.read())
134- except IOError as error:
135- if error.errno != errno.ENOENT:
136- raise
137- # File not found, so create it:
138- self._limits = {}
139- self._persist_data()
140+ self._limits = JsonCache('twitter-ratelimiter')
141
142 def _sanitize_url(self, uri):
143 # Cache the URL sans any query parameters.
144 return uri.host + uri.path
145
146- def _persist_data(self):
147- with open(TWITTER_RATELIMITER_CACHE, 'w') as cache:
148- cache.write(json.dumps(self._limits))
149-
150 def wait(self, message):
151 # If we haven't seen this URL, default to no wait.
152 seconds = self._limits.pop(self._sanitize_url(message.get_uri()), 0)
153 log.debug('Sleeping for {} seconds!'.format(seconds))
154 time.sleep(seconds)
155 # Don't sleep the same length of time more than once!
156- self._persist_data()
157+ self._limits.write()
158
159 def update(self, message):
160 info = message.response_headers
161@@ -427,7 +451,6 @@
162 else:
163 wait_secs = rate_delta / rate_count
164 self._limits[url] = wait_secs
165- self._persist_data()
166 log.debug(
167 'Next access to {} must wait {} seconds!'.format(
168 url, self._limits.get(url, 0)))
169
170=== added file 'friends/tests/test_cache.py'
171--- friends/tests/test_cache.py 1970-01-01 00:00:00 +0000
172+++ friends/tests/test_cache.py 2013-03-08 02:36:21 +0000
173@@ -0,0 +1,69 @@
174+# friends-dispatcher -- send & receive messages from any social network
175+# Copyright (C) 2012 Canonical Ltd
176+#
177+# This program is free software: you can redistribute it and/or modify
178+# it under the terms of the GNU General Public License as published by
179+# the Free Software Foundation, version 3 of the License.
180+#
181+# This program is distributed in the hope that it will be useful,
182+# but WITHOUT ANY WARRANTY; without even the implied warranty of
183+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
184+# GNU General Public License for more details.
185+#
186+# You should have received a copy of the GNU General Public License
187+# along with this program. If not, see <http://www.gnu.org/licenses/>.
188+
189+"""Test the JSON cacher."""
190+
191+__all__ = [
192+ 'TestJsonCache',
193+ ]
194+
195+
196+import os
197+import time
198+import shutil
199+import tempfile
200+import unittest
201+
202+from datetime import date, timedelta
203+from pkg_resources import resource_filename
204+
205+from friends.utils.cache import JsonCache
206+
207+
208+class TestJsonCache(unittest.TestCase):
209+ """Test JsonCache logic."""
210+
211+ def setUp(self):
212+ self._temp_cache = tempfile.mkdtemp()
213+ self._root = JsonCache._root = os.path.join(
214+ self._temp_cache, '{}.json')
215+
216+ def tearDown(self):
217+ # Clean up the temporary cache directory.
218+ shutil.rmtree(self._temp_cache)
219+
220+ def test_creation(self):
221+ cache = JsonCache('foo')
222+ with open(self._root.format('foo'), 'r') as fd:
223+ empty = fd.read()
224+ self.assertEqual(empty, '{}')
225+
226+ def test_values(self):
227+ cache = JsonCache('bar')
228+ cache['hello'] = 'world'
229+ with open(self._root.format('bar'), 'r') as fd:
230+ result = fd.read()
231+ self.assertEqual(result, '{"hello": "world"}')
232+
233+ def test_writes(self):
234+ cache = JsonCache('stuff')
235+ cache.update(dict(pi=289/92))
236+ with open(self._root.format('stuff'), 'r') as fd:
237+ empty = fd.read()
238+ self.assertEqual(empty, '{}')
239+ cache.write()
240+ with open(self._root.format('stuff'), 'r') as fd:
241+ result = fd.read()
242+ self.assertEqual(result, '{"pi": 3.141304347826087}')
243
244=== modified file 'friends/tests/test_identica.py'
245--- friends/tests/test_identica.py 2013-02-05 01:11:35 +0000
246+++ friends/tests/test_identica.py 2013-03-08 02:36:21 +0000
247@@ -21,12 +21,16 @@
248 ]
249
250
251+import os
252+import tempfile
253 import unittest
254+import shutil
255
256 from gi.repository import Dee
257
258 from friends.protocols.identica import Identica
259 from friends.tests.mocks import FakeAccount, LogMock, mock
260+from friends.utils.cache import JsonCache
261 from friends.utils.model import COLUMN_TYPES
262 from friends.errors import AuthorizationError
263
264@@ -43,6 +47,9 @@
265 """Test the Identica API."""
266
267 def setUp(self):
268+ self._temp_cache = tempfile.mkdtemp()
269+ self._root = JsonCache._root = os.path.join(
270+ self._temp_cache, '{}.json')
271 self.account = FakeAccount()
272 self.protocol = Identica(self.account)
273 self.log_mock = LogMock('friends.utils.base',
274@@ -52,6 +59,7 @@
275 # Ensure that any log entries we haven't tested just get consumed so
276 # as to isolate out test logger from other tests.
277 self.log_mock.stop()
278+ shutil.rmtree(self._temp_cache)
279
280 @mock.patch.dict('friends.utils.authentication.__dict__', LOGIN_TIMEOUT=1)
281 @mock.patch('friends.utils.authentication.Signon.AuthSession.new')
282@@ -83,7 +91,7 @@
283
284 publish.assert_called_with('tweet', stream='mentions')
285 get_url.assert_called_with(
286- 'http://identi.ca/api/statuses/mentions.json')
287+ 'http://identi.ca/api/statuses/mentions.json?count=50')
288
289 def test_user(self):
290 get_url = self.protocol._get_url = mock.Mock(return_value=['tweet'])
291@@ -116,8 +124,9 @@
292 publish.assert_called_with('tweet', stream='private')
293 self.assertEqual(
294 get_url.mock_calls,
295- [mock.call('http://identi.ca/api/direct_messages.json'),
296- mock.call('http://identi.ca/api/direct_messages/sent.json')])
297+ [mock.call('http://identi.ca/api/direct_messages.json?count=50'),
298+ mock.call('http://identi.ca/api/direct_messages' +
299+ '/sent.json?count=50')])
300
301 def test_send_private(self):
302 get_url = self.protocol._get_url = mock.Mock(return_value='tweet')
303
304=== modified file 'friends/tests/test_twitter.py'
305--- friends/tests/test_twitter.py 2013-02-27 22:22:38 +0000
306+++ friends/tests/test_twitter.py 2013-03-08 02:36:21 +0000
307@@ -21,13 +21,17 @@
308 ]
309
310
311+import os
312+import tempfile
313 import unittest
314+import shutil
315
316 from gi.repository import GLib, Dee
317 from urllib.error import HTTPError
318
319 from friends.protocols.twitter import RateLimiter, Twitter
320 from friends.tests.mocks import FakeAccount, FakeSoupMessage, LogMock, mock
321+from friends.utils.cache import JsonCache
322 from friends.utils.model import COLUMN_TYPES
323 from friends.errors import AuthorizationError
324
325@@ -44,6 +48,9 @@
326 """Test the Twitter API."""
327
328 def setUp(self):
329+ self._temp_cache = tempfile.mkdtemp()
330+ self._root = JsonCache._root = os.path.join(
331+ self._temp_cache, '{}.json')
332 TestModel.clear()
333 self.account = FakeAccount()
334 self.protocol = Twitter(self.account)
335@@ -54,6 +61,7 @@
336 # Ensure that any log entries we haven't tested just get consumed so
337 # as to isolate out test logger from other tests.
338 self.log_mock.stop()
339+ shutil.rmtree(self._temp_cache)
340
341 @mock.patch.dict('friends.utils.authentication.__dict__', LOGIN_TIMEOUT=1)
342 @mock.patch('friends.utils.authentication.Signon.AuthSession.new')
343@@ -162,6 +170,32 @@
344
345 @mock.patch('friends.utils.base.Model', TestModel)
346 @mock.patch('friends.utils.http.Soup.Message',
347+ FakeSoupMessage('friends.tests.data', 'twitter-home.dat'))
348+ @mock.patch('friends.protocols.twitter.Twitter._login',
349+ return_value=True)
350+ @mock.patch('friends.utils.base._seen_messages', {})
351+ @mock.patch('friends.utils.base._seen_ids', {})
352+ def test_home_since_id(self, *mocks):
353+ self.account.access_token = 'access'
354+ self.account.secret_token = 'secret'
355+ self.account.auth.parameters = dict(
356+ ConsumerKey='key',
357+ ConsumerSecret='secret')
358+ self.assertEqual(self.protocol.home(), 3)
359+
360+ with open(self._root.format('twitter_ids'), 'r') as fd:
361+ self.assertEqual(fd.read(), '{"messages": 240558470661799936}')
362+
363+ get_url = self.protocol._get_url = mock.Mock()
364+ get_url.return_value = []
365+ self.assertEqual(self.protocol.home(), 3)
366+ get_url.assert_called_once_with(
367+ 'https://api.twitter.com/1.1/statuses/' +
368+ 'home_timeline.json?count=50&since_id=240558470661799936')
369+
370+
371+ @mock.patch('friends.utils.base.Model', TestModel)
372+ @mock.patch('friends.utils.http.Soup.Message',
373 FakeSoupMessage('friends.tests.data', 'twitter-send.dat'))
374 @mock.patch('friends.protocols.twitter.Twitter._login',
375 return_value=True)
376@@ -216,7 +250,8 @@
377
378 publish.assert_called_with('tweet', stream='mentions')
379 get_url.assert_called_with(
380- 'https://api.twitter.com/1.1/statuses/mentions_timeline.json')
381+ 'https://api.twitter.com/1.1/statuses/' +
382+ 'mentions_timeline.json?count=50')
383
384 @mock.patch('friends.utils.base.Model', TestModel)
385 @mock.patch('friends.utils.base._seen_messages', {})
386@@ -270,8 +305,10 @@
387 publish.assert_called_with('tweet', stream='private')
388 self.assertEqual(
389 get_url.mock_calls,
390- [mock.call('https://api.twitter.com/1.1/direct_messages.json'),
391- mock.call('https://api.twitter.com/1.1/direct_messages/sent.json')
392+ [mock.call('https://api.twitter.com/1.1/' +
393+ 'direct_messages.json?count=50'),
394+ mock.call('https://api.twitter.com/1.1/' +
395+ 'direct_messages/sent.json?count=50')
396 ])
397
398 @mock.patch('friends.protocols.twitter.Avatar.get_image',
399@@ -302,8 +339,10 @@
400 message_id='1452456')
401 self.assertEqual(
402 get_url.mock_calls,
403- [mock.call('https://api.twitter.com/1.1/direct_messages.json'),
404- mock.call('https://api.twitter.com/1.1/direct_messages/sent.json')
405+ [mock.call('https://api.twitter.com/1.1/' +
406+ 'direct_messages.json?count=50'),
407+ mock.call('https://api.twitter.com/1.1/' +
408+ 'direct_messages/sent.json?count=50&since_id=1452456')
409 ])
410
411 @mock.patch('friends.utils.base.Model', TestModel)
412
413=== modified file 'friends/utils/base.py'
414--- friends/utils/base.py 2013-02-27 22:34:48 +0000
415+++ friends/utils/base.py 2013-03-08 02:36:21 +0000
416@@ -548,7 +548,7 @@
417 def _is_error(self, data):
418 """Is the return data an error response?"""
419 try:
420- error = data.get('error')
421+ error = data.get('error') or data.get('errors')
422 except AttributeError:
423 return False
424 if error is None:
425
426=== added file 'friends/utils/cache.py'
427--- friends/utils/cache.py 1970-01-01 00:00:00 +0000
428+++ friends/utils/cache.py 2013-03-08 02:36:21 +0000
429@@ -0,0 +1,76 @@
430+# friends-dispatcher -- send & receive messages from any social network
431+# Copyright (C) 2012 Canonical Ltd
432+#
433+# This program is free software: you can redistribute it and/or modify
434+# it under the terms of the GNU General Public License as published by
435+# the Free Software Foundation, version 3 of the License.
436+#
437+# This program is distributed in the hope that it will be useful,
438+# but WITHOUT ANY WARRANTY; without even the implied warranty of
439+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
440+# GNU General Public License for more details.
441+#
442+# You should have received a copy of the GNU General Public License
443+# along with this program. If not, see <http://www.gnu.org/licenses/>.
444+
445+"""Persistent data store using JSON."""
446+
447+__all__ = [
448+ 'JsonCache',
449+ ]
450+
451+import os
452+import json
453+import errno
454+import logging
455+
456+from gi.repository import GLib
457+
458+
459+log = logging.getLogger(__name__)
460+
461+
462+class JsonCache(dict):
463+ """Simple dict that is backed by JSON data in a text file.
464+
465+ Serializes itself to disk with every call to __setitem__, so it's
466+ not well suited for large, frequently-changing dicts. But useful
467+ for small dicts that change infrequently. Typically I expect this
468+ to be used for dicts that only change once or twice during the
469+ lifetime of the program, but needs to remember its state between
470+ invocations.
471+
472+ If, for some unforeseen reason, you do need to dump a lot of data
473+ into this dict without triggering a ton of disk writes, it is
474+ possible to call dict.update with all the new values, followed by
475+ a single call to .write(). Keep in mind that the more data you
476+ store in this dict, the slower read/writes will be with each
477+ invocation. At the time of this writing, there are only three
478+ instances used throughout Friends, and they are all under 200
479+ bytes.
480+ """
481+ # Where to store all the json files.
482+ _root = os.path.join(GLib.get_user_cache_dir(), 'friends', '{}.json')
483+
484+ def __init__(self, name):
485+ dict.__init__(self)
486+ self._path = self._root.format(name)
487+
488+ try:
489+ with open(self._path, 'r') as cache:
490+ self.update(json.loads(cache.read()))
491+ except IOError as error:
492+ if error.errno != errno.ENOENT:
493+ raise
494+ # This writes '{}' to self._filename on first run.
495+ self.write()
496+
497+ def write(self):
498+ """Write our dict contents to disk as a JSON string."""
499+ with open(self._path, 'w') as cache:
500+ cache.write(json.dumps(self))
501+
502+ def __setitem__(self, key, value):
503+ """Write to disk every time dict is updated."""
504+ dict.__setitem__(self, key, value)
505+ self.write()
