Merge lp:~robru/friends/since_id into lp:friends

Proposed by Robert Bruce Park
Status: Merged
Approved by: Ken VanDine
Approved revision: 158
Merged at revision: 160
Proposed branch: lp:~robru/friends/since_id
Merge into: lp:friends
Diff against target: 505 lines (+250/-34)
6 files modified
friends/protocols/twitter.py (+48/-25)
friends/tests/test_cache.py (+69/-0)
friends/tests/test_identica.py (+12/-3)
friends/tests/test_twitter.py (+44/-5)
friends/utils/base.py (+1/-1)
friends/utils/cache.py (+76/-0)
To merge this branch: bzr merge lp:~robru/friends/since_id
Reviewer Review Type Date Requested Status
Ken VanDine Approve
PS Jenkins bot (community) continuous-integration Approve
Review via email: mp+152315@code.launchpad.net

Commit message

    Start using since_id= on Twitter API requests. (LP: #1152417)

    This was accomplished by implementing two new classes, which also
    ended up simplifying some of the RateLimiter code as a side effect.

    The first new class is JsonCache, a subclass of dict that populates
    its initial state by reading a JSON text file at a configurable
    location, and adds a "write()" method that dumps the JSON back out
    to the same location. This class is a generalization of what we were
    already doing inside the RateLimiter, so it should not be considered
    a "new feature" if we butt heads with today's feature freeze.
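
    A minimal usage sketch of that API (the cache name 'example' here is
    hypothetical; the real instances appear in the diff below):

        from friends.utils.cache import JsonCache

        # Backed by GLib's user cache dir, typically
        # ~/.cache/friends/example.json; any existing contents are
        # loaded on instantiation, and '{}' is written on first run.
        cache = JsonCache('example')

        # Every __setitem__ immediately serializes the dict to disk.
        cache['hello'] = 'world'

        # Bulk updates avoid repeated writes; persist once at the end.
        cache.update(dict(a=1, b=2))
        cache.write()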

    The second new class, TweetIdCache, is a subclass of JsonCache that
    enforces two rules (sketched below):

    A) Keys may not contain slashes. This prevents the cache from being
    polluted with every search term ever queried or every message ever
    replied to; only the "main" streams, such as "messages", "mentions"
    and "private", are recorded. Those stream names are not hardcoded,
    so the cache remains flexible enough to adapt to new streams in the
    future.

    B) Values must be ints (tweet_ids), and an existing value can only
    be replaced by a larger one. This lets us throw every observed
    tweet_id at the cache and have it record only the largest (newest)
    one per stream.
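
    Concretely, the intended behaviour looks like this (a sketch; the
    stream keys shown are illustrative, but TweetIdCache and the
    'twitter_ids' cache name both appear in the diff below):

        from friends.protocols.twitter import TweetIdCache

        ids = TweetIdCache('twitter_ids')

        ids['messages'] = 100        # recorded: newest id for the stream
        ids['messages'] = 50         # ignored: smaller than stored value
        ids['search/foo'] = 999      # ignored: keys with '/' are dropped

        assert ids.get('messages') == 100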

    The end result is that we now have two new files,
    ~/.cache/friends/twitter_ids.json and
    ~/.cache/friends/identica_ids.json, which track the newest tweet_id
    we have ever seen for each of the streams that we publish to. These
    values are then consulted to form the since_id= argument to several
    of Twitter's API endpoints, which solves bug #1152417.
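
    For example, once the 'messages' stream has a cached tweet_id, the
    home timeline URL picks up a since_id parameter (a sketch using the
    value exercised in the test suite below; 'protocol' stands in for a
    Twitter instance):

        url = ('https://api.twitter.com/1.1/statuses/'
               'home_timeline.json?count=50')
        url = protocol._append_since(url)  # stream defaults to 'messages'
        # -> '...home_timeline.json?count=50&since_id=240558470661799936'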

    As an added bonus, this also greatly reduces our network usage:
    because we no longer redownload duplicate messages over and over,
    Twitter now returns an empty list of tweets when there is nothing
    new, rather than a large list of stale ones.

    This commit includes full test coverage for all new code.

PS Jenkins bot (ps-jenkins) wrote:

PASSED: Continuous integration, rev:158
http://jenkins.qa.ubuntu.com/job/friends-ci/3/
Executed test runs:
    SUCCESS: http://jenkins.qa.ubuntu.com/job/friends-raring-amd64-ci/3//console

Click here to trigger a rebuild:
http://jenkins.qa.ubuntu.com/job/friends-ci/3//rebuild/?

review: Approve (continuous-integration)
Ken VanDine (ken-vandine) wrote:

Looks great, works well and I love the tests!

review: Approve

Preview Diff

1=== modified file 'friends/protocols/twitter.py'
2--- friends/protocols/twitter.py 2013-03-01 19:52:25 +0000
3+++ friends/protocols/twitter.py 2013-03-08 02:36:21 +0000
4@@ -22,10 +22,7 @@
5 ]
6
7
8-import os
9 import time
10-import json
11-import errno
12 import logging
13
14 from urllib.parse import quote
15@@ -33,14 +30,14 @@
16
17 from friends.utils.avatar import Avatar
18 from friends.utils.base import Base, feature
19+from friends.utils.cache import JsonCache
20+from friends.utils.model import Model
21 from friends.utils.http import BaseRateLimiter, Downloader
22 from friends.utils.time import parsetime, iso8601utc
23 from friends.errors import FriendsError
24
25
26 TWITTER_ADDRESS_BOOK = 'friends-twitter-contacts'
27-TWITTER_RATELIMITER_CACHE = os.path.join(
28- GLib.get_user_cache_dir(), 'friends', 'twitter.rates')
29
30
31 log = logging.getLogger(__name__)
32@@ -72,6 +69,8 @@
33 def __init__(self, account):
34 super().__init__(account)
35 self._rate_limiter = RateLimiter()
36+ # Can be 'twitter_ids' or 'identica_ids'
37+ self._tweet_ids = TweetIdCache(self.__class__.__name__.lower() + '_ids')
38
39 def _whoami(self, authdata):
40 """Identify the authenticating user."""
41@@ -103,6 +102,13 @@
42 log.info('Ignoring tweet with no id_str value')
43 return
44
45+ # We need to record tweet_ids for use with since_id. Note that
46+ # _tweet_ids is a special dict subclass that only accepts
47+ # tweet_ids that are larger than the existing value, so at any
48+ # given time it will map the stream to the largest (most
49+ # recent) tweet_id we've seen for that stream.
50+ self._tweet_ids[stream] = tweet_id
51+
52 # 'user' for tweets, 'sender' for direct messages.
53 user = tweet.get('user', {}) or tweet.get('sender', {})
54 screen_name = user.get('screen_name', '')
55@@ -129,12 +135,20 @@
56 )
57 return permalink
58
59+ def _append_since(self, url, stream='messages'):
60+ since = self._tweet_ids.get(stream)
61+ if since is not None:
62+ return '{}&since_id={}'.format(url, since)
63+ return url
64+
65 # https://dev.twitter.com/docs/api/1.1/get/statuses/home_timeline
66 @feature
67 def home(self):
68 """Gather the user's home timeline."""
69- url = self._timeline.format(
70- 'home') + '?count={}'.format(self._DOWNLOAD_LIMIT)
71+ url = '{}?count={}'.format(
72+ self._timeline.format('home'),
73+ self._DOWNLOAD_LIMIT)
74+ url = self._append_since(url)
75 for tweet in self._get_url(url):
76 self._publish_tweet(tweet)
77 return self._get_n_rows()
78@@ -143,7 +157,10 @@
79 @feature
80 def mentions(self):
81 """Gather the tweets that mention us."""
82- url = self._mentions_timeline
83+ url = '{}?count={}'.format(
84+ self._mentions_timeline,
85+ self._DOWNLOAD_LIMIT)
86+ url = self._append_since(url, 'mentions')
87 for tweet in self._get_url(url):
88 self._publish_tweet(tweet, stream='mentions')
89 return self._get_n_rows()
90@@ -185,11 +202,17 @@
91 @feature
92 def private(self):
93 """Gather the direct messages sent to/from us."""
94- url = self._api_base.format(endpoint='direct_messages')
95+ url = '{}?count={}'.format(
96+ self._api_base.format(endpoint='direct_messages'),
97+ self._DOWNLOAD_LIMIT)
98+ url = self._append_since(url, 'private')
99 for tweet in self._get_url(url):
100 self._publish_tweet(tweet, stream='private')
101
102- url = self._api_base.format(endpoint='direct_messages/sent')
103+ url = '{}?count={}'.format(
104+ self._api_base.format(endpoint='direct_messages/sent'),
105+ self._DOWNLOAD_LIMIT)
106+ url = self._append_since(url, 'private')
107 for tweet in self._get_url(url):
108 self._publish_tweet(tweet, stream='private')
109 return self._get_n_rows()
110@@ -374,35 +397,36 @@
111 return self._delete_service_contacts(source)
112
113
114+class TweetIdCache(JsonCache):
115+ """Persist most-recent tweet_ids as JSON."""
116+
117+ def __setitem__(self, key, value):
118+ if key.find('/') >= 0:
119+ # Don't flood the cache with irrelevant "reply_to/..." and
120+ # "search/..." streams, we only need the main streams.
121+ return
122+ value = int(value)
123+ if value > self.get(key, 0):
124+ JsonCache.__setitem__(self, key, value)
125+
126+
127 class RateLimiter(BaseRateLimiter):
128 """Twitter rate limiter."""
129
130 def __init__(self):
131- try:
132- with open(TWITTER_RATELIMITER_CACHE, 'r') as cache:
133- self._limits = json.loads(cache.read())
134- except IOError as error:
135- if error.errno != errno.ENOENT:
136- raise
137- # File not found, so create it:
138- self._limits = {}
139- self._persist_data()
140+ self._limits = JsonCache('twitter-ratelimiter')
141
142 def _sanitize_url(self, uri):
143 # Cache the URL sans any query parameters.
144 return uri.host + uri.path
145
146- def _persist_data(self):
147- with open(TWITTER_RATELIMITER_CACHE, 'w') as cache:
148- cache.write(json.dumps(self._limits))
149-
150 def wait(self, message):
151 # If we haven't seen this URL, default to no wait.
152 seconds = self._limits.pop(self._sanitize_url(message.get_uri()), 0)
153 log.debug('Sleeping for {} seconds!'.format(seconds))
154 time.sleep(seconds)
155 # Don't sleep the same length of time more than once!
156- self._persist_data()
157+ self._limits.write()
158
159 def update(self, message):
160 info = message.response_headers
161@@ -427,7 +451,6 @@
162 else:
163 wait_secs = rate_delta / rate_count
164 self._limits[url] = wait_secs
165- self._persist_data()
166 log.debug(
167 'Next access to {} must wait {} seconds!'.format(
168 url, self._limits.get(url, 0)))
169
170=== added file 'friends/tests/test_cache.py'
171--- friends/tests/test_cache.py 1970-01-01 00:00:00 +0000
172+++ friends/tests/test_cache.py 2013-03-08 02:36:21 +0000
173@@ -0,0 +1,69 @@
174+# friends-dispatcher -- send & receive messages from any social network
175+# Copyright (C) 2012 Canonical Ltd
176+#
177+# This program is free software: you can redistribute it and/or modify
178+# it under the terms of the GNU General Public License as published by
179+# the Free Software Foundation, version 3 of the License.
180+#
181+# This program is distributed in the hope that it will be useful,
182+# but WITHOUT ANY WARRANTY; without even the implied warranty of
183+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
184+# GNU General Public License for more details.
185+#
186+# You should have received a copy of the GNU General Public License
187+# along with this program. If not, see <http://www.gnu.org/licenses/>.
188+
189+"""Test the JSON cacher."""
190+
191+__all__ = [
192+ 'TestJsonCache',
193+ ]
194+
195+
196+import os
197+import time
198+import shutil
199+import tempfile
200+import unittest
201+
202+from datetime import date, timedelta
203+from pkg_resources import resource_filename
204+
205+from friends.utils.cache import JsonCache
206+
207+
208+class TestJsonCache(unittest.TestCase):
209+ """Test JsonCache logic."""
210+
211+ def setUp(self):
212+ self._temp_cache = tempfile.mkdtemp()
213+ self._root = JsonCache._root = os.path.join(
214+ self._temp_cache, '{}.json')
215+
216+ def tearDown(self):
217+ # Clean up the temporary cache directory.
218+ shutil.rmtree(self._temp_cache)
219+
220+ def test_creation(self):
221+ cache = JsonCache('foo')
222+ with open(self._root.format('foo'), 'r') as fd:
223+ empty = fd.read()
224+ self.assertEqual(empty, '{}')
225+
226+ def test_values(self):
227+ cache = JsonCache('bar')
228+ cache['hello'] = 'world'
229+ with open(self._root.format('bar'), 'r') as fd:
230+ result = fd.read()
231+ self.assertEqual(result, '{"hello": "world"}')
232+
233+ def test_writes(self):
234+ cache = JsonCache('stuff')
235+ cache.update(dict(pi=289/92))
236+ with open(self._root.format('stuff'), 'r') as fd:
237+ empty = fd.read()
238+ self.assertEqual(empty, '{}')
239+ cache.write()
240+ with open(self._root.format('stuff'), 'r') as fd:
241+ result = fd.read()
242+ self.assertEqual(result, '{"pi": 3.141304347826087}')
243
244=== modified file 'friends/tests/test_identica.py'
245--- friends/tests/test_identica.py 2013-02-05 01:11:35 +0000
246+++ friends/tests/test_identica.py 2013-03-08 02:36:21 +0000
247@@ -21,12 +21,16 @@
248 ]
249
250
251+import os
252+import tempfile
253 import unittest
254+import shutil
255
256 from gi.repository import Dee
257
258 from friends.protocols.identica import Identica
259 from friends.tests.mocks import FakeAccount, LogMock, mock
260+from friends.utils.cache import JsonCache
261 from friends.utils.model import COLUMN_TYPES
262 from friends.errors import AuthorizationError
263
264@@ -43,6 +47,9 @@
265 """Test the Identica API."""
266
267 def setUp(self):
268+ self._temp_cache = tempfile.mkdtemp()
269+ self._root = JsonCache._root = os.path.join(
270+ self._temp_cache, '{}.json')
271 self.account = FakeAccount()
272 self.protocol = Identica(self.account)
273 self.log_mock = LogMock('friends.utils.base',
274@@ -52,6 +59,7 @@
275 # Ensure that any log entries we haven't tested just get consumed so
276 # as to isolate out test logger from other tests.
277 self.log_mock.stop()
278+ shutil.rmtree(self._temp_cache)
279
280 @mock.patch.dict('friends.utils.authentication.__dict__', LOGIN_TIMEOUT=1)
281 @mock.patch('friends.utils.authentication.Signon.AuthSession.new')
282@@ -83,7 +91,7 @@
283
284 publish.assert_called_with('tweet', stream='mentions')
285 get_url.assert_called_with(
286- 'http://identi.ca/api/statuses/mentions.json')
287+ 'http://identi.ca/api/statuses/mentions.json?count=50')
288
289 def test_user(self):
290 get_url = self.protocol._get_url = mock.Mock(return_value=['tweet'])
291@@ -116,8 +124,9 @@
292 publish.assert_called_with('tweet', stream='private')
293 self.assertEqual(
294 get_url.mock_calls,
295- [mock.call('http://identi.ca/api/direct_messages.json'),
296- mock.call('http://identi.ca/api/direct_messages/sent.json')])
297+ [mock.call('http://identi.ca/api/direct_messages.json?count=50'),
298+ mock.call('http://identi.ca/api/direct_messages' +
299+ '/sent.json?count=50')])
300
301 def test_send_private(self):
302 get_url = self.protocol._get_url = mock.Mock(return_value='tweet')
303
304=== modified file 'friends/tests/test_twitter.py'
305--- friends/tests/test_twitter.py 2013-02-27 22:22:38 +0000
306+++ friends/tests/test_twitter.py 2013-03-08 02:36:21 +0000
307@@ -21,13 +21,17 @@
308 ]
309
310
311+import os
312+import tempfile
313 import unittest
314+import shutil
315
316 from gi.repository import GLib, Dee
317 from urllib.error import HTTPError
318
319 from friends.protocols.twitter import RateLimiter, Twitter
320 from friends.tests.mocks import FakeAccount, FakeSoupMessage, LogMock, mock
321+from friends.utils.cache import JsonCache
322 from friends.utils.model import COLUMN_TYPES
323 from friends.errors import AuthorizationError
324
325@@ -44,6 +48,9 @@
326 """Test the Twitter API."""
327
328 def setUp(self):
329+ self._temp_cache = tempfile.mkdtemp()
330+ self._root = JsonCache._root = os.path.join(
331+ self._temp_cache, '{}.json')
332 TestModel.clear()
333 self.account = FakeAccount()
334 self.protocol = Twitter(self.account)
335@@ -54,6 +61,7 @@
336 # Ensure that any log entries we haven't tested just get consumed so
337 # as to isolate out test logger from other tests.
338 self.log_mock.stop()
339+ shutil.rmtree(self._temp_cache)
340
341 @mock.patch.dict('friends.utils.authentication.__dict__', LOGIN_TIMEOUT=1)
342 @mock.patch('friends.utils.authentication.Signon.AuthSession.new')
343@@ -162,6 +170,32 @@
344
345 @mock.patch('friends.utils.base.Model', TestModel)
346 @mock.patch('friends.utils.http.Soup.Message',
347+ FakeSoupMessage('friends.tests.data', 'twitter-home.dat'))
348+ @mock.patch('friends.protocols.twitter.Twitter._login',
349+ return_value=True)
350+ @mock.patch('friends.utils.base._seen_messages', {})
351+ @mock.patch('friends.utils.base._seen_ids', {})
352+ def test_home_since_id(self, *mocks):
353+ self.account.access_token = 'access'
354+ self.account.secret_token = 'secret'
355+ self.account.auth.parameters = dict(
356+ ConsumerKey='key',
357+ ConsumerSecret='secret')
358+ self.assertEqual(self.protocol.home(), 3)
359+
360+ with open(self._root.format('twitter_ids'), 'r') as fd:
361+ self.assertEqual(fd.read(), '{"messages": 240558470661799936}')
362+
363+ get_url = self.protocol._get_url = mock.Mock()
364+ get_url.return_value = []
365+ self.assertEqual(self.protocol.home(), 3)
366+ get_url.assert_called_once_with(
367+ 'https://api.twitter.com/1.1/statuses/' +
368+ 'home_timeline.json?count=50&since_id=240558470661799936')
369+
370+
371+ @mock.patch('friends.utils.base.Model', TestModel)
372+ @mock.patch('friends.utils.http.Soup.Message',
373 FakeSoupMessage('friends.tests.data', 'twitter-send.dat'))
374 @mock.patch('friends.protocols.twitter.Twitter._login',
375 return_value=True)
376@@ -216,7 +250,8 @@
377
378 publish.assert_called_with('tweet', stream='mentions')
379 get_url.assert_called_with(
380- 'https://api.twitter.com/1.1/statuses/mentions_timeline.json')
381+ 'https://api.twitter.com/1.1/statuses/' +
382+ 'mentions_timeline.json?count=50')
383
384 @mock.patch('friends.utils.base.Model', TestModel)
385 @mock.patch('friends.utils.base._seen_messages', {})
386@@ -270,8 +305,10 @@
387 publish.assert_called_with('tweet', stream='private')
388 self.assertEqual(
389 get_url.mock_calls,
390- [mock.call('https://api.twitter.com/1.1/direct_messages.json'),
391- mock.call('https://api.twitter.com/1.1/direct_messages/sent.json')
392+ [mock.call('https://api.twitter.com/1.1/' +
393+ 'direct_messages.json?count=50'),
394+ mock.call('https://api.twitter.com/1.1/' +
395+ 'direct_messages/sent.json?count=50')
396 ])
397
398 @mock.patch('friends.protocols.twitter.Avatar.get_image',
399@@ -302,8 +339,10 @@
400 message_id='1452456')
401 self.assertEqual(
402 get_url.mock_calls,
403- [mock.call('https://api.twitter.com/1.1/direct_messages.json'),
404- mock.call('https://api.twitter.com/1.1/direct_messages/sent.json')
405+ [mock.call('https://api.twitter.com/1.1/' +
406+ 'direct_messages.json?count=50'),
407+ mock.call('https://api.twitter.com/1.1/' +
408+ 'direct_messages/sent.json?count=50&since_id=1452456')
409 ])
410
411 @mock.patch('friends.utils.base.Model', TestModel)
412
413=== modified file 'friends/utils/base.py'
414--- friends/utils/base.py 2013-02-27 22:34:48 +0000
415+++ friends/utils/base.py 2013-03-08 02:36:21 +0000
416@@ -548,7 +548,7 @@
417 def _is_error(self, data):
418 """Is the return data an error response?"""
419 try:
420- error = data.get('error')
421+ error = data.get('error') or data.get('errors')
422 except AttributeError:
423 return False
424 if error is None:
425
426=== added file 'friends/utils/cache.py'
427--- friends/utils/cache.py 1970-01-01 00:00:00 +0000
428+++ friends/utils/cache.py 2013-03-08 02:36:21 +0000
429@@ -0,0 +1,76 @@
430+# friends-dispatcher -- send & receive messages from any social network
431+# Copyright (C) 2012 Canonical Ltd
432+#
433+# This program is free software: you can redistribute it and/or modify
434+# it under the terms of the GNU General Public License as published by
435+# the Free Software Foundation, version 3 of the License.
436+#
437+# This program is distributed in the hope that it will be useful,
438+# but WITHOUT ANY WARRANTY; without even the implied warranty of
439+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
440+# GNU General Public License for more details.
441+#
442+# You should have received a copy of the GNU General Public License
443+# along with this program. If not, see <http://www.gnu.org/licenses/>.
444+
445+"""Persistent data store using JSON."""
446+
447+__all__ = [
448+ 'JsonCache',
449+ ]
450+
451+import os
452+import json
453+import errno
454+import logging
455+
456+from gi.repository import GLib
457+
458+
459+log = logging.getLogger(__name__)
460+
461+
462+class JsonCache(dict):
463+ """Simple dict that is backed by JSON data in a text file.
464+
465+ Serializes itself to disk with every call to __setitem__, so it's
466+ not well suited for large, frequently-changing dicts. But useful
467+ for small dicts that change infrequently. Typically I expect this
468+ to be used for dicts that only change once or twice during the
469+ lifetime of the program, but needs to remember its state between
470+ invocations.
471+
472+ If, for some unforeseen reason, you do need to dump a lot of data
473+ into this dict without triggering a ton of disk writes, it is
474+ possible to call dict.update with all the new values, followed by
475+ a single call to .write(). Keep in mind that the more data you
476+ store in this dict, the slower read/writes will be with each
477+ invocation. At the time of this writing, there are only three
478+ instances used throughout Friends, and they are all under 200
479+ bytes.
480+ """
481+ # Where to store all the json files.
482+ _root = os.path.join(GLib.get_user_cache_dir(), 'friends', '{}.json')
483+
484+ def __init__(self, name):
485+ dict.__init__(self)
486+ self._path = self._root.format(name)
487+
488+ try:
489+ with open(self._path, 'r') as cache:
490+ self.update(json.loads(cache.read()))
491+ except IOError as error:
492+ if error.errno != errno.ENOENT:
493+ raise
494+ # This writes '{}' to self._filename on first run.
495+ self.write()
496+
497+ def write(self):
498+ """Write our dict contents to disk as a JSON string."""
499+ with open(self._path, 'w') as cache:
500+ cache.write(json.dumps(self))
501+
502+ def __setitem__(self, key, value):
503+ """Write to disk every time dict is updated."""
504+ dict.__setitem__(self, key, value)
505+ self.write()
