Friends

Merge lp:~robru/friends/linkify into lp:~super-friends/friends/raring

linkify
Merge into raring

Proposed by Robert Bruce Park on 2013-03-19

Status:	Merged
Approved by:	Ken VanDine on 2013-03-19
Approved revision:	178
Merged at revision:	177
Proposed branch:	lp:~robru/friends/linkify
Merge into:	lp:~super-friends/friends/raring
Diff against target:	176 lines (+116/-3), 3 files modified:
	friends/tests/test_protocols.py (+76/-1)
	friends/tests/test_twitter.py (+3/-2)
	friends/utils/base.py (+37/-0)
To merge this branch:	bzr merge lp:~robru/friends/linkify

Reviewer	Review Type	Date Requested	Status
Ken VanDine		2013-03-19	Approve on 2013-03-19
Review via email: mp+153965@code.launchpad.net

Description of the change

Alright Ken, the regex turned out to be a bit more complicated than I had anticipated, but I've taken steps to keep it readable (it is compiled with re.VERBOSE and every part is commented), and I've included fairly comprehensive tests for it, so long-term maintenance shouldn't be too difficult.

It's speedy, too! You were complaining that the frontend was not very performant when linkifying URLs, so I did a bit of profiling. It handles an empty string in 6.0 usec, a URL-only string in 10.8 usec, an average sentence in 13.8 usec, and a maximum-length tweet in 16.6 usec.
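
Per-call figures like these can be gathered with the standard-library `timeit` module. The sketch below is an assumption about method, not the script used for the numbers above: the sample strings are hypothetical stand-ins for each category, and the pattern is a single-line equivalent of `LINKIFY_REGEX` (the same regex with the verbose-mode comments removed). Absolute timings will vary with hardware and Python version.

```python
import re
import timeit

# Single-line equivalent of LINKIFY_REGEX from friends/utils/base.py
# (same pattern, with the re.VERBOSE comments stripped out).
_linkify_sub = re.compile(
    r'(?<![">])((?:(?:https?|ftp)://|www\.)\S+?)(?=[.,!?)]*(?:\s|$)(?!</a>))'
).sub


def linkify_string(string):
    """Finds all URLs in a string and turns them into HTML links."""
    return _linkify_sub(r'<a href="\1">\1</a>', string)


# Hypothetical stand-in inputs, one per category mentioned above.
SAMPLES = {
    'empty': '',
    'URL only': 'http://www.example.com',
    'sentence': 'Ever been to www.example.com? It is my favorite site.',
    'tweet-ish': 'lorem ipsum ' * 8 + 'see http://example.com/products/buy.html.',
}


def usec_per_call(text, number=10000):
    """Average microseconds per linkify_string(text) call.

    Includes the small overhead of the wrapping lambda call.
    """
    seconds = timeit.timeit(lambda: linkify_string(text), number=number)
    return seconds / number * 1e6


for label, text in SAMPLES.items():
    print('{:>10}: {:5.1f} usec'.format(label, usec_per_call(text)))
```

Passing a callable to `timeit.timeit` avoids building a setup string; for tighter numbers, `timeit.repeat` and taking the minimum is the usual refinement.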

On top of that, we get the performance benefit of doing it only once and caching the result, as opposed to the frontend, which would have to redo it every time it displayed a message.
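
A toy model of that trade-off: linkify once when a message is stored, and let every redraw read the cached HTML. `store_message`, `display`, the dict-based `model`, and the call counter are hypothetical illustrations, not Friends or Dee APIs; in this branch the write-time step corresponds to the `kwargs['message'] = linkify_string(...)` line added to `friends/utils/base.py`.

```python
import re

# Single-line equivalent of LINKIFY_REGEX (verbose comments removed).
_linkify_sub = re.compile(
    r'(?<![">])((?:(?:https?|ftp)://|www\.)\S+?)(?=[.,!?)]*(?:\s|$)(?!</a>))'
).sub

calls = {'linkify': 0}  # counts regex passes, for illustration only


def linkify_string(string):
    calls['linkify'] += 1
    return _linkify_sub(r'<a href="\1">\1</a>', string)


# Hypothetical in-memory stand-in for the shared message model.
model = {}


def store_message(message_id, **kwargs):
    """Linkify once, at write time, then keep the HTML in the model."""
    kwargs['message'] = linkify_string(kwargs.get('message', ''))
    model[message_id] = kwargs


def display(message_id):
    """Every redraw reads the stored HTML; no regex work happens here."""
    return model[message_id]['message']


store_message('1', message='Go to https://example.com!')
for _ in range(3):  # e.g. three redraws in a frontend
    html = display('1')

print(calls['linkify'], html)
```

However many times the message is displayed, the regex runs exactly once per stored message.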

lp:~robru/friends/linkify updated on 2013-03-19

178. By Robert Bruce Park on 2013-03-19: Make the deduplication logic a little bit more robust.
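
The regex's guards against duplicate linkification can be spot-checked by feeding `linkify_string` its own output. For already-linked URLs, the lookbehind `(?<![\"\>])` is what does the work: such a URL sits directly after `"` (the `href` value) or `>` (the link text), so it is skipped on the second pass. The inputs below come from `test_linkify_string`, the check is only claimed for these inputs, and the pattern is a single-line equivalent of the verbose `LINKIFY_REGEX`.

```python
import re

# Single-line equivalent of LINKIFY_REGEX (verbose comments removed).
_linkify_sub = re.compile(
    r'(?<![">])((?:(?:https?|ftp)://|www\.)\S+?)(?=[.,!?)]*(?:\s|$)(?!</a>))'
).sub


def linkify_string(string):
    return _linkify_sub(r'<a href="\1">\1</a>', string)


# Inputs taken from test_linkify_string.
SAMPLES = [
    'http://example.com',
    'www.example.com',
    'Go to https://example.com!',
    'Example Co (http://example.com).',
]

for text in SAMPLES:
    once = linkify_string(text)
    twice = linkify_string(once)
    # The first pass adds a link; the lookbehind rejects every URL that now
    # follows '"' or '>', so the second pass leaves the output unchanged.
    assert once != text and twice == once, text
```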

Revision history for this message

Ken VanDine (ken-vandine) wrote on 2013-03-19:

Works great!

review: Approve

Preview Diff

Subscribers

People subscribed via source and target branches

to all changes:

Robert Bruce Park

Super Friends

 === modified file 'friends/tests/test_protocols.py'
 --- friends/tests/test_protocols.py	2013-03-14 19:14:03 +0000
 +++ friends/tests/test_protocols.py	2013-03-19 00:45:26 +0000
@@ -27,7 +27,7 @@
  from friends.protocols.flickr import Flickr
  from friends.protocols.twitter import Twitter
  from friends.tests.mocks import FakeAccount, LogMock, TestModel, mock
--from friends.utils.base import Base, feature
++from friends.utils.base import Base, feature, linkify_string
  from friends.utils.manager import ProtocolManager
  from friends.utils.model import COLUMN_INDICES, Model
@@ -392,3 +392,78 @@
      def test_features(self):
          self.assertEqual(MyProtocol.get_features(), ['feature_1', 'feature_2'])
++
++    def test_linkify_string(self):
++        # String with no URL is unchanged.
++        self.assertEqual('Hello!', linkify_string('Hello!'))
++        # http:// works.
++        self.assertEqual(
++            '<a href="http://www.example.com">http://www.example.com</a>',
++            linkify_string('http://www.example.com'))
++        # https:// works, too.
++        self.assertEqual(
++            '<a href="https://www.example.com">https://www.example.com</a>',
++            linkify_string('https://www.example.com'))
++        # http:// is optional if you include www.
++        self.assertEqual(
++            '<a href="www.example.com">www.example.com</a>',
++            linkify_string('www.example.com'))
++        # Haha, nobody uses ftp anymore!
++        self.assertEqual(
++            '<a href="ftp://example.com/">ftp://example.com/</a>',
++            linkify_string('ftp://example.com/'))
++        # Trailing periods are not linkified.
++        self.assertEqual(
++            '<a href="http://example.com">http://example.com</a>.',
++            linkify_string('http://example.com.'))
++        # URL can contain periods without getting cut off.
++        self.assertEqual(
++            '<a href="http://example.com/products/buy.html">'
++            'http://example.com/products/buy.html</a>.',
++            linkify_string('http://example.com/products/buy.html.'))
++        # Don't linkify trailing brackets.
++        self.assertEqual(
++            'Example Co (<a href="http://example.com">http://example.com</a>).',
++            linkify_string('Example Co (http://example.com).'))
++        # Don't linkify trailing exclamation marks.
++        self.assertEqual(
++            'Go to <a href="https://example.com">https://example.com</a>!',
++            linkify_string('Go to https://example.com!'))
++        # Don't linkify trailing commas, also ensure all links are found.
++        self.assertEqual(
++            '<a href="www.example.com">www.example.com</a>, <a '
++            'href="http://example.com/stuff">http://example.com/stuff</a>, and '
++            '<a href="http://example.com/things">http://example.com/things</a> '
++            'are my favorite sites.',
++            linkify_string('www.example.com, http://example.com/stuff, and '
++                           'http://example.com/things are my favorite sites.'))
++        # Don't linkify trailing question marks.
++        self.assertEqual(
++            'Ever been to <a href="www.example.com">www.example.com</a>?',
++            linkify_string('Ever been to www.example.com?'))
++        # URLs can contain question marks ok.
++        self.assertEqual(
++            'Like <a href="http://example.com?foo=bar&grill=true">'
++            'http://example.com?foo=bar&grill=true</a>?',
++            linkify_string('Like http://example.com?foo=bar&grill=true?'))
++        # Multi-line strings are also supported.
++        self.assertEqual(
++            'Hey, visit us online!\n\n'
++            '<a href="http://example.com">http://example.com</a>',
++            linkify_string('Hey, visit us online!\n\nhttp://example.com'))
++        # Don't accidentally duplicate linkification.
++        self.assertEqual(
++            '<a href="www.example.com">click here!</a>',
++            linkify_string('<a href="www.example.com">click here!</a>'))
++        self.assertEqual(
++            '<a href="www.example.com">www.example.com</a>',
++            linkify_string('<a href="www.example.com">www.example.com</a>'))
++        self.assertEqual(
++            '<a href="www.example.com">www.example.com</a> is our website',
++            linkify_string(
++                '<a href="www.example.com">www.example.com</a> is our website'))
++        # This, apparently, is valid HTML.
++        self.assertEqual(
++            '<a href = "www.example.com">www.example.com</a>',
++            linkify_string(
++                '<a href = "www.example.com">www.example.com</a>'))
 === modified file 'friends/tests/test_twitter.py'
 --- friends/tests/test_twitter.py	2013-03-14 19:14:03 +0000
 +++ friends/tests/test_twitter.py	2013-03-19 00:45:26 +0000
@@ -141,8 +141,9 @@
               ],
              ['twitter', 88, '240556426106372096',
               'messages', 'Raffi Krikorian', '8285392', 'raffi', False,
--             '2012-08-28T21:08:15Z', 'lecturing at the "analyzing big data ' +
--             'with twitter" class at @cal with @othman  http://t.co/bfj7zkDJ',
++             '2012-08-28T21:08:15Z', 'lecturing at the "analyzing big data '
++             'with twitter" class at @cal with @othman  '
++             '<a href="http://t.co/bfj7zkDJ">http://t.co/bfj7zkDJ</a>',
               GLib.get_user_cache_dir() +
               '/friends/avatars/0219effc03a3049a622476e6e001a4014f33dc31',
               'https://twitter.com/raffi/status/240556426106372096',
 === modified file 'friends/utils/base.py'
 --- friends/utils/base.py	2013-03-14 13:29:53 +0000
 +++ friends/utils/base.py	2013-03-19 00:45:26 +0000
@@ -23,6 +23,7 @@
+     ]
++import re
  import time
  import logging
  import threading
@@ -51,6 +52,35 @@
  ACCT_IDX = COLUMN_INDICES['account_id']
  TIME_IDX = COLUMN_INDICES['timestamp']
++# See friends/tests/test_protocols.py for further documentation
++LINKIFY_REGEX = re.compile(
++    r"""
++    # Do not match if URL is preceded by '"' or '>'
++    # This is used to prevent duplication of linkification.
++    (?<![\"\>])
++    # Record everything that we're about to match.
++    (
++      # URLs can start with 'http://', 'https://', 'ftp://', or 'www.'
++      (?:(?:https?|ftp)://|www\.)
++      # Match many non-whitespace characters, but not greedily.
++      (?:\S+?)
++    # Stop recording the match.
++    )
++    # This section will peek ahead (without matching) in order to
++    # determine precisely where the URL actually *ends*.
++    (?=
++      # Do not include any trailing period, comma, exclamation mark,
++      # question mark, or closing parentheses, if any are present.
++      [.,!?\)]*
++      # With "trailing" defined as immediately preceding the first
++      # space, or end-of-string.
++      (?:\s|$)
++      # But abort the whole thing if the URL ends with '</a>',
++      # again to prevent duplication of linkification.
++      (?!</a>)
++    )""",
++    flags=re.VERBOSE).sub
++
  # This is a mapping from message_ids to DeeModel row index ints. It is
  # used for quickly and easily preventing the same message from being
@@ -101,6 +131,11 @@
      log.debug('_seen_ids: {}'.format(len(_seen_ids)))
++def linkify_string(string):
++    """Finds all URLs in a string and turns them into HTML links."""
++    return LINKIFY_REGEX(r'<a href="\1">\1</a>', string)
++
++
  class _OperationThread(threading.Thread):
      """Manage async callbacks, and log subthread exceptions."""
      # main.py will replace this with a reference to the mainloop.quit method
@@ -324,6 +359,8 @@
                  account_id=self._account.id
+                 )
+             )
++        # linkify the message
++        kwargs['message'] = linkify_string(kwargs.get('message', ''))
          args = []
          # Now iterate through all the column names listed in the
          # SCHEMA, and pop matching column values from the kwargs, in
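
The end-of-URL technique used in `LINKIFY_REGEX` (a non-greedy `\S+?` followed by a lookahead that tolerates trailing `.,!?)` only right before whitespace or end-of-string) can be seen in isolation. `URL` and `GREEDY` below are reduced, illustrative patterns: they only recognize `http(s)://` and omit the `"`/`>` lookbehind and the `</a>` guard.

```python
import re

# Reduced pattern: lazy URL body plus the same trailing-punctuation lookahead.
URL = re.compile(r'(https?://\S+?)(?=[.,!?)]*(?:\s|$))')
# Same lookahead with a greedy body, for comparison.
GREEDY = re.compile(r'(https?://\S+)(?=[.,!?)]*(?:\s|$))')


def find_urls(pattern, text):
    """Return the captured URL of every match of pattern in text."""
    return [m.group(1) for m in pattern.finditer(text)]


# Periods inside the path are kept; only the final period is left out.
print(find_urls(URL, 'See http://example.com/products/buy.html.'))
# ['http://example.com/products/buy.html']

# The lazy body stops as soon as the lookahead succeeds...
print(find_urls(URL, 'Example Co (http://example.com).'))
# ['http://example.com']

# ...whereas a greedy body swallows the trailing ')' and '.' first.
print(find_urls(GREEDY, 'Example Co (http://example.com).'))
# ['http://example.com).']
```

Because the lookahead consumes nothing, the trailing punctuation stays in the surrounding text rather than in the `href`.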
