Merge lp:~jcsackett/launchpad/linkifier-bugs-2 into lp:launchpad
| Status: | Merged | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Approved by: | j.c.sackett on 2010-12-17 | ||||||||
| Approved revision: | no longer in the source branch. | ||||||||
| Merged at revision: | 12102 | ||||||||
| Proposed branch: | lp:~jcsackett/launchpad/linkifier-bugs-2 | ||||||||
| Merge into: | lp:launchpad | ||||||||
| Diff against target: |
218 lines (+135/-18) 2 files modified
lib/lp/app/browser/stringformatter.py (+67/-15) lib/lp/app/browser/tests/test_stringformatter.py (+68/-3) |
||||||||
| To merge this branch: | bzr merge lp:~jcsackett/launchpad/linkifier-bugs-2 | ||||||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Brad Crittenden (community) | code | 2010-12-16 | Approve on 2010-12-17 |
|
Review via email:
|
|||
Commit Message
[r=bac][ui=none][bug=60172,118284] Fixes linkifier so URIs of just a protocol are not linked, and parenthesis are handled better.
Description of the Change
Summary
=======
As part of bugjam2010, a series of bugs having to do with the linkifier have been targeted. This branch deals with standard URI links that need some postprocessing after the regex capture.
URIs are identified in stringformatter via a serious regex, but to do certain things (like reject urls that consist of just a protocol with no domain) you would have to modify the regex to catch invalid urls, which would make it even harder to deal with. Instead it's easier and cleaner to just make decisions about when to linkify based on the output of the regex.
Implementation
==============
Two methods have been added to the FormattersAPI in lib/lp/
The first, _linkify_
The second _handle_
Tests
=====
bin/test -t TestLinkifyingP
QA
==
Create a bug comment on launchpad.dev with the following text:
"http:// isn't a link. http://
The first http:// shouldn't link. The second URI should link, including the closing paren at the end.
Lint
====
= Launchpad lint =
Checking for conflicts and issues in changed files.
Linting changed files:
lib/lp/
lib/lp/
./lib/lp/
450: E501 line too long (90 characters)
454: E501 line too long (88 characters)
450: Line exceeds 78 characters.
454: Line exceeds 78 characters.
The two lines that are too long are part of the regex, which is difficult enough to read without breaking up those sections.
| j.c.sackett (jcsackett) wrote : | # |
> Hi Jon,
>
> * _handle_
> parameters. Just looking at it cold I cannot figure out the purpose. A
> trailer is something you haul cows to the sale barn in. You also agreed to
> add some more corner case tests -- thanks!
Done.
> * In your blacklist method please use triple double-quotes for the docstring
> open/close and do put in closing punctuation.
Done.
> * On IRC we discussed a name better than 'blacklist'. How about 'ignored' or
> something even better if you can think of it?
Couldn't think of better than ignored. Done.
> * Please create a dict and use named targets in:
>
> 83 + return '<a rel="nofollow" href="%s">%s</a>%s' % (
> 84 + cgi.escape(url, quote=True),
> 85 + add_word_
> 86 + cgi.escape(
Done.
> * FormattersAPI.
> is way down below it -- I think some code was re-arranged. Please put it
> closer to the call site.
It's actually called in several places; I've moved the method so it's grouped with _handle_parens and the former blacklist method, that places it closer to the methods it's called in.
> Also document in English what is being matched for
> the RE impaired to easily understand it.
I'm still not sure which RE section you're asking about? Can you include the block or file number so I can find it?
>
> Other than those issues the branch looks good and thorough. Thanks. Marking
> it approved, pending changes.
| Brad Crittenden (bac) wrote : | # |
Thanks for the fixes. I was referring to _re_url_trailers.
| j.c.sackett (jcsackett) wrote : | # |
> I was referring to _re_url_trailers.
Ah, okay. I was trying to find an regex I had made or altered, therein was my confusion. I've updated comments for it. Thanks for the review.
| Robert Collins (lifeless) wrote : | # |
blacklisting protocols seems bizarre.
Why not just create a URL object and check that more than the scheme
is filled out?
| j.c.sackett (jcsackett) wrote : | # |
That actually seems bizarre to me; why build a new object when we're working with strings?
This also allows us to expand to other things we might wish to ignore, should another case crop up.
Blacklisting as a notion has been removed, if the loaded definition of that bothers you--we're just treating them as things the linkifier should ignore.

Hi Jon,
* _handle_ parens_ in_trailers needs a doc string with explanation of parameters. Just looking at it cold I cannot figure out the purpose. A trailer is something you haul cows to the sale barn in. You also agreed to add some more corner case tests -- thanks!
* In your blacklist method please use triple double-quotes for the docstring open/close and do put in closing punctuation.
* On IRC we discussed a name better than 'blacklist'. How about 'ignored' or something even better if you can think of it?
* Please create a dict and use named targets in:
83 + return '<a rel="nofollow" href="%s">%s</a>%s' % ( breaks( cgi.escape( url)), trailers) )
84 + cgi.escape(url, quote=True),
85 + add_word_
86 + cgi.escape(
* FormattersAPI. _split_ url_and_ trailers is only used once but its definition is way down below it -- I think some code was re-arranged. Please put it closer to the call site. Also document in English what is being matched for the RE impaired to easily understand it.
Other than those issues the branch looks good and thorough. Thanks. Marking it approved, pending changes.