Merge lp:~coalwater/lernid/fix-810122 into lp:lernid

Proposed by Mohammad AbuShady
Status: Needs review
Proposed branch: lp:~coalwater/lernid/fix-810122
Merge into: lp:lernid
Diff against target: 25 lines (+2/-2)
2 files modified
lernid/widgets/Browser.py (+1/-1)
lernid/widgets/IrcWidget.py (+1/-1)
To merge this branch: bzr merge lp:~coalwater/lernid/fix-810122
Reviewer Review Type Date Requested Status
John S. Gruber Needs Fixing
Mohammad AbuShady (community) not sure :d Needs Resubmitting
Lernid Development Team Pending
Review via email: mp+100341@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John S. Gruber (jsjgruber) wrote :

Your patch is certainly an improvement, url's ending in a comma are now properly marked so that clicking on them worked for the examples I tried.

The autoloading still doesn't work, however. You might take a look at the autoloading search code at around line 232 in lernid/widgets/Browser.py.

review: Needs Fixing
Revision history for this message
Mohammad AbuShady (coalwater) wrote :

I could change the regex manually too, but is there somehow i could make both files read the same regex from the same place, like some global variable maybe?

Revision history for this message
Mohammad AbuShady (coalwater) wrote :

I also don't know if i need a special permission for lernid to auto load my url

Revision history for this message
John S. Gruber (jsjgruber) wrote :

I wouldn't worry about duplicating them. The code surrounding them is in the appropriate place in each case..

I think the easiest way to test is to start x-chat and join ##ub and then start lernid with the --classroom \#\#ub --unsafe-override options. The backslashes are so the shell you are using doesn't process the # symbols. I guess lernid --classroom "##ub" --unsafe-override would work, too. Then you type url's and slide directives from x_chat into ##ub.

You can also make a custom config using /etc/lernidclassroom.d/x file as a template. The config can point to your own irc rooms and your own google calendar. That's often what I do. lernid --config file:///home/coalwater/lernidconfig, for example.

lp:~coalwater/lernid/fix-810122 updated
232. By Mohammad AbuShady

Fixing the regex in Browser widget to support autoloading

Revision history for this message
Mohammad AbuShady (coalwater) wrote :

Launchpad could be a little confusing sometimes, *tries to resubmit*

review: Needs Resubmitting (not sure :d)
Revision history for this message
John S. Gruber (jsjgruber) wrote :

I missed something when reviewing your fix above. The one in Browser.py doesn't include in the returning URL any surrounding brackets. Those surrounding brackets are used to tell lernid not to follow a specific URL, that the instructor just wanted to mention the URL for the student's future reference. If we were to use your regex as is we'd regress the fix for LP: #518229 Michael Budde applied. I'm sorry I missed that before.

While we're at it, do you think there's anything wrong with ignoring the case in the ircwidget, too? Seems like we might as well be consistent in this case unless you can think of a reason that it should be different.

It would save me some work if you could base your change on lp:~jsjgruber/lernid/lernid-proposed
(use the command: bzr branch lp:~jsjgruber/lernid/lernid-proposed yourtree and then make the changes in yourtree).
or you could alternatively put them on your previous code change to ircwidget.py where you caught and fixed the string verses unicode type error (nice catch, by the way). If you do it the former way you can take a look at how the changelog and the merges (including your other ones) look so far. You could also try it out and look for errors I've made (hint, hint).

The reason I ask you to base it on one of the other branches is because I get a merge conflict when
trying to merge this branch, and I might as well ask you to see how you want to resolve it since all of it is your code. I think you'll see what I mean as soon as you look at patching Ircwidget.py onto either of the above places. It's not that something's wrong with the previous patch or this one, just that there hitting the same lines of code. In your new code the regex search just goes against "text" as you have either converted it already to unicode or not, as appropriate.

Again, sorry for missing the [URL] thingy.

Thanks again for working on this.

review: Needs Fixing
Revision history for this message
Mohammad AbuShady (coalwater) wrote :

> While we're at it, do you think there's anything wrong with ignoring the case
> in the ircwidget, too? Seems like we might as well be consistent in this case
> unless you can think of a reason that it should be different.
I think we don't really need to ignore case in either cases, because we aren't using letter ranges like [a-z], I'm using '\w' which matches [a-z][A-Z] and underscores, so it's not really needed.

> I missed something when reviewing your fix above. The one in Browser.py
> doesn't include in the returning URL any surrounding brackets. Those
> surrounding brackets are used to tell lernid not to follow a specific URL,
> that the instructor just wanted to mention the URL for the student's future
> reference. If we were to use your regex as is we'd regress the fix for LP:
> #518229 Michael Budde applied. I'm sorry I missed that before.
Ok i see now what you're talking about, could you explain to me exactly the arguments and the return of the cleanurl function, cause if i understand it as a whole i'll be able to think better, i see i could just add the bracket at the start and the end but i still want to understand the function and what it should do.

> It would save me some work if you could base your change on
> lp:~jsjgruber/lernid/lernid-proposed
Ok so I'm going to branch this and try to re-apply my changes, if there are other merges that have the same problem just mention it in it's review and I'll redo it in the proposed branch.

> If you do it the former way you can take a look at
> how the changelog and the merges (including your other ones) look so far. You
> could also try it out and look for errors I've made (hint, hint).
Ok, I could take a look over there heh.

> Again, sorry for missing the [URL] thingy.
heh, no worries.

> Thanks again for working on this.
my pleasure :D

Revision history for this message
Mohammad AbuShady (coalwater) wrote :

forgot to say that \w also matches numbers

Revision history for this message
John S. Gruber (jsjgruber) wrote :

With regard to case--the test case is HTTP://www.google.com. Isn't caught with your current regex without specifying re.IGNORECASE. Otherwise I agree that the rest of the regex is not case sensitive.

HTTP://www.something.something is valid, though not the canonical way to put it. See http://tools.ietf.org/html/rfc3986#section-3.1

WRT the brackets. The code following regex says:
10 for url in urls:
11 if (url.startswith('[') and url.endswith(']')) or load:
12 ignore.append(cleanurl(url))

So the url needs to include incorporate the [] in [http://www.google.com] so this test will work and the URL will be ignored when desired.

You asked about cleanurl:

 def cleanurl(url):
            if url.startswith('['):
                url = url[1:]
            if any([url.endswith(c) for c in '.;:]']):
                url = url[:-1]
            return url

Clean url, as I think you surmise, then takes the brackets off the url before appending it to the ignore list. cleanurl takes a url found by the regex and strips not only the []'s but also some other trailing punctuation, '.;:'

url = url[1:] is a slice operation that says make url equal to the old url but without the first character. url = url[:-1] says make the new url the old url minus the very last character. [url.endswith(c) for c in '.;:]' ] is a list comprehension that, for 'http://www.google.com;' returns [False, True, False, False] . Any ([False, True, False, False]) returns True because one of the elements is True (the ';' was found as the last character). I'm not sure I've seen the any function or the endswith string method used before, perhaps they looked odd to you. List comprehensions are cool shortcuts.

Let me know if something's still not clear.

There shouldn't be any problems with your other merges--unless I've missed one they should all be in that lernid-proposed branch already (it would be great if you'd double check me on that). I think we are down to this merge.

Revision history for this message
John S. Gruber (jsjgruber) wrote :
review: Needs Fixing
Revision history for this message
Mohammad AbuShady (coalwater) wrote :

Ok i could add those missing characters, ( just enough to make those examples you posted work ) just keep in mind this won't be perfect till people start to use it and other cases start to appear, here's a list of the characters that need to be added ( the ones between the [ ] ).
[ % ; , + # ]
I will try update the branch with this and the other bug about the [ ] for ignoring URL's soon hopefully

Revision history for this message
John S. Gruber (jsjgruber) wrote :

> Ok i could add those missing characters, ( just enough to make those examples
> you posted work ) just keep in mind this won't be perfect till people start to
> use it and other cases start to appear, here's a list of the characters that
> need to be added ( the ones between the [ ] ).
> [ % ; , + # ]
> I will try update the branch with this and the other bug about the [ ] for
> ignoring URL's soon hopefully

This doesn't seem like a good idea to me. What we have now has been working for a long time, and
the only problem at hand seems to be the trailing "," . We could be trading a fix for that for
a bunch more problems -- not a good trade. We should do our best not to introduce any regressions.

Seems to me that there are two possible ways to go:

1) Update the regex to include all of the characters possible in the URL. The bnf in
http://tools.ietf.org/html/rfc3986 gives the lists. (bnf makes my head hurt, but
that's a safer way to do it. Let me know if you want to do it this way and want my help
interpretting the RFC.

2) We could also steal an approach from somewhere else that is doing it well. (Launchpad, for example).
http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/view/head:/lib/lp/app/browser/stringformatter.py
but that's pretty sophisticated. They look for the URL, match up parentheses, and remove trailing punctuation.
I didn't look at it long enough to really figure it out. Seems to me this might take quite a bit of our time.

Either 1) or 2) could open regressions, however.

3) Safe and effective, but not as comprehensive as 1) and 2), would be to simply add "," to the line
in cleanurl

 if any([url.endswith(c) for c in '.;:]']):

We could then also add clean url to Ircbackend,
which would then fix the problem of trailing . ; : etc. in the links it marks. I didn't realize until now
that that problem existed. http://www.google.com; now has the ";" marked as part of the url, though it opens
correctly through the code in Browser.py it won't click properly. Thats problem has been there for ages, with no one ever noticing it, as far as I know.

I wish I had tried more URL's through your regex much sooner. The things I just tried were only links that occured to me when I realized that your regex was sensitive to special characters in a URL. I'm sure there are many more
that would produce errors.

I just pushed again to lernid-proposed. I think it's getting closer to being ready to release.

Good luck and please let me know if you'd like some help.

Unmerged revisions

232. By Mohammad AbuShady

Fixing the regex in Browser widget to support autoloading

231. By Mohammad AbuShady

fixing the url highlighting function

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lernid/widgets/Browser.py'
2--- lernid/widgets/Browser.py 2011-08-27 22:15:03 +0000
3+++ lernid/widgets/Browser.py 2012-04-18 14:18:20 +0000
4@@ -229,7 +229,7 @@
5 url = url[:-1]
6 return url
7 if re.search("slidefile", message, re.IGNORECASE): return (load, ignore)
8- urls = re.findall("(\[?https?://[^\s\)]*)[\s\)\.\]]?", message, re.IGNORECASE)
9+ urls = re.findall("https?://[\w\.\-\/\?\=\&]+[\w]", message, re.IGNORECASE)
10 for url in urls:
11 if (url.startswith('[') and url.endswith(']')) or load:
12 ignore.append(cleanurl(url))
13
14=== modified file 'lernid/widgets/IrcWidget.py'
15--- lernid/widgets/IrcWidget.py 2011-08-13 02:05:14 +0000
16+++ lernid/widgets/IrcWidget.py 2012-04-18 14:18:20 +0000
17@@ -87,7 +87,7 @@
18 else:
19 self._buffer.insert(iend, text)
20
21- for url in re.finditer(r"https?://\S+", unicode(text, 'utf-8')):
22+ for url in re.finditer(r"https?://[\w\.\-\/\?\=\&]+[\w]", unicode(text, 'utf-8')):
23 self._buffer.apply_tag_by_name('hyperlink',
24 self._buffer.get_iter_at_offset(start_pos + url.start()),
25 self._buffer.get_iter_at_offset(start_pos + url.end()))

Subscribers

People subscribed via source and target branches