Merge lp:~jtv/wikkid/bug-614286 into lp:wikkid

Proposed by Jeroen T. Vermeulen
Status: Work in progress
Proposed branch: lp:~jtv/wikkid/bug-614286
Merge into: lp:wikkid
Diff against target: 12 lines (+1/-1)
1 file modified
wikkid/contrib/creole_1_1/creole.py (+1/-1)
To merge this branch: bzr merge lp:~jtv/wikkid/bug-614286
Reviewer Review Type Date Requested Status
Tim Penhey Needs Fixing
Paul Hummer (community) Abstain
Review via email: mp+33507@code.launchpad.net

Commit message

UTF-8-enable the Creole parser.

Description of the change

This was meant as a pure proof-of-concept branch for someone who knows the code better, but it's worked well for me in production for a few weeks (with tons and tons of non-ASCII on creole pages) and Tim seems to want a merge proposal.

Here goes.

To post a comment you must log in.
Revision history for this message
Paul Hummer (rockstar) wrote :

I wonder if, for sanity's sake, we should just encode all strings as UTF-8. I can't think of any place where we'd want just ascii.

I'll abstain for now. Maybe Tim has a stronger opinion.

review: Abstain
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

> I wonder if, for sanity's sake, we should just encode all strings as UTF-8. I
> can't think of any place where we'd want just ascii.

It's definitely a better *default* choice than ASCII, which is what's implicitly used now. Any text that isn't UTF-8 is exceedingly unlikely to pass a simple decoding check. So defaulting to UTF-8 passes more, gives access to full unicode, and yet isn't likely to lead to misinterpretation of data.

Jeroen

Revision history for this message
Tim Penhey (thumper) wrote :

Ideally I'd like to have the parsing work with unicode strings, but the creole support for now is with this external library. I've started looking at other parsers but we'll just have to wait an see. I'm trying to work out how to fit more wikkid hacking into my week.

review: Approve
Revision history for this message
Tim Penhey (thumper) wrote :

Actually, the tests don't pass with this applied.

review: Needs Fixing
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

This is the failure I get when trying to run the test. It's probably not the one you're getting; it doesn't look related to my change at all.

http://paste.ubuntu.com/501282/

Unmerged revisions

49. By Jeroen T. Vermeulen

Make Creole assume UTF-8, not ASCII.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'wikkid/contrib/creole_1_1/creole.py'
2--- wikkid/contrib/creole_1_1/creole.py 2010-05-10 10:34:10 +0000
3+++ wikkid/contrib/creole_1_1/creole.py 2010-08-24 11:30:55 +0000
4@@ -407,7 +407,7 @@
5 def parse(self):
6 """Parse the text given as self.raw and return DOM tree."""
7
8- self.parse_block(self.raw)
9+ self.parse_block(self.raw.decode('utf-8'))
10 return self.root
11
12 #################### Helper classes

Subscribers

People subscribed via source and target branches