Merge lp:~edwin-grubbs/launchpad/bug-615655-unicode-oops into lp:launchpad
| Status: | Merged |
|---|---|
| Approved by: | Aaron Bentley on 2010-08-24 |
| Approved revision: | no longer in the source branch. |
| Merged at revision: | 11428 |
| Proposed branch: | lp:~edwin-grubbs/launchpad/bug-615655-unicode-oops |
| Merge into: | lp:launchpad |
| Diff against target: | 204 lines (+71/-44), 3 files modified: lib/canonical/encoding.py (+45/-31), lib/canonical/launchpad/xmlrpc/mailinglist.py (+11/-1), lib/lp/registry/doc/message-holds-xmlrpc.txt (+15/-12) |
| To merge this branch: | bzr merge lp:~edwin-grubbs/launchpad/bug-615655-unicode-oops |

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Aaron Bentley (community) | | 2010-08-23 | Approve on 2010-08-24 |
Description of the Change
Summary
-------
This branch fixes an OOPS caused by non-ASCII characters in an email
that prevented a str from being converted to a unicode object. Such a
message is usually spam, but since we cannot be absolutely certain of
that, we just escape the offending characters and let the mailing list
manager review the email in Launchpad.
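
As a minimal illustration of the failure mode (written in Python 3 syntax, whereas the branch itself targets Python 2; the sample message and variable names are made up for this sketch), decoding a byte string that contains an unencoded non-ASCII byte with the ASCII codec raises an error, while escaping that byte first leaves a string that decodes cleanly:

```python
# Hypothetical raw message with an unencoded 0xA9 byte in a header.
raw = b"Message-ID: <fifth-post\xa9>\n\nbody\n"

try:
    raw.decode("ascii")
    decoded_ok = True
except UnicodeDecodeError:
    # This is the kind of error that surfaced as an OOPS.
    decoded_ok = False

# Replacing the byte with a literal backslash sequence leaves pure ASCII,
# so the later conversion to text no longer fails.
escaped = raw.replace(b"\xa9", b"\\xa9")
text = escaped.decode("ascii")
```
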
Tests
-----
./bin/test -vv -t canonical.encoding -t message-
Edwin Grubbs (edwin-grubbs) wrote:
Hi Aaron,
I've fixed it to only escape the headers. I tried using message_
-Edwin
=== modified file 'lib/canonical/
--- lib/canonical/
+++ lib/canonical/
@@ -8,6 +8,7 @@
'MailingLi
]
+import re
import xmlrpclib
from zope.component import getUtility
@@ -233,10 +234,15 @@
# though it's much more convenient to just pass 8-bit strings.
if isinstance(bytes, xmlrpclib.Binary):
bytes = bytes.data
- # Although it is illegal for an email to have unencoded non-ascii
- # characters, it is better to let the list owner process the
- # message than to cause an oops.
- bytes = escape_
+ # Although it is illegal for an email header to have unencoded
+ # non-ascii characters, it is better to let the list owner
+ # process the message than to cause an oops.
+ header_
+ match = header_
+ header = bytes[:
+ header = escape_
+ bytes = header + bytes[match.
+
message = getUtility(
=== modified file 'lib/lp/
--- lib/lp/
+++ lib/lp/
@@ -226,7 +226,7 @@
Non-ascii messages
==================
-Messages with non-ascii in their headers or bodies are not exactly legal
+Messages with non-ascii in their headers are not exactly legal
(they should be encoded) but do occur especially in spam. These
messages can be held for moderator approval too. To avoid blowing up
later if the string is converted to a unicode object, the non-ascii
@@ -239,8 +239,7 @@
... Message-ID: <fifth-post\xa9>
... Date: Fri, 01 Aug 2000 01:08:59 -0000
...
- ... Watch out for badgers! \xa9
- ... Don't double quote characters: =E3=F6=FC
+ ... Don't escape non-ascii characters in the body! \xa9
... """)
>>> import xmlrpclib
@@ -269,8 +268,7 @@
'Message-ID: <fifth-post\\xa9>',
'Date: Fri, 01 Aug 2000 01:08:59 -0000',
'',
- 'Watch out for badgers! \\xa9',
- "Don't double quote characters: =E3=F6=FC"]
+ "Don't escape non-ascii characters in the body! \xa9"]
>>> held_message_
<DBItem PostedMessageSt

This needs some work, because 8-bit characters are only illegal in headers. (It is legal, though probably inadvisable, to use Content-Transfer-Encoding: 8bit or binary for message bodies.)
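
The header/body distinction described here could be sketched roughly as follows. This is not the branch's code: the helper names `escape_nonascii` and `escape_headers_only` and the pattern `HEADER_END` are hypothetical, the sketch uses Python 3 byte strings rather than Python 2 `str`, and it assumes the headers end at the first blank line.

```python
import re

# Hypothetical pattern: the headers end at the first blank line.
HEADER_END = re.compile(rb"\r?\n\r?\n")


def escape_nonascii(data: bytes) -> bytes:
    """Replace each byte >= 0x80 with a literal backslash-x escape."""
    return re.sub(
        rb"[\x80-\xff]",
        lambda m: b"\\x%02x" % m.group(0)[0],
        data,
    )


def escape_headers_only(message: bytes) -> bytes:
    """Escape non-ASCII bytes in the header block, leaving the body as-is."""
    match = HEADER_END.search(message)
    if match is None:
        # No blank line found: treat the whole message as headers.
        return escape_nonascii(message)
    header = message[:match.start()]
    rest = message[match.start():]
    return escape_nonascii(header) + rest
```

With this split, an 8-bit body (for example one sent with Content-Transfer-Encoding: 8bit) passes through unchanged, and only the header block is made safe for a later ASCII decode.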