Merge lp:~hirt/ocb-addons/6.1_fix_mailaddress-parsing into lp:ocb-addons/6.1

Proposed by Etienne Hirt
Status: Merged
Merged at revision: 6824
Proposed branch: lp:~hirt/ocb-addons/6.1_fix_mailaddress-parsing
Merge into: lp:ocb-addons/6.1
Diff against target: 33 lines (+9/-3)
1 file modified
mail/mail_message.py (+9/-3)
To merge this branch: bzr merge lp:~hirt/ocb-addons/6.1_fix_mailaddress-parsing
Reviewer Review Type Date Requested Status
Holger Brunn (Therp) code review Approve
Stefan Rijnhart (Opener) Approve
Pedro Manuel Baeza Approve
Review via email: mp+194030@code.launchpad.net

Description of the change

While using 6.1 mail_message to_email function I found that it does not
cope with all the possible names in the emaillist.

example: 'bla@2.ch' <bla@2.ch>, (bla@2.ch) <bla@2.ch>, "'bla@2.ch'" <bla@2.ch>, etc

The regexp in the function to_email of mail/mail_message.py is enhanced that it also removes ", () and
even if the allowed '.

To post a comment you must log in.
Revision history for this message
Pedro Manuel Baeza (pedro.baeza) wrote :

Thanks for the MP, Etienne.

Regards.

review: Approve
Revision history for this message
Stefan Rijnhart (Opener) (stefan-opener) wrote :

In the case of email address parsing, which is notoriously difficult, I would have more faith in a standard function than a regular expression, at least without a large test set. Did you look at getaddressess from email.utils? See http://docs.python.org/2/library/email.util.html#email.utils.getaddresses for docs, and the example at the top of http://pastebin.com/zbzXQYS9

review: Needs Information
Revision history for this message
Etienne Hirt (hirt) wrote :

Dear Stefan,
Thanks for the comments and hints. I fully agree with your doubts in regular expressions. I have the same fear. That's why I had one of my employees that often uses regular expressions in Perl etc to help me.

We use this RE now since 15.3.13 daily to import emails from our imap server into Openerp to be assigned to projects, purchase orders, opportunities etc. without any fails.

The getaddresses function as well as the self built addrparser in the pastebin can not cope with troubles like "'bla@2.ch'" <bla@2.ch>.

I'm therefore convinced from the proposed approach.

Revision history for this message
Stefan Rijnhart (Opener) (stefan-opener) wrote :

On 11/29/2013 11:17 PM, Etienne Hirt wrote:
> We use this RE now since 15.3.13 daily to import emails from our imap server into Openerp to be assigned to projects, purchase orders, opportunities etc. without any fails.
Hi Etienne, thank you for your response. I still think that testing a
regexp for email addresses on a single OpenERP server counts as
anecdotal evidence.

> The getaddresses function as well as the self built addrparser in the pastebin can not cope with troubles like "'bla@2.ch'" <bla@2.ch>.

Am I missing something? I get

from email.utils import getaddresses
 >>> getaddresses(["\"'bla@2.ch'\" <bla@2.ch>"])
[("'bla@2.ch'", 'bla@2.ch')]

This what I would expect.

Revision history for this message
Etienne Hirt (hirt) wrote :

Dear Stefan,

Sorry for the delayed continuation. Still the approach with getaddresses is not perfect. See below where the name 'bla@21.ch' is converted into a funny emailaddress. However, it works sufficiently to replace the regexp. Will upload the update for your review.
You might have a better proposal to get the second column of the matrix.

Thanks

Etienne

text = "'bla@21.ch' <bla@22.ch>, (bla@31.ch) <bla@32.ch>, \"'bla@41.ch'\" <bla@42.ch>,"
getaddresses([text])
[('', "'bla@21.ch'"), ('', 'bla@22.ch'), (' (bla@31.ch)', 'bla@32.ch'), ("'bla@41.ch'", 'bla@42.ch')]

re.findall(r'([^ ,(\'"<@]+@[^> ,)\'"]+)',text)
['bla@21.ch', 'bla@22.ch', 'bla@31.ch', 'bla@32.ch', 'bla@41.ch', 'bla@42.ch']

Revision history for this message
Stefan Rijnhart (Opener) (stefan-opener) wrote :

Hi Etienne,

thanks for the follow up! I believe the interpretation of 'bla@21.ch' as a separate emailaddress is correct as RFC822 mentions that strings (like the display name) containing the character '@' should be enclosed in the ascii quote character '"', which is not the case here.

You may like to compress the code using a list comprehension like

    return [address[1] for address in getaddresses([text])]

but if you don't that is fine too. Approved.

review: Approve
Revision history for this message
Etienne Hirt (hirt) wrote :

Dear Stefan,

I finalized the code now using list comprehension. Hope this does not delay the merging.

Revision history for this message
Holger Brunn (Therp) (hbrunn) :
review: Approve (code review)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'mail/mail_message.py'
2--- mail/mail_message.py 2013-11-20 01:17:39 +0000
3+++ mail/mail_message.py 2013-12-20 22:41:55 +0000
4@@ -26,10 +26,10 @@
5 import email
6 import logging
7 import pytz
8-import re
9 import time
10 from email.header import decode_header
11 from email.message import Message
12+from email.utils import getaddresses
13
14 import tools
15 from osv import osv
16@@ -60,9 +60,15 @@
17 return ''.join([tools.ustr(x[0], x[1]) for x in text])
18
19 def to_email(text):
20- """Return a list of the email addresses found in ``text``"""
21+ """Returns a list of the email addresses found in ``text``
22+ """
23 if not text: return []
24- return re.findall(r'([^ ,<@]+@[^> ,]+)', text)
25+
26+ people = getaddresses([text])
27+ addresses = [person[1] for person in people]
28+
29+ return addresses
30+
31
32 class mail_message_common(osv.osv_memory):
33 """Common abstract class for holding the main attributes of a

Subscribers

People subscribed via source and target branches