Merge lp:~thumper/launchpad/mail-header-oops into lp:launchpad

Proposed by Tim Penhey
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 12566
Proposed branch: lp:~thumper/launchpad/mail-header-oops
Merge into: lp:launchpad
Diff against target: 241 lines (+91/-30)
4 files modified
lib/lp/services/mail/incoming.py (+36/-20)
lib/lp/services/mail/tests/incomingmail.txt (+6/-6)
lib/lp/services/mail/tests/test_incoming.py (+44/-1)
lib/lp/testing/factory.py (+5/-3)
To merge this branch: bzr merge lp:~thumper/launchpad/mail-header-oops
Reviewer Review Type Date Requested Status
Robert Collins (community) Approve
Ian Booth (community) *code Approve
Review via email: mp+52156@code.launchpad.net

Commit message

[r=lifeless,wallyworld][bug=701976,728827] Lower some log.warning to log.info as warnings generate oopses.

Description of the change

Warnings that are logged are turned into OOPSes by the
OOPS log handler added by the base CronScript class.

I've updated the behaviour of the header checking based
on the comments in bug 701976.

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

+def extract_addresses(mail, raw_mail, file_alias_url, log):
+ # Extract the domain the mail was sent to. Mails sent to
+ # Launchpad should have an X-Launchpad-Original-To header.

Why not turn that into a docstring? And, maybe add things just saying the types of the parameters and the return code. It's the kind of thing that easily has bugs about passing a string vs a list of strings vs a file etc. (Or, you could use a statically type-checked language. :-P)

+ if ORIGINAL_TO_HEADER in mail:
+ return [mail[ORIGINAL_TO_HEADER]]
+
+ if ORIGINAL_TO_HEADER in raw_mail:
+ # Almost certainly a spam email with a blank line in the email headers
+ log.info('Suspected spam: %s' % file_alias_url)

I really do not see how the comment follows from the condition. How can you even have a blank line in the headers? You can have a blank line that causes things that ought to be headers to be part of the body, but no reasonable mail agent will see them as headers then.

Perhaps this should say something like:

 # Doesn't have an X-Launchpad-Original-To in the headers, but does have one in the body, because of a forwarding loop or attempted spam. See <https://bugs.launchpad.net/launchpad/+bug/701976>

Revision history for this message
Martin Pool (mbp) wrote :

I'm +1 on the dkim changes, and -1 (pending more information) on the x-l-original-to changes. Having that substring within a message doesn't mean it's spam.

Revision history for this message
Tim Penhey (thumper) wrote :

I agree that having that substring within the message doesn't mean it is spam, but having it in the message but not in the headers means that the headers have been messed with somehow, and that is more indicative of spam. I've updated the code to match your suggested wording.

Revision history for this message
Martin Pool (mbp) wrote :

I don't understand at the moment how incoming spam can cause the
headers to be split and x-l-original-to not to be attached. That
seems to me to be more likely to be a bug or misconfiguration in our
software.

Martin

Revision history for this message
Ian Booth (wallyworld) wrote :

I like the wording change suggested by Martin.
The scenarios tested for seem to match the observed data referred to in the bug report so to me it looks ok.

review: Approve (*code)
Revision history for this message
Martin Pool (mbp) wrote :

On 4 March 2011 14:54, Tim Penhey <email address hidden> wrote:
> I agree that having that substring within the message doesn't mean it is spam, but having it in the message but not in the headers means that the headers have been messed with somehow, and that is more indicative of spam.

I think we ought to identify why spam is causing the messages to be
corrupted. Checking for this string is just superstition.

Revision history for this message
Martin Pool (mbp) wrote :

Note that we're not checking for a body that looks like it contains headers, which might be a somewhat-reliable indicator of either spam or something broken. We're just looking for that string anywhere in the body. It's reasonably likely to match on messages that do not contain broken headers.

Revision history for this message
Robert Collins (lifeless) wrote :

@Martin - spam often has invalid headers (attempting smuggling) and our x-original-to mangler adds it after the invalid header, but our lp parser stops at the invalid point. So this is pretty reliable and won't reject valid mail AFAWCT.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/services/mail/incoming.py'
2--- lib/lp/services/mail/incoming.py 2011-03-03 02:29:54 +0000
3+++ lib/lp/services/mail/incoming.py 2011-03-09 20:52:00 +0000
4@@ -143,19 +143,19 @@
5 # in addition to the dkim signature being valid, we have to check that it
6 # was actually signed by the user's domain.
7 if len(signing_details) != 1:
8- log.errors(
9+ log.info(
10 'expected exactly one DKIM details record: %r'
11 % (signing_details,))
12 return False
13 signing_domain = signing_details[0]['d']
14 from_domain = extract_address_domain(signed_message['From'])
15 if signing_domain != from_domain:
16- log.warning("DKIM signing domain %s doesn't match From address %s; "
17+ log.info("DKIM signing domain %s doesn't match From address %s; "
18 "disregarding signature"
19 % (signing_domain, from_domain))
20 return False
21 if not _isDkimDomainTrusted(signing_domain):
22- log.warning("valid DKIM signature from untrusted domain %s"
23+ log.info("valid DKIM signature from untrusted domain %s"
24 % (signing_domain,))
25 return False
26 return True
27@@ -262,6 +262,36 @@
28 self.setOopsToken('EMAIL')
29
30
31+ORIGINAL_TO_HEADER = 'X-Launchpad-Original-To'
32+
33+
34+def extract_addresses(mail, raw_mail, file_alias_url, log):
35+ """Extract the domain the mail was sent to.
36+
37+ Mails sent to Launchpad should have an X-Launchpad-Original-To header.
38+ This is added by the MTA before it ends up the mailbox for Launchpad.
39+ """
40+ if ORIGINAL_TO_HEADER in mail:
41+ return [mail[ORIGINAL_TO_HEADER]]
42+
43+ if ORIGINAL_TO_HEADER in raw_mail:
44+ # Doesn't have an X-Launchpad-Original-To in the headers, but does
45+ # have one in the body, because of a forwarding loop or attempted
46+ # spam. See <https://bugs.launchpad.net/launchpad/+bug/701976>
47+ log.info('Suspected spam: %s' % file_alias_url)
48+ else:
49+ # This most likely means a email configuration problem, and it should
50+ # log an oops.
51+ log.warn(
52+ "No X-Launchpad-Original-To header was present "
53+ "in email: %s" % file_alias_url)
54+ # Process all addresses found as a fall back.
55+ cc = mail.get_all('cc') or []
56+ to = mail.get_all('to') or []
57+ names_addresses = getaddresses(to + cc)
58+ return [addr for name, addr in names_addresses]
59+
60+
61 def report_oops(file_alias_url=None, error_msg=None):
62 """Record an OOPS for the current exception and return the OOPS ID."""
63 info = sys.exc_info()
64@@ -280,7 +310,7 @@
65
66
67 def handleMail(trans=transaction,
68- signature_timestamp_checker=None):
69+ signature_timestamp_checker=None):
70 # First we define an error handler. We define it as a local
71 # function, to avoid having to pass a lot of parameters.
72 # pylint: disable-msg=W0631
73@@ -394,21 +424,8 @@
74 file_alias_url, notify=False)
75 continue
76
77- # Extract the domain the mail was sent to. Mails sent to
78- # Launchpad should have an X-Launchpad-Original-To header.
79- if 'X-Launchpad-Original-To' in mail:
80- addresses = [mail['X-Launchpad-Original-To']]
81- else:
82- log = logging.getLogger('lp.services.mail')
83- log.warn(
84- "No X-Launchpad-Original-To header was present "
85- "in email: %s" %
86- file_alias_url)
87- # Process all addresses found as a fall back.
88- cc = mail.get_all('cc') or []
89- to = mail.get_all('to') or []
90- names_addresses = getaddresses(to + cc)
91- addresses = [addr for name, addr in names_addresses]
92+ addresses = extract_addresses(
93+ mail, raw_mail, file_alias_url, log)
94
95 try:
96 do_paranoid_envelope_to_validation(addresses)
97@@ -464,7 +481,6 @@
98 # from being processed.
99 _handle_error(
100 "Unhandled exception", file_alias_url)
101- log = logging.getLogger('canonical.launchpad.mail')
102 if file_alias_url is not None:
103 email_info = file_alias_url
104 else:
105
106=== modified file 'lib/lp/services/mail/tests/incomingmail.txt'
107--- lib/lp/services/mail/tests/incomingmail.txt 2010-12-13 17:34:34 +0000
108+++ lib/lp/services/mail/tests/incomingmail.txt 2011-03-09 20:52:00 +0000
109@@ -101,10 +101,10 @@
110 header is missing.)
111
112 >>> handleMailForTest()
113- WARNING:lp.services.mail:No X-Launchpad-Original-To header was present ...
114- WARNING:lp.services.mail:No X-Launchpad-Original-To header was present ...
115- WARNING:lp.services.mail:No X-Launchpad-Original-To header was present ...
116- WARNING:lp.services.mail:No X-Launchpad-Original-To header was present ...
117+ WARNING:process-mail:No X-Launchpad-Original-To header was present ...
118+ WARNING:process-mail:No X-Launchpad-Original-To header was present ...
119+ WARNING:process-mail:No X-Launchpad-Original-To header was present ...
120+ WARNING:process-mail:No X-Launchpad-Original-To header was present ...
121
122 Now we can see that each handler handled the emails sent to its domain:
123
124@@ -252,7 +252,7 @@
125 ... """)
126 >>> msgid = sendmail(msg, ['edit@malone-domain'])
127 >>> handleMailForTest()
128- ERROR:canonical.launchpad.mail:An exception was raised inside the handler:
129+ ERROR:process-mail:An exception was raised inside the handler:
130 ...
131 TestOopsException
132
133@@ -300,7 +300,7 @@
134 ... """)
135 >>> msgid = sendmail(msg, ['edit@malone-domain'])
136 >>> handleMailForTest()
137- ERROR:canonical.launchpad.mail:An exception was raised inside the handler:
138+ ERROR:process-mail:An exception was raised inside the handler:
139 ...
140 Unauthorized
141
142
143=== modified file 'lib/lp/services/mail/tests/test_incoming.py'
144--- lib/lp/services/mail/tests/test_incoming.py 2011-03-03 02:29:54 +0000
145+++ lib/lp/services/mail/tests/test_incoming.py 2011-03-09 20:52:00 +0000
146@@ -6,7 +6,7 @@
147 import os
148 import unittest
149
150-from testtools.matchers import Is
151+from testtools.matchers import Equals, Is
152 import transaction
153 from zope.security.management import setSecurityPolicy
154
155@@ -19,10 +19,13 @@
156 helpers,
157 )
158 from canonical.testing.layers import LaunchpadZopelessLayer
159+from lp.services.log.logger import BufferLogger
160 from lp.services.mail.incoming import (
161 authenticateEmail,
162+ extract_addresses,
163 handleMail,
164 MailErrorUtility,
165+ ORIGINAL_TO_HEADER,
166 )
167 from lp.services.mail.sendmail import MailController
168 from lp.services.mail.stub import TestMailer
169@@ -119,6 +122,46 @@
170 self.assertThat(authenticateEmail(mail), Is(None))
171
172
173+class TestExtractAddresses(TestCaseWithFactory):
174+
175+ layer = LaunchpadZopelessLayer
176+
177+ def test_original_to(self):
178+ mail = self.factory.makeSignedMessage()
179+ original_to = 'eric@vikings.example.com'
180+ mail[ORIGINAL_TO_HEADER] = original_to
181+ self.assertThat(
182+ extract_addresses(mail, None, None, None),
183+ Equals([original_to]))
184+
185+ def test_original_to_in_body(self):
186+ header_to = 'eric@vikings-r-us.example.com'
187+ original_to = 'eric@vikings.example.com'
188+ alias = 'librarian-somewhere'
189+ body = '%s: %s\n\nsome body stuff' % (
190+ ORIGINAL_TO_HEADER, original_to)
191+ log = BufferLogger()
192+ mail = self.factory.makeSignedMessage(
193+ body=body, to_address=header_to)
194+ addresses = extract_addresses(mail, mail.as_string(), alias, log)
195+ self.assertThat(addresses, Equals([header_to]))
196+ self.assertThat(
197+ log.getLogBuffer(),
198+ Equals('INFO Suspected spam: librarian-somewhere\n'))
199+
200+ def test_original_to_missing(self):
201+ header_to = 'eric@vikings-r-us.example.com'
202+ alias = 'librarian-somewhere'
203+ log = BufferLogger()
204+ mail = self.factory.makeSignedMessage(to_address=header_to)
205+ addresses = extract_addresses(mail, mail.as_string(), alias, log)
206+ self.assertThat(addresses, Equals([header_to]))
207+ self.assertThat(
208+ log.getLogBuffer(),
209+ Equals('WARNING No X-Launchpad-Original-To header was present '
210+ 'in email: librarian-somewhere\n'))
211+
212+
213 def setUp(test):
214 test._old_policy = setSecurityPolicy(LaunchpadSecurityPolicy)
215 LaunchpadZopelessLayer.switchDbUser(config.processmail.dbuser)
216
217=== modified file 'lib/lp/testing/factory.py'
218--- lib/lp/testing/factory.py 2011-03-07 19:06:47 +0000
219+++ lib/lp/testing/factory.py 2011-03-09 20:52:00 +0000
220@@ -1829,7 +1829,7 @@
221
222 def makeSignedMessage(self, msgid=None, body=None, subject=None,
223 attachment_contents=None, force_transfer_encoding=False,
224- email_address=None, signing_context=None):
225+ email_address=None, signing_context=None, to_address=None):
226 """Return an ISignedMessage.
227
228 :param msgid: An rfc2822 message-id.
229@@ -1847,8 +1847,10 @@
230 person = self.makePerson()
231 email_address = removeSecurityProxy(person).preferredemail.email
232 mail['From'] = email_address
233- mail['To'] = removeSecurityProxy(
234- self.makePerson()).preferredemail.email
235+ if to_address is None:
236+ to_address = removeSecurityProxy(
237+ self.makePerson()).preferredemail.email
238+ mail['To'] = to_address
239 if subject is None:
240 subject = self.getUniqueString('subject')
241 mail['Subject'] = subject