Mailman 2.1.15 and later crashes on more email

Bug #1235101 reported by Axis Communications
This bug affects 2 people
Affects: GNU Mailman
Status: Fix Released
Importance: Medium
Assigned to: Mark Sapiro

Bug Description

After upgrading from Mailman 2.1.14 to 2.1.15, we noticed a steep increase in the number of shunted messages. The traceback is the following:
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Handlers/SpamDetect.py", line 99, in process
    headers += getDecodedHeaders(p, lcset)
  File "/var/lib/mailman/Mailman/Handlers/SpamDetect.py", line 71, in getDecodedHeaders
    v = decode_header(re.sub('\n\s', ' ', v))
  File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
    raise HeaderParseError
HeaderParseError

The older version of SpamDetect.py did not crash nearly as often. The reason seems to be that Python's email module is now used to decode the headers, and it raises exceptions much more often than the earlier solution, which used the HeaderGenerator class in SpamDetect.py.

Mark Sapiro (msapiro) wrote :

I'm not convinced that the post-2.1.14 changes are responsible. There could be a bug in email.Header.decode_header(), but the main difference is that the 2.1.14 method didn't decode RFC 2047 encoded headers at all. It could be that the messages that throw the exception are all malformed spam anyway. In any case, please provide one or more of the messages that cause the exception to be thrown.

Also, it will be easy to catch the exception and prevent the shunting, but what should be done with the message? At this point in the processing, we don't know which rule, if any, this header might match. I am inclined to do one of two things:

1) just leave the header undecoded.

2) replace the header along these lines: "Header-Name: unparseable RFC 2047 encoding" becomes "X-Undecodable-Header: Header-Name: unparseable RFC 2047 encoding".

Any thoughts?

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
milestone: none → 2.1.16
status: New → Incomplete
Mark Sapiro (msapiro) wrote :

I have tentatively fixed this for 2.1.16 by just adding the undecoded header to the header text. This means that the text that will be searched by header_filter_rules will include just the raw header for any unparseable RFC 2047 encoded header. This was the case in 2.1.14 and prior for all RFC 2047 encoded headers.
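
The approach described above can be illustrated with a minimal sketch. This is not the actual Mailman patch: the function name `header_text_for_filtering` and its `decoder` parameter are hypothetical, and the code targets Python 3's `email` package, whereas Mailman 2.1 runs on Python 2. It only shows the idea of catching `HeaderParseError` and falling back to the raw header text so the message is not shunted.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header


def header_text_for_filtering(name, value, decoder=decode_header):
    """Return 'Name: value' for rule matching, decoding RFC 2047 words if possible.

    Hypothetical helper, not part of Mailman. `decoder` is injectable
    only so the fallback path can be exercised in isolation.
    """
    # Unfold continuation lines, mirroring the re.sub call in the traceback.
    unfolded = re.sub(r'\n\s', ' ', value)
    try:
        parts = decoder(unfolded)
    except HeaderParseError:
        # Unparseable encoded word: keep the raw header text instead of
        # letting the exception propagate and shunt the message.
        return '%s: %s' % (name, unfolded)
    chunks = []
    for text, charset in parts:
        if isinstance(text, bytes):
            try:
                text = text.decode(charset or 'ascii', 'replace')
            except LookupError:
                # Unknown charset name: decode leniently (an assumption
                # of this sketch, not behavior taken from the report).
                text = text.decode('ascii', 'replace')
        chunks.append(text)
    return '%s: %s' % (name, ''.join(chunks))
```

With this shape, a header that decodes cleanly contributes its decoded text to the string searched by header_filter_rules, while a header that cannot be parsed contributes its raw form, which matches what 2.1.14 did for every RFC 2047 encoded header.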

Changed in mailman:
status: Incomplete → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
status: Fix Committed → Fix Released