Merge lp:~sinzui/launchpad/moderate-empty-messages into lp:launchpad

Proposed by Curtis Hovey on 2010-04-06
Status: Merged
Approved by: Henning Eggers on 2010-04-06
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~sinzui/launchpad/moderate-empty-messages
Merge into: lp:launchpad
Diff against target: 132 lines (+65/-6)
2 files modified
lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py (+16/-0)
lib/lp/services/mailman/doc/postings.txt (+49/-6)
To merge this branch: bzr merge lp:~sinzui/launchpad/moderate-empty-messages
Reviewer Review Type Date Requested Status
Henning Eggers (community) code 2010-04-06 Approve on 2010-04-06
Review via email: mp+22870@code.launchpad.net

Description of the Change

This is my branch to prevent empty messages from entering the moderation queue.

    lp:~sinzui/launchpad/moderate-empty-messages
    Diff size: 121
    Launchpad bug: https://bugs.launchpad.net/bugs/498741
    Test command: ./bin/test -vv --layer=Mailman -t postings
    Pre-implementation: barry
    Target release: 10.04

Prevent empty messages from entering the moderation queue
----------------------------------------------------------

Launchpad only supports text/plain email messages, but mailman is asking
users to moderate text/html or image/* messages. Moderators see a subject with
no body. Besides the insanity of this case, there are two bugs that directly
relate to these messages:

Bug 498741 [spam offered for moderation on a ML]
    More that 99% of all spam in the moderation queue have no body.
Bug 498740 [timeout trying to moderate a mailing list with many messages]
    The number of spam messages causes timeouts.

Note that no user has ever complained that Launchpad does not support
non-text messages. We should discard any message that has not meaningful
content.

Rules
-----

    * Iterate over all the parts of each message that mailman call hold()
      on. If there are no text parts with 1 or more non-white-space, raise
      Errors.DiscardMessage

QA
--

    * On staging, send an image only or html-only message to haibunku from
      a non-subscribed address.
      * Most spam messages purport to be from the list itself, eg
        "spam" <email address hidden>.
      * Verify the message does not appear in the moderation queue.
    * On staging send a normal email (with a text part) from the same address.
      * Verify that the message is in the moderation queue.
    * On staging, send an image only or html-only message to haibunku from
      your real address (assuming you are a list member)
      * Verify the message arrives to the list members.

Lint
----

Linting changed files:
  lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py
  lib/lp/services/mailman/doc/postings.txt

Test
----

    * lib/lp/services/mailman/doc/postings.txt
      * Revised the large message size test to look like real examples.
        Barry and I felt that these tests looked like spam according to
        the new rule and would probably fail when the code is added.
      * Added tests to verify that list members can send non-text messages,
        but such messages from non-members are discarded.

Implementation
--------------

    * lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py
      * Added a guard to the hold() function to ensure that non-text
        messages are discarded.

To post a comment you must log in.
Henning Eggers (henninge) wrote :
Download full text (6.9 KiB)

Hi Curtis
thank you for this branch and making moderators lifes easier!

Pleae find my commengs in-line.

Am 06.04.2010 16:45, schrieb Curtis Hovey:
> Prevent empty messages from entering the moderation queue
> ----------------------------------------------------------
>
> Launchpad only supports text/plain email messages, but mailman is asking
> users to moderate text/html or image/* messages. Moderators see a subject with
> no body. Besides the insanity of this case, there are two bugs that directly
> relate to these messages:
>
> Bug 498741 [spam offered for moderation on a ML]
> More that 99% of all spam in the moderation queue have no body.
> Bug 498740 [timeout trying to moderate a mailing list with many messages]
> The number of spam messages causes timeouts.

Too bugs in one branch? Nice hunting! ;-)

>
> Note that no user has ever complained that Launchpad does not support
> non-text messages. We should discard any message that has not meaningful
> content.
>
>
> Rules
> -----
>
> * Iterate over all the parts of each message that mailman call hold()
> on. If there are no text parts with 1 or more non-white-space, raise
> Errors.DiscardMessage

I do not know the inner workings of the Mailman code but I trust that you and
Barry came up with The Right Thing here ... ;-)

>
>
> QA
> --
>
> * On staging, send an image only or html-only message to haibunku from
> a non-subscribed address.
> * Most spam messages purport to be from the list itself, eg
> "spam"<email address hidden>.
> * Verify the message does not appear in the moderation queue.
> * On staging send a normal email (with a text part) from the same address.
> * Verify that the message is in the moderation queue.
> * On staging, send an image only or html-only message to haibunku from
> your real address (assuming you are a list member)
> * Verify the message arrives to the list members.

Only question I have here is how to send emails on staging?

>
>
> Lint
> ----
>
> Linting changed files:
> lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py
> lib/lp/services/mailman/doc/postings.txt
>
>
>
> Test
> ----
>
> * lib/lp/services/mailman/doc/postings.txt
> * Revised the large message size test to look like real examples.
> Barry and I felt that these tests looked like spam according to
> the new rule and would probably fail when the code is added.
> * Added tests to verify that list members can send non-text messages,
> but such messages from non-members are discarded.
>
>
> Implementation
> --------------
>
> * lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py
> * Added a guard to the hold() function to ensure that non-text
> messages are discarded.
>

I am not familiar with the overall structure of the file but wouldn't it have
been nicer to put that guard into it's own function?

> === modified file 'lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py'
> --- lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py 2009-10-01 15:36:20 +0000
> +++ li...

Read more...

review: Approve (code)
Curtis Hovey (sinzui) wrote :
Download full text (7.4 KiB)

Hi Henning.

Thanks for the review.

On Tue, 2010-04-06 at 15:51 +0000, Henning Eggers wrote:
> Review: Approve code
...

> > Rules
> > -----
> >
> > * Iterate over all the parts of each message that mailman call hold()
> > on. If there are no text parts with 1 or more non-white-space, raise
> > Errors.DiscardMessage
>
> I do not know the inner workings of the Mailman code but I trust that you and
> Barry came up with The Right Thing here ... ;-)

Yes. I was perplexed by how to locate a text/plain part reliably. Barry
pointed me to the iterators.

> >
> >
> > QA
> > --
> >
> > * On staging, send an image only or html-only message to haibunku from
> > a non-subscribed address.
> > * Most spam messages purport to be from the list itself, eg
> > "spam"<email address hidden>.
> > * Verify the message does not appear in the moderation queue.
> > * On staging send a normal email (with a text part) from the same address.
> > * Verify that the message is in the moderation queue.
> > * On staging, send an image only or html-only message to haibunku from
> > your real address (assuming you are a list member)
> > * Verify the message arrives to the list members.
>
> Only question I have here is how to send emails on staging?

Staging has a separate mailman. The team page lists the staging mailman
address. Just send the messages to the listed address. Barry and I can
see the messages that go to haibunku.

...

> > Implementation
> > --------------
> >
> > * lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py
> > * Added a guard to the hold() function to ensure that non-text
> > messages are discarded.
> >
>
> I am not familiar with the overall structure of the file but wouldn't it have
> been nicer to put that guard into it's own function?

I like your suggestion.

> > === modified file 'lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py'
> > --- lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py 2009-10-01 15:36:20 +0000
> > +++ lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py 2010-04-06 14:45:42 +0000
...

> > @@ -73,6 +74,18 @@
> > syslog('vette',
> > 'Discarding duplicate held message-id: %s', message_id)
> > raise Errors.DiscardMessage
> > + # Discard messages without text content since there will be nothing to
> > + # moderate. Most of these messages are spam.
> > + has_content = False
> > + for part in typed_subpart_iterator(msg, 'text'):
> > + if part.get_content_subtype() == 'plain':
> > + if len(part.get_payload().strip()) > 0:
> > + has_content = True
> > + break
>
> Yes, at least this loop could be put in a function:
>
> def is_message_empty(msg):
> for part in typed_subpart_iterator(msg, 'text'):
> if part.get_content_subtype() == 'plain':
> if len(part.get_payload().strip()) > 0:
> return False
> return True
>
> > + if not has_content:
>
> if is_message_empty(msg):

Yes. That is much better.

> > + syslog('vette',
> Not ...

Read more...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py'
2--- lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py 2009-10-01 15:36:20 +0000
3+++ lib/canonical/launchpad/mailman/monkeypatches/lpmoderate.py 2010-04-06 17:22:43 +0000
4@@ -6,6 +6,7 @@
5
6 import xmlrpclib
7
8+from email.iterators import typed_subpart_iterator
9 from email.Utils import formatdate, make_msgid
10
11 # pylint: disable-msg=F0401
12@@ -40,6 +41,15 @@
13 hold(mlist, msg, msgdata, 'Not subscribed')
14
15
16+def is_message_empty(msg):
17+ """Is the message missing a text/plain part with content?"""
18+ for part in typed_subpart_iterator(msg, 'text'):
19+ if part.get_content_subtype() == 'plain':
20+ if len(part.get_payload().strip()) > 0:
21+ return False
22+ return True
23+
24+
25 def hold(mlist, msg, msgdata, annotation):
26 """Hold the message in both Mailman and Launchpad.
27
28@@ -73,6 +83,12 @@
29 syslog('vette',
30 'Discarding duplicate held message-id: %s', message_id)
31 raise Errors.DiscardMessage
32+ # Discard messages without text content since there will be nothing to
33+ # moderate. Most of these messages are spam.
34+ if is_message_empty(msg):
35+ syslog('vette',
36+ 'Discarding text-less message-id: %s', message_id)
37+ raise Errors.DiscardMessage
38 holds[message_id] = request_id
39 # In addition to Message-ID, the librarian requires a Date header.
40 if 'date' not in msg:
41
42=== modified file 'lib/lp/services/mailman/doc/postings.txt'
43--- lib/lp/services/mailman/doc/postings.txt 2009-10-19 16:21:21 +0000
44+++ lib/lp/services/mailman/doc/postings.txt 2010-04-06 17:22:43 +0000
45@@ -845,13 +845,19 @@
46 >>> config.mailman.soft_max_size
47 40000
48
49- >>> from email.MIMENonMultipart import MIMENonMultipart
50- >>> big_message = MIMENonMultipart('application', 'octet-stream')
51- >>> big_message.set_payload('\n'.join(['x' * 50] * 1000))
52+ >>> from email.mime.multipart import MIMEMultipart
53+ >>> from email.mime.application import MIMEApplication
54+ >>> from email.mime.text import MIMEText
55+
56+ >>> big_message = MIMEMultipart()
57 >>> big_message['From'] = 'anne.person@example.com'
58 >>> big_message['To'] = 'itest-one@lists.launchpad.dev'
59 >>> big_message['Subject'] = 'A huge message'
60 >>> big_message['Message-ID'] = '<puma>'
61+ >>> big_message.attach(
62+ ... MIMEText('look at this pdf.', 'plain'))
63+ >>> big_message.attach(
64+ ... MIMEApplication('\n'.join(['x' * 50] * 1000), 'octet-stream'))
65 >>> inject('itest-one', big_message.as_string())
66
67 >>> vette_watcher.wait_for_hold('itest-one', 'puma')
68@@ -888,12 +894,15 @@
69 >>> config.mailman.hard_max_size
70 1000000
71
72- >>> huge_message = MIMENonMultipart('application', 'octet-stream')
73- >>> huge_message.set_payload('\n'.join(['x' * 50] * 20000))
74+ >>> huge_message = MIMEMultipart()
75 >>> huge_message['From'] = 'anne.person@example.com'
76 >>> huge_message['To'] = 'itest-one@lists.launchpad.dev'
77 >>> huge_message['Subject'] = 'A huge message'
78 >>> huge_message['Message-ID'] = '<quahog>'
79+ >>> huge_message.attach(
80+ ... MIMEText('look at this huge pdf.', 'plain'))
81+ >>> huge_message.attach(
82+ ... MIMEApplication('\n'.join(['x' * 50] * 20000), 'octet-stream'))
83 >>> inject('itest-one', huge_message.as_string())
84
85 >>> vette_watcher.wait_for_discard('quahog')
86@@ -920,7 +929,7 @@
87 ...
88 ... Watch out for badgers! \xa9
89 ... """)
90-
91+
92 However, because Cate is not a member of the mailing list, her message gets
93 held for moderator approval.
94
95@@ -931,3 +940,37 @@
96 ...
97 Itest One <noreply@launchpad.net>
98 New mailing list message requiring approval for Itest One
99+
100+
101+Non-text messages
102+=================
103+
104+List members can send non-text messages to agitate the other list members.
105+
106+ >>> html_message = MIMEText('<a><img /></a>.', 'html')
107+ >>> html_message['From'] = 'anne.person@example.com'
108+ >>> html_message['To'] = 'itest-one@lists.launchpad.dev'
109+ >>> html_message['Subject'] = "It's full of stars"
110+ >>> html_message['Message-ID'] = '<loggerhead>'
111+ >>> inject('itest-one', html_message.as_string())
112+
113+ >>> smtpd_watcher.wait_for_mbox_delivery('loggerhead')
114+ >>> messages = list(smtpd)
115+ >>> for message in messages:
116+ ... print message['Subject']
117+ [Itest-one] It's full of stars
118+ [Itest-one] It's full of stars
119+
120+Messages without non-empty text/plain parts from non-list members are rejected
121+because launchpad does not store non-text messages for moderation.
122+
123+ >>> spam_message = MIMEText('<a><img /></a>.', 'html')
124+ >>> spam_message['From'] = 'itest-one@lists.launchpad.dev'
125+ >>> spam_message['To'] = 'itest-one@lists.launchpad.dev'
126+ >>> spam_message['Subject'] = 'get drugs'
127+ >>> spam_message['Message-ID'] = '<ocelot>'
128+ >>> inject('itest-one', spam_message.as_string())
129+
130+ >>> vette_watcher.wait_for_discard('ocelot')
131+ >>> print_message_summaries()
132+ Number of messages: 0