Merge lp:~widelands-dev/widelands-website/anti_spam into lp:widelands-website

Proposed by kaputtnik
Status: Merged
Merged at revision: 423
Proposed branch: lp:~widelands-dev/widelands-website/anti_spam
Merge into: lp:widelands-website
Diff against target: 360 lines (+151/-25)
11 files modified
pybb/admin.py (+2/-2)
pybb/forms.py (+0/-1)
pybb/migrations/0002_auto_20161001_2046.py (+19/-0)
pybb/models.py (+24/-0)
pybb/urls.py (+2/-1)
pybb/views.py (+32/-5)
templates/pybb/forum.html (+35/-4)
templates/pybb/inlines/display_category.html (+16/-7)
templates/pybb/last_posts.html (+7/-5)
templates/pybb/pybb_moderate_info.html (+12/-0)
templates/pybb/topic.html (+2/-0)
To merge this branch: bzr merge lp:~widelands-dev/widelands-website/anti_spam
Reviewer Review Type Date Requested Status
SirVer Approve
GunChleoc Approve
Review via email: mp+307869@code.launchpad.net

Commit message

Hide all posts/topics which are potentially spam using a keyword filter.

Description of the change

Hide all posts/topics which are potentially spam using a keyword filter.

Add a boolean 'hided' field to pybb.Post.

Post filter: Applies if a post contains 'vashikaran' AND 'baba'. The match is a case-insensitive substring match, so 'VaShIkaRan' is caught, as are 'babaco' and 'blabababla'. Because 'Baba' is also used to mean 'bye-bye', both keywords must occur.

Topic name filter: Applies if ' baba ' (as a word) OR 'ji' is used. Also case-insensitive. Should catch ' baBaJi' as well as 'baBa ji' or ' baba '.

The filters are set in pybb/views.py
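
A minimal sketch of the two filters as described above, for illustration only. The function names are hypothetical and this is not the code from pybb/views.py; it only mirrors the AND/OR rules stated in this description.

```python
def body_is_spam(body):
    """Body rule: both keywords must occur (case-insensitive substring match)."""
    text = body.lower()
    return 'vashikaran' in text and 'baba' in text


def topic_name_is_spam(name):
    """Topic rule: ' baba ' as a separate word OR 'ji' anywhere in the name."""
    text = name.lower()
    return ' baba ' in text or 'ji' in text
```

Note that the plain substring 'ji' also matches harmless topic names such as 'Fiji map', which is the broadness raised later in the review.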

When deploying on the server, ./manage.py migrate must be run before the website is restarted. This adds the 'hided' column to the pybb Post table, with the default value 'False' for all existing rows.

kaputtnik (franku) wrote :

Hm, the diff is truncated... maybe remove the akismet api?

SirVer (sirver) wrote :

That is fine. We'll review offline then. I hope I get around to it tomorrow. Maybe it will be Saturday.

> On 06.10.2016 at 20:02, kaputtnik <email address hidden> wrote:
>
> Hm, the diff is truncated... maybe remove the akismet api?
> --
> https://code.launchpad.net/~widelands-dev/widelands-website/anti_spam/+merge/307869
> You are subscribed to branch lp:widelands-website.

kaputtnik (franku) wrote :

Just a thought from last night:
Change the text of the redirect page to say that EACH post is hided now and will be moderated.

So if the spammers ever read this, they should assume that spamming has no chance here. If a non-spammer reads this (I guess this would be very rare, if it ever comes up), we could explain the text to them.

And of course, it would be better to have the lists of keywords in local_settings.py.

I am very busy these days with normal work, so on Saturday I am working the whole day and in the evening my son is visiting me.

GunChleoc (gunchleoc) wrote :

Just a quick note: past tense of "hide" is "hid" ;)

429. By kaputtnik

changed text of redirect page

430. By GunChleoc

Proofreading.

GunChleoc (gunchleoc) wrote :

I have decided to rename hided -> hidden right now, because it affects the models.

Not tested but code LGTM as far as I can tell.

I'd say do a final test to make sure that I didn't mess anything up and then go live.

review: Approve
431. By SirVer

Minor nits.

SirVer (sirver) wrote :

1) Is it necessary that we ship the akismet client in our repo, i.e. can it not be installed via pip_requirements.txt? We definitely should remove the .svn directory if possible. Also, for this branch it is completely unused, so I actually now agree with #1 and suggest removing the API for now.

2) I was also changing hided -> hidden, at the very same time as Gun - but she beat me to submitting :).

3) You should also check in the title of the post, not only the body. We had some spam that only had data in the title. You can do that cheaply by just concatenating: text = title + body and then working over text.

4) It might make sense to add a regular expression checking for international phone numbers - I do not think such a post can ever be legit.

5) You are marking any post of a user with 'ji' in its name as SPAM. Is that not a bit too broad?

6) There is no admin notification right now - so we will not know if posts are in the queue for moderation. I think that is fine for now, but we will need to remember to check daily once this is deployed until we have that. I am all for deploying this soon though, this SPAM is annoying.
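
Points 3) and 4) could be sketched roughly as follows. This is an illustration under assumptions, not code from the branch: the function name, the `keywords` parameter, and the phone number pattern are all hypothetical, and the pattern only looks for a '+' followed by 8 to 15 digits with optional single separators.

```python
import re

# Hypothetical pattern: '+' then 8-15 digits, each optionally preceded by
# one space, dot, or dash. Real phone number formats vary more than this.
INTL_PHONE = re.compile(r'\+\d(?:[\s.\-]?\d){7,14}')


def looks_like_spam(title, body, keywords):
    """Check title and body together, as suggested in point 3)."""
    text = (title + ' ' + body).lower()
    if any(keyword in text for keyword in keywords):
        return True
    # Point 4): treat anything resembling an international number as spam.
    return INTL_PHONE.search(text) is not None
```

Short expressions such as '+1 for this' do not match, because the pattern requires at least eight digits after the plus sign.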

review: Approve
432. By kaputtnik

removed akismet api

kaputtnik (franku) wrote :

Just short answers because I am in a hurry:

> 1) Is it necessary that we ship the akismet client in our repo, i.e. can it
> not be installed via pip_requirements.txt?

I don't know why I downloaded the complete zip file when one could also download just the py-file (containing all classes and functions). I have now removed the complete zip file from this branch. Regarding pip_requirements: I actually don't know, sorry. But a single file shouldn't be a problem for our repo.

> 3) You should also check in the title of the post, not only the body. We had
> some spam that only had data in the title. You can do that cheaply by just
> concatenating: text = title + body and then working over text.

This is already in there. See the description of this merge proposal "Topic name filter: "

> 4) It might make sense to add a regular expression checking for international
> phone numbers - I do not think such a post can ever be legit.

Yes, but I am not a regular expression freak :-D It will take some time and tests for me to do something like that.

> 5) You are marking any post of a user with 'ji' in its name as SPAM. Is that
> not a bit too broad?

Yes, maybe change it into ' ji ' or ' ji'
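
One possible way to match 'ji' only as a separate word is a regular expression with word boundaries. This is a sketch under assumptions (the names `JI_WORD` and `has_ji_word` are hypothetical), not what the branch does:

```python
import re

# \b is a word boundary, so 'ji' must stand alone: it also matches at the
# start or end of the title and next to punctuation.
JI_WORD = re.compile(r'\bji\b', re.IGNORECASE)


def has_ji_word(name):
    """Return True if 'ji' occurs as a whole word in the topic name."""
    return JI_WORD.search(name) is not None
```

The trade-off is that a fused form such as 'babaji' is no longer caught by this rule alone and would need its own keyword.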

> 6) There is no admin notification right now - so we will not know if posts are
> in the queue for moderation. I think that is fine for now, but we will need to
> remember to check daily once this is deployed until we have that. I am all for
> deploying this soon though, this SPAM is annoying.

Yes, if we add admin notification, we should also prevent notification of normal users (who have 'Inform me on new topics' activated).

Thanks for the changes :-)

kaputtnik (franku) wrote :

There is an akismet PyPI package, so we could easily add this to pip_requirements.txt.

> > 5) You are marking any post of a user with 'ji' in its name as SPAM. Is that
> > not a bit too broad?
>
> Yes, maybe change it into ' ji ' or ' ji'

Correction: 'ji' is only used for topic names (subject). And I think it is better to keep separate keyword lists for topic name and post text.

Looking at my gathered spam posts, we should also add ' molvi' to the keywords. Both in topic name and post text.

I will make the following changes tomorrow:
- Add ' molvi' to keywords
- Use keyword lists in local_settings.py (instead of hardcoding them in pybb/views.py)
- merge into trunk and deploy
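
A sketch of what the keyword lists might look like. The setting names ANTI_SPAM_BODY and ANTI_SPAM_TOPIC are the ones used in the diff of pybb/views.py; the list contents below are illustrative assumptions (the deployed values are not shown in this proposal), and `matches_any` is a hypothetical helper that mirrors the `any(...)` check in the view.

```python
# local_settings.py (illustrative values only)
ANTI_SPAM_BODY = ['vashikaran', ' molvi']
ANTI_SPAM_TOPIC = [' baba ', ' ji ', ' molvi']


def matches_any(text, keywords):
    """Case-insensitive substring check against a list of keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)
```

Keeping the lists in settings means a new keyword can be added on the server without touching the view code.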

For me this is quite an interesting problem, and I would be interested in making a separate 'anti_spam' app out of it (if only because there is a lot to learn). But since akismet does its job, it would be better for widelands to use this service, because it is probably more effective in a shorter time. If we use it, though, we may need additional views and buttons to submit ham or spam to akismet.

433. By kaputtnik

moved list of keywords to local_settings.py

kaputtnik (franku) wrote :

Merged and deployed.

Preview Diff

=== modified file 'pybb/admin.py'
--- pybb/admin.py 2012-03-18 21:06:49 +0000
+++ pybb/admin.py 2016-10-09 11:17:41 +0000
@@ -39,7 +39,7 @@
         ),
         (_('Additional options'), {
             'classes': ('collapse',),
-            'fields': (('views',), ('sticky', 'closed'), 'subscribers')
+            'fields': (('views',), ('sticky', 'closed', 'hidden'), 'subscribers')
             }
         ),
     )
@@ -57,7 +57,7 @@
         ),
         (_('Additional options'), {
             'classes': ('collapse',),
-            'fields' : (('created', 'updated'), 'user_ip')
+            'fields' : (('created', 'updated'), 'user_ip', 'hidden')
             }
         ),
         (_('Message'), {

=== modified file 'pybb/forms.py'
--- pybb/forms.py 2016-06-22 21:02:53 +0000
+++ pybb/forms.py 2016-10-09 11:17:41 +0000
@@ -66,7 +66,6 @@
         post = Post(topic=topic, user=self.user, user_ip=self.ip,
                     markup=self.cleaned_data['markup'],
                     body=self.cleaned_data['body'])
-
         post.save(*args, **kwargs)

         if pybb_settings.ATTACHMENT_ENABLE:

=== added file 'pybb/migrations/0002_auto_20161001_2046.py'
--- pybb/migrations/0002_auto_20161001_2046.py 1970-01-01 00:00:00 +0000
+++ pybb/migrations/0002_auto_20161001_2046.py 2016-10-09 11:17:41 +0000
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+from __future__ import unicode_literals
+
+from django.db import models, migrations
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('pybb', '0001_initial'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='post',
+            name='hidden',
+            field=models.BooleanField(default=False, verbose_name='Hidden'),
+        ),
+    ]

=== modified file 'pybb/models.py'
--- pybb/models.py 2016-06-22 21:02:53 +0000
+++ pybb/models.py 2016-10-09 11:17:41 +0000
@@ -95,6 +95,13 @@
         except IndexError:
             return None

+    @property
+    def last_nonhidden_post(self):
+        posts = self.posts.order_by('-created').filter(hidden=False).select_related()
+        try:
+            return posts[0]
+        except IndexError:
+            return None

 class Topic(models.Model):
     forum = models.ForeignKey(Forum, related_name='topics', verbose_name=_('Forum'))
@@ -132,6 +139,22 @@
         return self.posts.all().order_by('-created').select_related()[0]

     @property
+    def last_nonhidden_post(self):
+        try:
+            return self.posts.all().order_by('-created').filter(hidden=False).select_related()[0]
+        except IndexError:
+            return self.posts.all().order_by('-created').select_related()[0]
+
+    # If the first post of this topic is hidden, the topic is hidden
+    @property
+    def is_hidden(self):
+        try:
+            p = self.posts.all().order_by('created').filter(hidden=False).select_related()[0]
+        except IndexError:
+            return True
+        return False
+
+    @property
     def post_count(self):
         return Post.objects.filter(topic=self).count()

@@ -193,6 +216,7 @@
     body_html = models.TextField(_('HTML version'))
     body_text = models.TextField(_('Text version'))
     user_ip = models.GenericIPAddressField(_('User IP'), default='')
+    hidden = models.BooleanField(_('Hidden'), blank=True, default=False)

     # Django sphinx
     if settings.USE_SPHINX:

=== modified file 'pybb/urls.py'
--- pybb/urls.py 2016-06-04 14:17:40 +0000
+++ pybb/urls.py 2016-10-09 11:17:41 +0000
@@ -30,7 +30,8 @@
     url('^post/(?P<post_id>\d+)/$', views.show_post, name='pybb_post'),
     url('^post/(?P<post_id>\d+)/edit/$', views.edit_post, name='pybb_edit_post'),
     url('^post/(?P<post_id>\d+)/delete/$', views.delete_post, name='pybb_delete_post'),
-
+    url('pybb_moderate_info/$', views.pybb_moderate_info),
+
     # Attachment
     url('^attachment/(?P<hash>\w+)/$', views.show_attachment, name='pybb_attachment'),


=== modified file 'pybb/views.py'
--- pybb/views.py 2016-06-15 19:20:24 +0000
+++ pybb/views.py 2016-10-09 11:17:41 +0000
@@ -1,6 +1,6 @@
 import math
 from mainpage.templatetags.wl_markdown import do_wl_markdown
-from pybb.markups import mypostmarkup
+from pybb.markups import mypostmarkup

 from django.shortcuts import get_object_or_404
 from django.http import HttpResponseRedirect, HttpResponse, HttpResponseNotFound, Http404
@@ -10,13 +10,15 @@
 from django.core.urlresolvers import reverse
 from django.db import connection
 from django.utils import translation
+from django.shortcuts import render

 from pybb.util import render_to, paged, build_form, quote_text, paginate, set_language, ajax, urlize
 from pybb.models import Category, Forum, Topic, Post, PrivateMessage, Attachment,\
     MARKUP_CHOICES
-from pybb.forms import AddPostForm, EditPostForm, UserSearchForm
+from pybb.forms import AddPostForm, EditPostForm, UserSearchForm
 from pybb import settings as pybb_settings
 from pybb.orm import load_related
+from django.conf import settings

 try:
     from notification import models as notification
@@ -74,7 +76,7 @@
             }
 show_forum = render_to('pybb/forum.html')(show_forum_ctx)

-
+
 def show_topic_ctx(request, topic_id):

     try:
@@ -112,7 +114,7 @@
     # profiles = Profile.objects.filter(user__pk__in=
     # set(x.user.id for x in page.object_list))
     # profiles = dict((x.user_id, x) for x in profiles)
-
+
     # for post in page.object_list:
     # post.user.pybb_profile = profiles[post.user.id]

@@ -159,7 +161,29 @@
                       initial={'markup': "markdown", 'body': quote})

     if form.is_valid():
-        post = form.save();
+        # TODO: Add akismet check here
+        spam = False
+
+        # Check in post text.
+        text = form.cleaned_data['body']
+        if any(x in text.lower() for x in settings.ANTI_SPAM_BODY):
+            spam = True
+
+        # Check in topic subject ('name' is empty if this a post to an existing topic)
+        text = form.cleaned_data['name']
+        if text != '':
+            # This is a new topic
+            if any(x in text.lower() for x in settings.ANTI_SPAM_TOPIC):
+                spam = True
+
+        post = form.save()
+        if spam:
+            # Hide the post against normal users
+            post.hidden = True
+            post.save()
+            # Redirect to an info page to inform the user
+            return HttpResponseRedirect('pybb_moderate_info')
+
         if not topic:
             post.topic.subscribers.add(request.user)
         return HttpResponseRedirect(post.get_absolute_url())
@@ -353,3 +377,6 @@

     html = urlize(html)
     return {'content': html}
+
+def pybb_moderate_info(request):
+    return render(request, 'pybb/pybb_moderate_info.html')

=== modified file 'templates/pybb/forum.html'
--- templates/pybb/forum.html 2016-04-29 07:28:16 +0000
+++ templates/pybb/forum.html 2016-10-09 11:17:41 +0000
@@ -40,7 +40,8 @@
 </tr>
 </thead>
 <tbody>
- {% for topic in topics %}
+ {% for topic in topics %}
+ {% if not topic.is_hidden %}
 <tr class="{% cycle 'odd' 'even' %}">
 <td class="forumIcon center">
 {% if topic|pybb_has_unreads:user %}
@@ -60,13 +61,43 @@
 Views: {{ topic.views }}
 </td>
 <td class="lastPost">
- {%if topic.last_post %}
+ {% if user.is_superuser %}
+ {% if topic.last_post %}
+ {{ topic.last_post.user|user_link }} <a href="{{ topic.last_post.get_absolute_url }}">&#187;</a><br />
+ <span class="small">on {{ topic.last_post.created|custom_date:user }}</span>
+ {% endif %}
+ {% else %}
+ {{ topic.last_nonhidden_post.user|user_link }} <a href="{{ topic.last_nonhidden_post.get_absolute_url }}">&#187;</a><br />
+ <span class="small">on {{ topic.last_nonhidden_post.created|custom_date:user }}</span>
+ {% endif %}
+ </td>
+ </tr>
+ {% elif user.is_superuser %}
+ <tr class="{% cycle 'odd' 'even' %}">
+ <td class="forumIcon center">
+ {% if topic|pybb_has_unreads:user %}
+ <img src="{{ MEDIA_URL }}forum/img/doc_big_work_star.png" style="margin: 0px;" alt="" class="middle" />
+ {% else %}
+ <img src="{{ MEDIA_URL }}forum/img/doc_big_work.png" style="margin: 0px;" alt="" class="middle" />
+ {% endif %}
+ </td>
+ <td class="forumTitle">
+ {% if topic.sticky %}<img src="{{ MEDIA_URL }}forum/img/sticky.png" alt="Sticky" title="Sticky" />{% endif %}
+ {% if topic.closed %}<img src="{{ MEDIA_URL }}forum/img/closed.png" alt="Closed" title="Closed" />{% endif %}
+ <a href="{{ topic.get_absolute_url }}">{{ topic.name }}</a><br />
+ <span class="small">Created by {{ topic.user|user_link }} on {{ topic.created|custom_date:user }}</span>
+ </td>
+ <td class="forumCount center small" style="width: 120px;">
+ Posts: {{ topic.post_count }}<br/>
+ Views: {{ topic.views }}
+ </td>
+ <td class="lastPost">
 {{ topic.last_post.user|user_link }} <a href="{{ topic.last_post.get_absolute_url }}">&#187;</a><br />
 <span class="small">on {{ topic.last_post.created|custom_date:user }}</span>
- {% endif %}
 </td>
 </tr>
- {% endfor %}
+ {% endif %} {# topic.is_hidden #}
+ {% endfor %} {# topic #}
 </tbody>
 </table>


=== modified file 'templates/pybb/inlines/display_category.html'
--- templates/pybb/inlines/display_category.html 2016-03-02 21:02:38 +0000
+++ templates/pybb/inlines/display_category.html 2016-10-09 11:17:41 +0000
@@ -29,13 +29,22 @@
 Topics: {{ forum.topics.count }}<br/>
 Posts: {{ forum.posts.count }}
 </td>
- <td class="lastPost">
- {%if forum.last_post %}
- <a href="{{forum.last_post.get_absolute_url}}">{{ forum.last_post.topic.name }}</a><br />
- <span class="small">by {{ forum.last_post.user|user_link }}<br />
- on {{ forum.last_post.created|custom_date:user}}</span>
- {% else %}
- &nbsp;
+ {% if user.is_superuser %} {# Show all to superuser #}
+ {% if forum.last_post %}
+ <td class="lastPost">
+ <a href="{{forum.last_post.get_absolute_url}}">{{ forum.last_post.topic.name }}</a><br />
+ <span class="small">by {{ forum.last_post.user|user_link }}<br />
+ on {{ forum.last_post.created|custom_date:user}}</span>
+ </td>
+ {% endif %}
+ {% else %} {# no super_user: Show only nonhidden posts#}
+ {% if forum.last_nonhidden_post %}
+ <td class="lastPost">
+ <a href="{{forum.last_nonhidden_post.get_absolute_url}}">{{ forum.last_nonhidden_post.topic.name }}</a><br />
+ <span class="small">by {{ forum.last_nonhidden_post.user|user_link }}<br />
+ on {{ forum.last_nonhidden_post.created|custom_date:user}}</span>
+ </td>
+ {% endif %}
 {% endif %}
 </td>
 </tr>

=== modified file 'templates/pybb/last_posts.html'
--- templates/pybb/last_posts.html 2016-03-02 21:02:38 +0000
+++ templates/pybb/last_posts.html 2016-10-09 11:17:41 +0000
@@ -9,11 +9,13 @@
 <div class="columnModuleBox">
 <ul>
 {% for post in posts %}
- <li>
- {{ post.topic.forum.name }}<br />
- <a href="{{ post.get_absolute_url }}" title="{{ post.topic.name }}">{{ post.topic.name|pybb_cut_string:30 }}</a><br />
- by <a href="{% url 'profile_view' post.user %}">{{post.user.username}}</a> {{ post.created|minutes }} ago
- </li>
+ {% if not post.hidden %}
+ <li>
+ {{ post.topic.forum.name }}<br />
+ <a href="{{ post.get_absolute_url }}" title="{{ post.topic.name }}">{{ post.topic.name|pybb_cut_string:30 }}</a><br />
+ by <a href="{% url 'profile_view' post.user %}">{{post.user.username}}</a> {{ post.created|minutes }} ago
+ </li>
+ {% endif %}
 {% endfor %}
 </ul>
 </div>

=== added file 'templates/pybb/pybb_moderate_info.html'
--- templates/pybb/pybb_moderate_info.html 1970-01-01 00:00:00 +0000
+++ templates/pybb/pybb_moderate_info.html 2016-10-09 11:17:41 +0000
@@ -0,0 +1,12 @@
+{% extends 'pybb/base.html' %}
+
+{% block content %}
+
+<h1>All comments have to be moderated</h1>
+
+<div class="blogEntry">
+ <p>Your comment has been saved but hidden to normal users. A moderator
+ will take a look at it and review it as soon as possible.</p>
+</div>
+
+{% endblock %}

=== modified file 'templates/pybb/topic.html'
--- templates/pybb/topic.html 2016-07-31 08:44:48 +0000
+++ templates/pybb/topic.html 2016-10-09 11:17:41 +0000
@@ -157,6 +157,7 @@
 <table class="forum">
 <tbody>
 {% for post in posts %}
+ {% if not post.hidden or user.is_superuser %}
 <tr class="{% cycle 'odd' 'even' %}">
 <td class="author">
 {{ post.user|user_link }}<br />
@@ -228,6 +229,7 @@
 {% endif %}
 </td>
 </tr>
+ {% endif %}
 <tr class="spacer">
 <td></td>
 <td></td>
