Merge lp:~widelands-dev/widelands-website/anti_spam_app into lp:widelands-website

Proposed by kaputtnik
Status: Merged
Merged at revision: 477
Proposed branch: lp:~widelands-dev/widelands-website/anti_spam_app
Merge into: lp:widelands-website
Diff against target: 434 lines (+207/-64)
14 files modified
check_input/__init__.py (+1/-0)
check_input/admin.py (+18/-0)
check_input/apps.py (+5/-0)
check_input/management/commands/send_suspicious_mail.py (+15/-13)
check_input/migrations/0001_initial.py (+30/-0)
check_input/models.py (+50/-0)
check_input/urls.py (+6/-0)
check_input/views.py (+36/-0)
local_settings.py.sample (+1/-2)
pybb/forms.py (+1/-30)
pybb/views.py (+34/-16)
settings.py (+1/-0)
templates/check_input/moderate_info.html (+8/-3)
urls.py (+1/-0)
To merge this branch: bzr merge lp:~widelands-dev/widelands-website/anti_spam_app
Reviewer Review Type Date Requested Status
SirVer Approve
Review via email: mp+334232@code.launchpad.net

Description of the change

Decouple check for spam (suspicious user content) from pybb. Main goal is to add the possibility to add such checks to different apps, e.g. wiki or comments.

For pybb we have already the hidden property, so i used it. For other apps it is maybe possible to add a template filter or a middleware to hide suspicious content. This is work for the future...

- Implement a new app 'check_input' containing:
- A new model SuspiciousInput which is responsible for checking users input and collecting data about suspicious content. Beside the username and suspicious text, the model stores the applications model where this suspicious text comes from. E.g. pybb/Post or pybb/Topic with the appropriate ID
- Moved management command for sending hidden_posts_mail into the new app and renamed it
- Added an admin page for the new model
- Removed the http 403 response, instead reworked the informational view of moderating users input which is prompted to the user who wrote suspicious content.

- Make pybb use of the new app, including a check when editing posts

Deleting a user removes also the entries in model SuspiciousInput.
An entry in SuspiciousInput get not deleted when a post get unhidden. So an admin is responsible for removing the entry in this case.

To post a comment you must log in.
485. By kaputtnik

renamed managment command

486. By kaputtnik

forgot to adjust local_settings.py.sample

487. By kaputtnik

a semantical better way of invoking the suspicious check

Revision history for this message
SirVer (sirver) wrote :

lgtm!

review: Approve
Revision history for this message
kaputtnik (franku) wrote :

Thanks for the review :-)

Will merge it at Thursday. I want to make backup of the database before merging this. Making a backup of the database prevent saving anything. So i am thinking about having an additional noticeable row below the navigation showing something like

"The website is currently set to read only!"

So there is no need to set the website into maintenance mode when making a backup, and visitors are informed. What do you think?

Revision history for this message
GunChleoc (gunchleoc) wrote :

I think that's a good idea - maybe still switch the forums off though, since navigating them also needs changes to the database.

Revision history for this message
kaputtnik (franku) wrote :

After reading your post and rethinking my suggestion it is much problematic as it seems. Beside the forums, there are also other things which write to the database while using the website, e.g. Downloading a map (Increasing the download number).

It is also not clear how to handle active sessions. Say a user with an active session visits the website when it is read only and then he tries to log out.

Better set the website status to maintenance when making a backup. Currently this seems saver to me :-)

Revision history for this message
SirVer (sirver) wrote :

Yes, setting it to maintance is surely saver - but you could run a backup once on the live system just to see how long it takes and if it is feasible to set the site to maintance for this.

Revision history for this message
kaputtnik (franku) wrote :

> but you could run a backup once on the live system just to see how long it takes

I remember the last time i did this it was not clear to me how long it takes. Say if the terminal lost the ssh connection it stays in a frozen way (no response from the server). That time it takes incredible long time... but i am not sure if the time was really used by mysqldump because i cant see if the connection was lost. Locally making a mysqldump is very fast, faster than with mixed InnoDB and MyIsam tables.

488. By kaputtnik

corrected the doc string of model SuspiciousInput

Revision history for this message
SirVer (sirver) wrote :

You can check the duration by using /usr/bin/time <command>. You can avoid terminal connection drop by running tmux.

Basically:

$ tmux
$ /usr/bin/time mysqldump <more options>
Ups connection to the server dropped here at some point.

ssh again, now

$ tmux attach
here you are, back in your old session.

Revision history for this message
kaputtnik (franku) wrote :

Merged and deployed.

Making a backup of the database was done in a few seconds... I have to admit that i only made a backup of the productive database.

User only having staff status (and are not superusers), may not see the entry "Check Input/Suspicious Input" on the main admin page. I gave GunCheloc the appropriate rights so she can see and enter the sub page. Don't know if some one else should also have the rights.

Hopefully all works as intended and testet locally ...

I use screen as a terminal multiplexer, i think your approach with tmux works also with screen?

Revision history for this message
SirVer (sirver) wrote :

> I use screen as a terminal multiplexer, i think your approach with tmux works also with screen?

Yes, it does.

Thanks for deploying!

Revision history for this message
GunChleoc (gunchleoc) wrote :

I think the rights are sufficient - I don't think that anybody else will want to change this at the moment.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added directory 'check_input'
2=== added file 'check_input/__init__.py'
3--- check_input/__init__.py 1970-01-01 00:00:00 +0000
4+++ check_input/__init__.py 2017-11-29 22:46:31 +0000
5@@ -0,0 +1,1 @@
6+default_app_config = 'check_input.apps.CheckInput'
7
8=== added file 'check_input/admin.py'
9--- check_input/admin.py 1970-01-01 00:00:00 +0000
10+++ check_input/admin.py 2017-11-29 22:46:31 +0000
11@@ -0,0 +1,18 @@
12+from check_input.models import SuspiciousInput
13+from django.contrib import admin
14+from django.contrib.contenttypes.models import ContentType
15+
16+
17+class SuspiciousInputAdmin(admin.ModelAdmin):
18+ list_display = ('text', 'user', 'get_app')
19+ readonly_fields = ('text', 'user', 'get_app',)
20+ exclude = ('content_type', 'object_id', )
21+
22+ def get_app(self, obj):
23+ app = ContentType.objects.get_for_id(
24+ obj.content_type_id)
25+
26+ return '%s/%s' % (app.app_label, app.name)
27+ get_app.short_description = 'Found in App/Model'
28+
29+admin.site.register(SuspiciousInput, SuspiciousInputAdmin)
30
31=== added file 'check_input/apps.py'
32--- check_input/apps.py 1970-01-01 00:00:00 +0000
33+++ check_input/apps.py 2017-11-29 22:46:31 +0000
34@@ -0,0 +1,5 @@
35+from django.apps import AppConfig
36+
37+class CheckInput(AppConfig):
38+ name = 'check_input'
39+ verbose_name = "Check Input"
40
41=== added directory 'check_input/management'
42=== added file 'check_input/management/__init__.py'
43=== added directory 'check_input/management/commands'
44=== added file 'check_input/management/commands/__init__.py'
45=== renamed file 'pybb/management/commands/send_hidden_post_mail.py' => 'check_input/management/commands/send_suspicious_mail.py'
46--- pybb/management/commands/send_hidden_post_mail.py 2016-12-13 18:28:51 +0000
47+++ check_input/management/commands/send_suspicious_mail.py 2017-11-29 22:46:31 +0000
48@@ -1,24 +1,26 @@
49 from django.core.management.base import BaseCommand
50-from pybb.models import Post
51+from check_input.models import SuspiciousInput
52 from django.core.mail import send_mail
53 from django.conf import settings
54 from django.contrib.sites.models import Site
55+from django.contrib.contenttypes.models import ContentType
56
57
58 class Command(BaseCommand):
59- help = 'Send emails if hidden posts are found'
60+ help = 'Send email if suspicious content is found'
61
62 def handle(self, *args, **options):
63- hidden_posts = Post.objects.filter(hidden=True)
64-
65- if hidden_posts:
66- message = 'There were %d hidden posts found:' % len(hidden_posts)
67- for post in hidden_posts:
68- message += '\n' + post.user.username + \
69- ': ' + post.body_text[:70]
70-
71- message += '\n\nAdmin page: ' + Site.objects.get_current().domain + \
72- '/admin/pybb/post/'
73+ spams = SuspiciousInput.objects.all()
74+ if spams:
75+ message = 'There were %d hidden posts found:' % len(spams)
76+
77+ for spam in spams:
78+ app = ContentType.objects.get_for_id(
79+ spam.content_type_id)
80+ message += '\nIn %s/%s: ' % (app.app_label, app.model)
81+ message += '\n User \'%s\' wrote: %s' % (spam.user, spam.text)
82+
83+ message += '\n\nAdmin page: https://%s/admin/pybb/post/' % Site.objects.get_current().domain
84 recipients = [addr[1] for addr in settings.ADMINS]
85- send_mail('Hidden posts were found', message, 'pybb@widelands.org',
86+ send_mail('Hidden posts were found', message, 'admins@widelands.org',
87 recipients, fail_silently=False)
88
89=== added directory 'check_input/migrations'
90=== added file 'check_input/migrations/0001_initial.py'
91--- check_input/migrations/0001_initial.py 1970-01-01 00:00:00 +0000
92+++ check_input/migrations/0001_initial.py 2017-11-29 22:46:31 +0000
93@@ -0,0 +1,30 @@
94+# -*- coding: utf-8 -*-
95+from __future__ import unicode_literals
96+
97+from django.db import models, migrations
98+from django.conf import settings
99+
100+
101+class Migration(migrations.Migration):
102+
103+ dependencies = [
104+ ('contenttypes', '0002_remove_content_type_name'),
105+ migrations.swappable_dependency(settings.AUTH_USER_MODEL),
106+ ]
107+
108+ operations = [
109+ migrations.CreateModel(
110+ name='SuspiciousInput',
111+ fields=[
112+ ('id', models.AutoField(verbose_name='ID', serialize=False, auto_created=True, primary_key=True)),
113+ ('text', models.CharField(max_length=200, verbose_name=b'suspicious user input')),
114+ ('object_id', models.PositiveIntegerField()),
115+ ('content_type', models.ForeignKey(verbose_name=b'related model', to='contenttypes.ContentType')),
116+ ('user', models.ForeignKey(verbose_name=b'related user', to=settings.AUTH_USER_MODEL)),
117+ ],
118+ options={
119+ 'ordering': ['content_type_id'],
120+ 'default_permissions': ('change', 'delete'),
121+ },
122+ ),
123+ ]
124
125=== added file 'check_input/migrations/__init__.py'
126=== added file 'check_input/models.py'
127--- check_input/models.py 1970-01-01 00:00:00 +0000
128+++ check_input/models.py 2017-11-29 22:46:31 +0000
129@@ -0,0 +1,50 @@
130+from django.db import models
131+from django.contrib.auth.models import User
132+from django.contrib.contenttypes.models import ContentType
133+from django.contrib.contenttypes.fields import GenericForeignKey
134+from django.conf import settings
135+import re
136+
137+
138+class SuspiciousInput(models.Model):
139+ """Model for collecting suspicios user input.
140+
141+ Call the check_input method with this attributes:
142+ content_object = Model instance of a saved(!) object
143+ user = user
144+ text = text to check for suspicious content
145+
146+ Example:
147+ is_suspicous = SuspiciousInput.check_input(content_object=post,
148+user=post.user, text=post.body)
149+
150+ """
151+
152+ text = models.CharField(
153+ max_length=200, verbose_name='suspicious user input')
154+ user = models.ForeignKey(User, verbose_name='related user')
155+ content_type = models.ForeignKey(ContentType, verbose_name='related model')
156+ object_id = models.PositiveIntegerField()
157+ content_object = GenericForeignKey('content_type', 'object_id')
158+
159+ class Meta:
160+ ordering = ['content_type_id']
161+ default_permissions = ('change', 'delete',)
162+
163+ def __unicode__(self):
164+ return self.text
165+
166+ def is_suspicious(self):
167+ if any(x in self.text.lower() for x in settings.ANTI_SPAM_KWRDS):
168+ return True
169+ if re.search(settings.ANTI_SPAM_PHONE_NR, self.text):
170+ return True
171+ return False
172+
173+ @classmethod
174+ def check_input(cls, *args, **kwargs):
175+ user_input = cls(*args, **kwargs)
176+ is_spam = user_input.is_suspicious()
177+ if is_spam:
178+ user_input.save()
179+ return is_spam
180
181=== added file 'check_input/urls.py'
182--- check_input/urls.py 1970-01-01 00:00:00 +0000
183+++ check_input/urls.py 2017-11-29 22:46:31 +0000
184@@ -0,0 +1,6 @@
185+from django.conf.urls import *
186+from check_input import views
187+
188+urlpatterns = [
189+ url(r'^$', views.moderate_info, name='found_spam'),
190+ ]
191
192=== added file 'check_input/views.py'
193--- check_input/views.py 1970-01-01 00:00:00 +0000
194+++ check_input/views.py 2017-11-29 22:46:31 +0000
195@@ -0,0 +1,36 @@
196+from django.shortcuts import render
197+from django.shortcuts import get_object_or_404
198+from django.http import HttpResponseRedirect
199+from django.conf import settings
200+from django.contrib.auth import logout
201+from django.contrib.auth.models import User
202+from check_input.models import SuspiciousInput
203+
204+
205+def moderate_info(request):
206+ """Redirect to the moderate comments info page."""
207+
208+ # We need the try to catch logged out users
209+ try:
210+ hidden_posts_count = SuspiciousInput.objects.filter(
211+ user=request.user).count()
212+ except TypeError:
213+ return HttpResponseRedirect('/')
214+
215+ # Don't make the page accesible through browsers addressbar
216+ if hidden_posts_count == 0:
217+ return HttpResponseRedirect('/')
218+
219+ if hidden_posts_count >= settings.MAX_HIDDEN_POSTS:
220+ user = get_object_or_404(User, username=request.user)
221+ # Set the user inactive so he can't login
222+ user.is_active = False
223+ user.save()
224+ # Log the user out
225+ logout(request)
226+
227+ context={
228+ 'max_count': settings.MAX_HIDDEN_POSTS,
229+ 'act_count': hidden_posts_count,
230+ }
231+ return render(request, 'check_input/moderate_info.html', context=context)
232
233=== modified file 'local_settings.py.sample'
234--- local_settings.py.sample 2017-11-01 16:29:00 +0000
235+++ local_settings.py.sample 2017-11-29 22:46:31 +0000
236@@ -48,8 +48,7 @@
237
238 # Anti spam keywords
239 # If these are found, the posts/topics in forum get hidden
240-ANTI_SPAM_BODY = ['spam']
241-ANTI_SPAM_TOPIC = ['spam']
242+ANTI_SPAM_KWRDS = ['spam']
243 ANTI_SPAM_PHONE_NR = re.compile('\d{8,16}')
244 MAX_HIDDEN_POSTS = 5
245
246
247=== modified file 'pybb/forms.py'
248--- pybb/forms.py 2017-04-23 21:00:01 +0000
249+++ pybb/forms.py 2017-11-29 22:46:31 +0000
250@@ -10,7 +10,6 @@
251 from pybb.models import Topic, Post, PrivateMessage, Attachment
252 from pybb import settings as pybb_settings
253 from django.conf import settings
254-from notification.models import send, get_observers_for
255
256
257 class AddPostForm(forms.ModelForm):
258@@ -62,42 +61,14 @@
259 topic_is_new = False
260 topic = self.topic
261
262- # Check for spam and hide the post
263- # TODO(Franku): This is currently a simple keyword search. Maybe add akismet check here
264- # could be improved...
265- # The admins get informed of hidden post(s) over
266- # a Django command. See pybb/management/commands
267- hidden = False
268- text = self.cleaned_data['body']
269- if any(x in text.lower() for x in settings.ANTI_SPAM_BODY):
270- hidden = True
271-
272- if re.search(settings.ANTI_SPAM_PHONE_NR, text):
273- hidden = True
274-
275- if topic_is_new:
276- text = self.cleaned_data['name']
277- if any(x in text.lower() for x in settings.ANTI_SPAM_TOPIC):
278- hidden = True
279- if re.search(settings.ANTI_SPAM_PHONE_NR, text):
280- hidden = True
281-
282 post = Post(topic=topic, user=self.user, user_ip=self.ip,
283 markup=self.cleaned_data['markup'],
284- body=self.cleaned_data['body'], hidden=hidden)
285+ body=self.cleaned_data['body'])
286 post.save(*args, **kwargs)
287
288 if pybb_settings.ATTACHMENT_ENABLE:
289 self.save_attachment(post, self.cleaned_data['attachment'])
290
291- if not hidden:
292- if topic_is_new:
293- send(get_observers_for('forum_new_topic'), 'forum_new_topic',
294- {'topic': topic, 'post': post, 'user': topic.user}, queue = True)
295- else:
296- send(self.topic.subscribers.all(), 'forum_new_post',
297- {'post': post, 'topic': topic, 'user': post.user}, queue = True)
298-
299 return post
300
301 def save_attachment(self, post, memfile):
302
303=== modified file 'pybb/views.py'
304--- pybb/views.py 2017-10-27 17:30:08 +0000
305+++ pybb/views.py 2017-11-29 22:46:31 +0000
306@@ -11,7 +11,7 @@
307 from django.db import connection
308 from django.utils import translation
309 from django.shortcuts import render
310-from django.contrib.auth import logout
311+
312
313 from pybb.util import render_to, paged, build_form, quote_text, ajax, urlize
314 from pybb.models import Category, Forum, Topic, Post, PrivateMessage, Attachment,\
315@@ -21,6 +21,7 @@
316 from pybb.orm import load_related
317
318 from wl_utils import get_real_ip
319+from check_input.models import SuspiciousInput
320
321 try:
322 from notification import models as notification
323@@ -158,22 +159,34 @@
324
325 if form.is_valid():
326 post = form.save()
327+
328+ is_spam = False
329+ # Check for spam in topics name for new topics
330 if not topic:
331- post.topic.subscribers.add(request.user)
332-
333- if post.hidden:
334- hidden_posts_count = Post.objects.filter(
335- user=request.user, hidden=True).count()
336-
337- if hidden_posts_count >= settings.MAX_HIDDEN_POSTS:
338- user = get_object_or_404(User, username=request.user)
339- # Set the user inactive so he can't login
340- user.is_active = False
341- user.save()
342- # Log the user out
343- logout(request)
344- return HttpResponse(status=403)
345- return HttpResponseRedirect('pybb_moderate_info')
346+ is_spam = SuspiciousInput.check_input(
347+ content_object=post.topic, user=post.topic.user, text=post.topic.name)
348+ # Check for spam in Post
349+ if not is_spam:
350+ is_spam = SuspiciousInput.check_input(
351+ content_object=post, user=post.user, text=post.body)
352+
353+ if is_spam:
354+ post.hidden = is_spam
355+ post.save(update_fields=['hidden'])
356+ return HttpResponseRedirect('/moderated/')
357+
358+ if notification:
359+ if not topic:
360+ # Inform subscribers of a new topic
361+ notification.send(notification.get_observers_for('forum_new_topic'), 'forum_new_topic',
362+ {'topic': post.topic, 'post': post, 'user': post.topic.user}, queue = True)
363+ # Topics author is subscriber for all new posts in his topic
364+ post.topic.subscribers.add(request.user)
365+
366+ else:
367+ # Send mails about a new post to topic subscribers
368+ notification.send(post.topic.subscribers.all(), 'forum_new_post',
369+ {'post': post, 'topic': topic, 'user': post.user}, queue = True)
370
371 return HttpResponseRedirect(post.get_absolute_url())
372
373@@ -220,6 +233,11 @@
374
375 if form.is_valid():
376 post = form.save()
377+ is_spam = SuspiciousInput.check_input(content_object=post, user=post.user, text=post.body)
378+ if is_spam:
379+ post.hidden = is_spam
380+ post.save()
381+ return HttpResponseRedirect('/moderated/')
382 return HttpResponseRedirect(post.get_absolute_url())
383
384 return {'form': form,
385
386=== modified file 'settings.py'
387--- settings.py 2017-09-29 21:24:37 +0000
388+++ settings.py 2017-11-29 22:46:31 +0000
389@@ -94,6 +94,7 @@
390 'wlmaps',
391 'wlscreens',
392 'wlggz',
393+ 'check_input',
394 'haystack', # search engine; see option HAYSTACK_CONNECTIONS
395
396 # Modified 3rd party apps
397
398=== added directory 'templates/check_input'
399=== renamed file 'templates/pybb/pybb_moderate_info.html' => 'templates/check_input/moderate_info.html'
400--- templates/pybb/pybb_moderate_info.html 2016-10-08 09:30:34 +0000
401+++ templates/check_input/moderate_info.html 2017-11-29 22:46:31 +0000
402@@ -1,12 +1,17 @@
403-{% extends 'pybb/base.html' %}
404+{% extends 'base.html' %}
405
406 {% block content %}
407
408 <h1>All comments have to be moderated</h1>
409
410 <div class="blogEntry">
411- <p>Your comment has been saved but hidden to normal users. A moderator
412- will take a look at it and review it as soon as possible.</p>
413+ <p>Your comment has to be reviewd by a moderator.</p>
414+ {% if act_count == max_count|add:"-1" %}
415+ <p><strong>Attention! The next time you will get logged out and deactivated!</strong></p>
416+ {% endif %}
417+ {% if act_count == max_count %}
418+ <p><strong>You can't login anymore...</strong></p>
419+ {% endif %}
420 </div>
421
422 {% endblock %}
423
424=== modified file 'urls.py'
425--- urls.py 2017-08-24 16:19:01 +0000
426+++ urls.py 2017-11-29 22:46:31 +0000
427@@ -65,6 +65,7 @@
428 url(r'^maps/', include('wlmaps.urls')),
429 url(r'^screenshots/', include('wlscreens.urls')),
430 url(r'^ggz/', include('wlggz.urls')),
431+ url(r'^moderated/', include('check_input.urls')),
432 ]
433
434 try:

Subscribers

People subscribed via source and target branches