Merge lp:~milo/linaro-patchmetrics/parsemail-fixes into lp:linaro-patchmetrics

Proposed by Milo Casagrande
Status: Merged
Merged at revision: 401
Proposed branch: lp:~milo/linaro-patchmetrics/parsemail-fixes
Merge into: lp:linaro-patchmetrics
Diff against target: 236 lines (+91/-38)
2 files modified
apps/patchwork/bin/parsemail.py (+64/-37)
apps/patchwork/utils.py (+27/-1)
To merge this branch: bzr merge lp:~milo/linaro-patchmetrics/parsemail-fixes
Reviewer Review Type Date Requested Status
Paul Sokolovsky Needs Fixing
Review via email: mp+197241@code.launchpad.net

Description of the change

This is a fix for the script that parses the emails searching for patches. With the new changes, if the submitter of a patch nor the author of a patch are using Linaro email address, we do not consider the patch and we move on.

Another change made, is to when the saving action on a new Person (in this case a new author) happens. Previously, if a new author was found, it was saved immediately, now we save it only after making sure it is a valid Linaro engineer. In this way we do the exact same thing as it was done with the Submitter person, and we do not save authors that we do not want to track.

To post a comment you must log in.
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Logic looks good. And I usually don't go at such details with stylistic changes, but for few quick looks at linaro-patchmetrics codebase, I felt confused and saw few less-than-ideal practices and hardcodings. So, let's try to do better.

1. Can "new_author" var be named "is_new_auhor"?

2.

8 - return None, None
9 + return None, None, None

From the rest of patch it's possible to figure out what 3rd return value is, and I'm not sure if that func had a docstring, but maybe it might?

3.

57 + # If submitter nor author of a patch are not Linaro, we move on.
58 + # TODO: may be worth to integrate with crowd.
59 + if "@linaro.org" not in submitter.email:
60 + if author is not None and "@linaro.org" not in author.email:
61 + return 0
62 + return 0

There're 2 issues here: one is that hardcoded "@linaro.org". We can do better ;-) (yeah, I know it takes much more patch lines to use config settings, and sometimes with my changes I wonder if I should bother either; friendly answer looking from outside - yes ;-) ). And we should anticipate need to support multiple patterns - either as array, or better just as a regexp.

Another is that there's superfluous/erroneous logic - if first "if" fired, it's "return 0" regardless of second
"if". If code should look like that, there should be a comment (like, "one sweet day, we may want to handle these conditions separately"). Otherwise...

review: Needs Fixing
Revision history for this message
Milo Casagrande (milo) wrote :

Thanks Paul for looking into it!

On Wed, Dec 4, 2013 at 2:25 PM, Paul Sokolovsky
<email address hidden> wrote:
> Review: Needs Fixing
>
> Logic looks good. And I usually don't go at such details with stylistic changes, but for few quick looks at linaro-patchmetrics codebase, I felt confused and saw few less-than-ideal practices and hardcodings. So, let's try to do better.
>
> 1. Can "new_author" var be named "is_new_auhor"?
>
> 2.
>
> 8 - return None, None
> 9 + return None, None, None
>
> >From the rest of patch it's possible to figure out what 3rd return value is, and I'm not sure if that func had a docstring, but maybe it might?

I added info in the function doc string.

>
> 57 + # If submitter nor author of a patch are not Linaro, we move on.
> 58 + # TODO: may be worth to integrate with crowd.
> 59 + if "@linaro.org" not in submitter.email:
> 60 + if author is not None and "@linaro.org" not in author.email:
> 61 + return 0
> 62 + return 0
>
> There're 2 issues here: one is that hardcoded "@linaro.org". We can do better ;-) (yeah, I know it takes much more patch lines to use config settings, and sometimes with my changes I wonder if I should bother either; friendly answer looking from outside - yes ;-) ). And we should anticipate need to support multiple patterns - either as array, or better just as a regexp.

I know about the hardcoded @linaro.org values (there are a few more in
Linaro patchwork code), but I'm not going to fix them here. The code
cannot go in production like that, I'm aware of it, that's why I would
like the other MP in first, to be able to address this issue as well.
The other MP, that is bigger, adds some integration with Crowd that we
can use here as well, that's why there is that TODO.

I do not like the regexp approach, at least in this case: we can do
even better than that, and it is taking data out of Crowd.

The problem cannot be solved by simply querying Crowd though:
switching to a subscription model for patches.l.o (we subscribe to the
mailing list, they do not send the patches directly anymore), it means
that we can receive even more than 500 mails per day, that will lead
to 500 Crowd requests. When I tested the old patch, IT were already
looking at who was hitting the server (and it was with far less
requests). We need to locally cache those requests: the other MP tries
to solve that.

> Another is that there's superfluous/erroneous logic - if first "if" fired, it's "return 0" regardless of second
> "if". If code should look like that, there should be a comment (like, "one sweet day, we may want to handle these conditions separately"). Otherwise...

There was one "return 0" too much, fixed.

--
Milo Casagrande | Automation Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCs

401. By Milo Casagrande

Fixed doc string, refactored variable, fixed if-logic.

402. By Milo Casagrande

Fixed if clause.

403. By Milo Casagrande

Merged from trunk.

404. By Milo Casagrande

Added crowd integration for email checking.

405. By Milo Casagrande

Fixed if clause.

406. By Milo Casagrande

Refactored code.

  * Moved one function from the parsemail script into the utils.py file.
    Had to import one function from utils.py in parsemail.py, but apparently
    the import was breaking something (probably circular import).

Revision history for this message
Milo Casagrande (milo) wrote :

Added the Crowd integration also on the check for valid patch submitter or authors.
Had to refactor some code: I needed to import a function form utils.py, but utils.py was importing a function from parsemail.py, resulting in an import error when the script is executed.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'apps/patchwork/bin/parsemail.py'
2--- apps/patchwork/bin/parsemail.py 2013-11-29 16:20:37 +0000
3+++ apps/patchwork/bin/parsemail.py 2013-12-05 18:37:48 +0000
4@@ -20,6 +20,7 @@
5 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
6
7 import _pythonpath
8+import atexit
9 import chardet
10 import datetime
11 import operator
12@@ -37,8 +38,14 @@
13 from email.Header import Header, decode_header
14 from email.Utils import parsedate_tz, mktime_tz
15
16+from patchmetrics.crowd import Crowd
17+from patchwork.db import PatchworkDB
18 from patchwork.parser import parse_patch
19 from patchwork.models import Patch, Project, Person, Comment, State
20+from patchwork.utils import (
21+ is_linaro_user,
22+ parse_name_and_email,
23+)
24 from django.contrib.auth.models import User
25
26 default_patch_state = 'New'
27@@ -118,33 +125,6 @@
28 return project
29
30
31-def parse_name_and_email(text):
32- # tuple of (regex, fn)
33- # - where fn returns a (name, email) tuple from the match groups resulting
34- # from re.match().groups()
35- from_res = [
36- # for "Firstname Lastname" <example@example.com> style addresses
37- (re.compile('"?(.*?)"?\s*<([^>]+)>'), (lambda g: (g[0], g[1]))),
38-
39- # for example@example.com (Firstname Lastname) style addresses
40- (re.compile('"?(.*?)"?\s*\(([^\)]+)\)'), (lambda g: (g[1], g[0]))),
41-
42- # everything else
43- (re.compile('(.*)'), (lambda g: (None, g[0]))),
44- ]
45-
46- for regex, fn in from_res:
47- match = regex.match(text)
48- if match:
49- name, email = fn(match.groups())
50- if name is not None:
51- name = name.strip()
52- if email is not None:
53- email = email.strip()
54- return name, email
55- return None, None
56-
57-
58 def find_submitter(mail):
59
60 from_header = clean_header(mail.get('From'))
61@@ -282,18 +262,22 @@
62
63
64 def find_patch_name_and_author(comment):
65- """Return the patch name and author as specified in the given comment.
66+ """Finds the patch name, and the patch author.
67
68 The name and author of a patch are sometimes different from the Subject
69 and From header fields and so are specified in the body of the message,
70 before the actual patch. This is identified by "From: " and "Subject: "
71 lines starting the body, which is the format understood by 'git-am'.
72+
73+ :return A tuple consisting of: the name of the patch, the author of the
74+ patch, and a boolean indicating if the author needs to be saved in the
75+ database.
76 """
77 if not comment:
78- return None, None
79+ return None, None, None
80 lines = comment.content.split('\n')
81 if len(lines) == 0:
82- return None, None
83+ return None, None, None
84 # Append an extra line at the end so that our code can assume there will
85 # always be at least two lines, simplifying things significantly.
86 lines.append('')
87@@ -308,25 +292,26 @@
88 if second_line.startswith('From:'):
89 author_line = second_line
90 else:
91- return None, None
92+ return None, None, None
93
94 subject = None
95 if subject_line is not None:
96 subject = subject_line.replace('Subject:', '').strip()
97
98 if author_line is None:
99- return subject, None
100+ return subject, None, None
101
102 name, email = parse_name_and_email(author_line.replace('From:', ''))
103 if email is None:
104- return subject, None
105+ return subject, None, None
106
107 try:
108 person = Person.objects.get(email__iexact=email)
109+ new_person = False
110 except Person.DoesNotExist:
111 person = Person(name=name, email=email)
112- person.save()
113- return subject, person
114+ new_person = True
115+ return subject, person, new_person
116
117
118 def find_patch_for_comment(project, mail):
119@@ -493,6 +478,39 @@
120 return None
121
122
123+def is_valid_patch(submitter, author):
124+ """A patch is valid if submitter or author are valid Linaro users.
125+
126+ :param submitter: The submitter to check, a Person object.
127+ :param author: The author to check, a Person object.
128+ :return True if one between submitter and author is a valid Linaro user.
129+ """
130+ valid_patch = False
131+
132+ if (settings.AUTH_CROWD_APPLICATION_USER is None and
133+ settings.CROWD_USERS_DB_FILE is None):
134+ # If none of the above settings is defined, fallback to an hardoceded
135+ # check and accept only linaro.org email addresses.
136+ if "@linaro.org" in submitter.email:
137+ valid_patch |= True
138+
139+ if author is not None and "@linaro.org" in author.email:
140+ valid_patch |= True
141+ else:
142+ crwd = Crowd(settings.AUTH_CROWD_APPLICATION_USER,
143+ settings.AUTH_CROWD_APPLICATION_PASSWORD,
144+ settings.AUTH_CROWD_SERVER_REST_URI)
145+
146+ cache_db = PatchworkDB(settings.CROWD_USERS_DB_FILE)
147+ atexit.register(cache_db.close)
148+
149+ valid_patch |= is_linaro_user(submitter.email, cache_db, crwd)
150+ if author is not None:
151+ valid_patch |= is_linaro_user(author.email, cache_db, crwd)
152+
153+ return valid_patch
154+
155+
156 def parse_mail(mail):
157
158 # some basic sanity checks
159@@ -517,8 +535,14 @@
160 msgid = mail.get('Message-Id').strip()
161
162 (submitter, save_required) = find_submitter(mail)
163-
164 (patch, comment) = find_content(project, mail)
165+ (name, author, is_new_author) = find_patch_name_and_author(comment)
166+
167+ # If submitter nor author of a patch are not Linaro, we move on.
168+ valid_patch = is_valid_patch(submitter, author)
169+
170+ if not valid_patch:
171+ return 0
172
173 if patch:
174 # we delay the saving until we know we have a patch.
175@@ -526,11 +550,14 @@
176 submitter.save()
177 save_required = False
178 patch.submitter = submitter
179- name, author = find_patch_name_and_author(comment)
180+
181+ if is_new_author:
182+ author.save()
183 if author is not None:
184 patch.author = author
185 if name is not None:
186 patch.name = name
187+
188 patch.msgid = msgid
189 patch.project = project
190 patch.state = get_state(mail.get('X-Patchwork-State', '').strip())
191
192=== modified file 'apps/patchwork/utils.py'
193--- apps/patchwork/utils.py 2013-10-16 13:25:16 +0000
194+++ apps/patchwork/utils.py 2013-12-05 18:37:48 +0000
195@@ -29,7 +29,6 @@
196 from difflib import SequenceMatcher
197 from django.conf import settings
198 from django.shortcuts import get_object_or_404
199-from patchwork.bin.parsemail import parse_name_and_email
200 from patchwork.models import Bundle, BundlePatch, Patch, Person, State
201 from patchwork.parser import hash_patch, parse_patch
202
203@@ -83,6 +82,33 @@
204 return ids
205
206
207+def parse_name_and_email(text):
208+ # tuple of (regex, fn)
209+ # - where fn returns a (name, email) tuple from the match groups resulting
210+ # from re.match().groups()
211+ from_res = [
212+ # for "Firstname Lastname" <example@example.com> style addresses
213+ (re.compile('"?(.*?)"?\s*<([^>]+)>'), (lambda g: (g[0], g[1]))),
214+
215+ # for example@example.com (Firstname Lastname) style addresses
216+ (re.compile('"?(.*?)"?\s*\(([^\)]+)\)'), (lambda g: (g[1], g[0]))),
217+
218+ # everything else
219+ (re.compile('(.*)'), (lambda g: (None, g[0]))),
220+ ]
221+
222+ for regex, fn in from_res:
223+ match = regex.match(text)
224+ if match:
225+ name, email = fn(match.groups())
226+ if name is not None:
227+ name = name.strip()
228+ if email is not None:
229+ email = email.strip()
230+ return name, email
231+ return None, None
232+
233+
234 class Order(object):
235 order_map = {
236 'date': 'date',

Subscribers

People subscribed via source and target branches