Merge lp:~milo/linaro-patchmetrics/parsemail-fixes into lp:linaro-patchmetrics

Proposed by Milo Casagrande
Status: Merged
Merged at revision: 401
Proposed branch: lp:~milo/linaro-patchmetrics/parsemail-fixes
Merge into: lp:linaro-patchmetrics
Diff against target: 236 lines (+91/-38)
2 files modified
apps/patchwork/bin/parsemail.py (+64/-37)
apps/patchwork/utils.py (+27/-1)
To merge this branch: bzr merge lp:~milo/linaro-patchmetrics/parsemail-fixes
Reviewer Review Type Date Requested Status
Paul Sokolovsky Needs Fixing
Review via email: mp+197241@code.launchpad.net

Description of the change

This is a fix for the script that parses the emails searching for patches. With the new changes, if the submitter of a patch nor the author of a patch are using Linaro email address, we do not consider the patch and we move on.

Another change made, is to when the saving action on a new Person (in this case a new author) happens. Previously, if a new author was found, it was saved immediately, now we save it only after making sure it is a valid Linaro engineer. In this way we do the exact same thing as it was done with the Submitter person, and we do not save authors that we do not want to track.

To post a comment you must log in.
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Logic looks good. And I usually don't go at such details with stylistic changes, but for few quick looks at linaro-patchmetrics codebase, I felt confused and saw few less-than-ideal practices and hardcodings. So, let's try to do better.

1. Can "new_author" var be named "is_new_auhor"?

2.

8 - return None, None
9 + return None, None, None

From the rest of patch it's possible to figure out what 3rd return value is, and I'm not sure if that func had a docstring, but maybe it might?

3.

57 + # If submitter nor author of a patch are not Linaro, we move on.
58 + # TODO: may be worth to integrate with crowd.
59 + if "@linaro.org" not in submitter.email:
60 + if author is not None and "@linaro.org" not in author.email:
61 + return 0
62 + return 0

There're 2 issues here: one is that hardcoded "@linaro.org". We can do better ;-) (yeah, I know it takes much more patch lines to use config settings, and sometimes with my changes I wonder if I should bother either; friendly answer looking from outside - yes ;-) ). And we should anticipate need to support multiple patterns - either as array, or better just as a regexp.

Another is that there's superfluous/erroneous logic - if first "if" fired, it's "return 0" regardless of second
"if". If code should look like that, there should be a comment (like, "one sweet day, we may want to handle these conditions separately"). Otherwise...

review: Needs Fixing
Revision history for this message
Milo Casagrande (milo) wrote :

Thanks Paul for looking into it!

On Wed, Dec 4, 2013 at 2:25 PM, Paul Sokolovsky
<email address hidden> wrote:
> Review: Needs Fixing
>
> Logic looks good. And I usually don't go at such details with stylistic changes, but for few quick looks at linaro-patchmetrics codebase, I felt confused and saw few less-than-ideal practices and hardcodings. So, let's try to do better.
>
> 1. Can "new_author" var be named "is_new_auhor"?
>
> 2.
>
> 8 - return None, None
> 9 + return None, None, None
>
> >From the rest of patch it's possible to figure out what 3rd return value is, and I'm not sure if that func had a docstring, but maybe it might?

I added info in the function doc string.

>
> 57 + # If submitter nor author of a patch are not Linaro, we move on.
> 58 + # TODO: may be worth to integrate with crowd.
> 59 + if "@linaro.org" not in submitter.email:
> 60 + if author is not None and "@linaro.org" not in author.email:
> 61 + return 0
> 62 + return 0
>
> There're 2 issues here: one is that hardcoded "@linaro.org". We can do better ;-) (yeah, I know it takes much more patch lines to use config settings, and sometimes with my changes I wonder if I should bother either; friendly answer looking from outside - yes ;-) ). And we should anticipate need to support multiple patterns - either as array, or better just as a regexp.

I know about the hardcoded @linaro.org values (there are a few more in
Linaro patchwork code), but I'm not going to fix them here. The code
cannot go in production like that, I'm aware of it, that's why I would
like the other MP in first, to be able to address this issue as well.
The other MP, that is bigger, adds some integration with Crowd that we
can use here as well, that's why there is that TODO.

I do not like the regexp approach, at least in this case: we can do
even better than that, and it is taking data out of Crowd.

The problem cannot be solved by simply querying Crowd though:
switching to a subscription model for patches.l.o (we subscribe to the
mailing list, they do not send the patches directly anymore), it means
that we can receive even more than 500 mails per day, that will lead
to 500 Crowd requests. When I tested the old patch, IT were already
looking at who was hitting the server (and it was with far less
requests). We need to locally cache those requests: the other MP tries
to solve that.

> Another is that there's superfluous/erroneous logic - if first "if" fired, it's "return 0" regardless of second
> "if". If code should look like that, there should be a comment (like, "one sweet day, we may want to handle these conditions separately"). Otherwise...

There was one "return 0" too much, fixed.

--
Milo Casagrande | Automation Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCs

401. By Milo Casagrande

Fixed doc string, refactored variable, fixed if-logic.

402. By Milo Casagrande

Fixed if clause.

403. By Milo Casagrande

Merged from trunk.

404. By Milo Casagrande

Added crowd integration for email checking.

405. By Milo Casagrande

Fixed if clause.

406. By Milo Casagrande

Refactored code.

  * Moved one function from the parsemail script into the utils.py file.
    Had to import one function from utils.py in parsemail.py, but apparently
    the import was breaking something (probably circular import).

Revision history for this message
Milo Casagrande (milo) wrote :

Added the Crowd integration also on the check for valid patch submitter or authors.
Had to refactor some code: I needed to import a function form utils.py, but utils.py was importing a function from parsemail.py, resulting in an import error when the script is executed.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'apps/patchwork/bin/parsemail.py'
--- apps/patchwork/bin/parsemail.py 2013-11-29 16:20:37 +0000
+++ apps/patchwork/bin/parsemail.py 2013-12-05 18:37:48 +0000
@@ -20,6 +20,7 @@
20# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA20# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2121
22import _pythonpath22import _pythonpath
23import atexit
23import chardet24import chardet
24import datetime25import datetime
25import operator26import operator
@@ -37,8 +38,14 @@
37 from email.Header import Header, decode_header38 from email.Header import Header, decode_header
38 from email.Utils import parsedate_tz, mktime_tz39 from email.Utils import parsedate_tz, mktime_tz
3940
41from patchmetrics.crowd import Crowd
42from patchwork.db import PatchworkDB
40from patchwork.parser import parse_patch43from patchwork.parser import parse_patch
41from patchwork.models import Patch, Project, Person, Comment, State44from patchwork.models import Patch, Project, Person, Comment, State
45from patchwork.utils import (
46 is_linaro_user,
47 parse_name_and_email,
48)
42from django.contrib.auth.models import User49from django.contrib.auth.models import User
4350
44default_patch_state = 'New'51default_patch_state = 'New'
@@ -118,33 +125,6 @@
118 return project125 return project
119126
120127
121def parse_name_and_email(text):
122 # tuple of (regex, fn)
123 # - where fn returns a (name, email) tuple from the match groups resulting
124 # from re.match().groups()
125 from_res = [
126 # for "Firstname Lastname" <example@example.com> style addresses
127 (re.compile('"?(.*?)"?\s*<([^>]+)>'), (lambda g: (g[0], g[1]))),
128
129 # for example@example.com (Firstname Lastname) style addresses
130 (re.compile('"?(.*?)"?\s*\(([^\)]+)\)'), (lambda g: (g[1], g[0]))),
131
132 # everything else
133 (re.compile('(.*)'), (lambda g: (None, g[0]))),
134 ]
135
136 for regex, fn in from_res:
137 match = regex.match(text)
138 if match:
139 name, email = fn(match.groups())
140 if name is not None:
141 name = name.strip()
142 if email is not None:
143 email = email.strip()
144 return name, email
145 return None, None
146
147
148def find_submitter(mail):128def find_submitter(mail):
149129
150 from_header = clean_header(mail.get('From'))130 from_header = clean_header(mail.get('From'))
@@ -282,18 +262,22 @@
282262
283263
284def find_patch_name_and_author(comment):264def find_patch_name_and_author(comment):
285 """Return the patch name and author as specified in the given comment.265 """Finds the patch name, and the patch author.
286266
287 The name and author of a patch are sometimes different from the Subject267 The name and author of a patch are sometimes different from the Subject
288 and From header fields and so are specified in the body of the message,268 and From header fields and so are specified in the body of the message,
289 before the actual patch. This is identified by "From: " and "Subject: "269 before the actual patch. This is identified by "From: " and "Subject: "
290 lines starting the body, which is the format understood by 'git-am'.270 lines starting the body, which is the format understood by 'git-am'.
271
272 :return A tuple consisting of: the name of the patch, the author of the
273 patch, and a boolean indicating if the author needs to be saved in the
274 database.
291 """275 """
292 if not comment:276 if not comment:
293 return None, None277 return None, None, None
294 lines = comment.content.split('\n')278 lines = comment.content.split('\n')
295 if len(lines) == 0:279 if len(lines) == 0:
296 return None, None280 return None, None, None
297 # Append an extra line at the end so that our code can assume there will281 # Append an extra line at the end so that our code can assume there will
298 # always be at least two lines, simplifying things significantly.282 # always be at least two lines, simplifying things significantly.
299 lines.append('')283 lines.append('')
@@ -308,25 +292,26 @@
308 if second_line.startswith('From:'):292 if second_line.startswith('From:'):
309 author_line = second_line293 author_line = second_line
310 else:294 else:
311 return None, None295 return None, None, None
312296
313 subject = None297 subject = None
314 if subject_line is not None:298 if subject_line is not None:
315 subject = subject_line.replace('Subject:', '').strip()299 subject = subject_line.replace('Subject:', '').strip()
316300
317 if author_line is None:301 if author_line is None:
318 return subject, None302 return subject, None, None
319303
320 name, email = parse_name_and_email(author_line.replace('From:', ''))304 name, email = parse_name_and_email(author_line.replace('From:', ''))
321 if email is None:305 if email is None:
322 return subject, None306 return subject, None, None
323307
324 try:308 try:
325 person = Person.objects.get(email__iexact=email)309 person = Person.objects.get(email__iexact=email)
310 new_person = False
326 except Person.DoesNotExist:311 except Person.DoesNotExist:
327 person = Person(name=name, email=email)312 person = Person(name=name, email=email)
328 person.save()313 new_person = True
329 return subject, person314 return subject, person, new_person
330315
331316
332def find_patch_for_comment(project, mail):317def find_patch_for_comment(project, mail):
@@ -493,6 +478,39 @@
493 return None478 return None
494479
495480
481def is_valid_patch(submitter, author):
482 """A patch is valid if submitter or author are valid Linaro users.
483
484 :param submitter: The submitter to check, a Person object.
485 :param author: The author to check, a Person object.
486 :return True if one between submitter and author is a valid Linaro user.
487 """
488 valid_patch = False
489
490 if (settings.AUTH_CROWD_APPLICATION_USER is None and
491 settings.CROWD_USERS_DB_FILE is None):
492 # If none of the above settings is defined, fallback to an hardoceded
493 # check and accept only linaro.org email addresses.
494 if "@linaro.org" in submitter.email:
495 valid_patch |= True
496
497 if author is not None and "@linaro.org" in author.email:
498 valid_patch |= True
499 else:
500 crwd = Crowd(settings.AUTH_CROWD_APPLICATION_USER,
501 settings.AUTH_CROWD_APPLICATION_PASSWORD,
502 settings.AUTH_CROWD_SERVER_REST_URI)
503
504 cache_db = PatchworkDB(settings.CROWD_USERS_DB_FILE)
505 atexit.register(cache_db.close)
506
507 valid_patch |= is_linaro_user(submitter.email, cache_db, crwd)
508 if author is not None:
509 valid_patch |= is_linaro_user(author.email, cache_db, crwd)
510
511 return valid_patch
512
513
496def parse_mail(mail):514def parse_mail(mail):
497515
498 # some basic sanity checks516 # some basic sanity checks
@@ -517,8 +535,14 @@
517 msgid = mail.get('Message-Id').strip()535 msgid = mail.get('Message-Id').strip()
518536
519 (submitter, save_required) = find_submitter(mail)537 (submitter, save_required) = find_submitter(mail)
520
521 (patch, comment) = find_content(project, mail)538 (patch, comment) = find_content(project, mail)
539 (name, author, is_new_author) = find_patch_name_and_author(comment)
540
541 # If submitter nor author of a patch are not Linaro, we move on.
542 valid_patch = is_valid_patch(submitter, author)
543
544 if not valid_patch:
545 return 0
522546
523 if patch:547 if patch:
524 # we delay the saving until we know we have a patch.548 # we delay the saving until we know we have a patch.
@@ -526,11 +550,14 @@
526 submitter.save()550 submitter.save()
527 save_required = False551 save_required = False
528 patch.submitter = submitter552 patch.submitter = submitter
529 name, author = find_patch_name_and_author(comment)553
554 if is_new_author:
555 author.save()
530 if author is not None:556 if author is not None:
531 patch.author = author557 patch.author = author
532 if name is not None:558 if name is not None:
533 patch.name = name559 patch.name = name
560
534 patch.msgid = msgid561 patch.msgid = msgid
535 patch.project = project562 patch.project = project
536 patch.state = get_state(mail.get('X-Patchwork-State', '').strip())563 patch.state = get_state(mail.get('X-Patchwork-State', '').strip())
537564
=== modified file 'apps/patchwork/utils.py'
--- apps/patchwork/utils.py 2013-10-16 13:25:16 +0000
+++ apps/patchwork/utils.py 2013-12-05 18:37:48 +0000
@@ -29,7 +29,6 @@
29from difflib import SequenceMatcher29from difflib import SequenceMatcher
30from django.conf import settings30from django.conf import settings
31from django.shortcuts import get_object_or_40431from django.shortcuts import get_object_or_404
32from patchwork.bin.parsemail import parse_name_and_email
33from patchwork.models import Bundle, BundlePatch, Patch, Person, State32from patchwork.models import Bundle, BundlePatch, Patch, Person, State
34from patchwork.parser import hash_patch, parse_patch33from patchwork.parser import hash_patch, parse_patch
3534
@@ -83,6 +82,33 @@
83 return ids82 return ids
8483
8584
85def parse_name_and_email(text):
86 # tuple of (regex, fn)
87 # - where fn returns a (name, email) tuple from the match groups resulting
88 # from re.match().groups()
89 from_res = [
90 # for "Firstname Lastname" <example@example.com> style addresses
91 (re.compile('"?(.*?)"?\s*<([^>]+)>'), (lambda g: (g[0], g[1]))),
92
93 # for example@example.com (Firstname Lastname) style addresses
94 (re.compile('"?(.*?)"?\s*\(([^\)]+)\)'), (lambda g: (g[1], g[0]))),
95
96 # everything else
97 (re.compile('(.*)'), (lambda g: (None, g[0]))),
98 ]
99
100 for regex, fn in from_res:
101 match = regex.match(text)
102 if match:
103 name, email = fn(match.groups())
104 if name is not None:
105 name = name.strip()
106 if email is not None:
107 email = email.strip()
108 return name, email
109 return None, None
110
111
86class Order(object):112class Order(object):
87 order_map = {113 order_map = {
88 'date': 'date',114 'date': 'date',

Subscribers

People subscribed via source and target branches