Merge lp:~milo/linaro-patchmetrics/update-patches-fix into lp:linaro-patchmetrics

Proposed by Milo Casagrande
Status: Merged
Merged at revision: 400
Proposed branch: lp:~milo/linaro-patchmetrics/update-patches-fix
Merge into: lp:linaro-patchmetrics
Prerequisite: lp:~milo/linaro-patchmetrics/crowd-teams
Diff against target: 517 lines (+309/-28)
7 files modified
apps/patchmetrics/bin/get-linaro-membership.py (+0/-1)
apps/patchmetrics/crowd.py (+22/-2)
apps/patchwork/bin/update-committed-patches.py (+28/-7)
apps/patchwork/db.py (+120/-0)
apps/patchwork/parser.py (+17/-18)
apps/patchwork/utils.py (+119/-0)
apps/settings.py (+3/-0)
To merge this branch: bzr merge lp:~milo/linaro-patchmetrics/update-patches-fix
Reviewer Review Type Date Requested Status
Paul Sokolovsky Approve
Review via email: mp+186357@code.launchpad.net

Description of the change

This MP aims at fixing small issues experienced when running the update-committed-patches.py script from patchwork.
For general information about the changes, please refer to the commit message.

What was not captured in that message is:
- Some PEP8 fixes
- New settings introduced used to define a file path where to store the SQLite DB

To post a comment you must log in.
383. By Milo Casagrande

Fixed output collection for rev-list git command.

384. By Milo Casagrande

Fixed missing exception.

385. By Milo Casagrande

Fixed formatting.

386. By Milo Casagrande

Merged from trunk.

387. By Milo Casagrande

Removed unused imports, added comment.

388. By Milo Casagrande

Merged from trunk.

389. By Milo Casagrande

Updated to use new Crowd code.

390. By Milo Casagrande

Merged from trunk.

391. By Milo Casagrande

Merged from trunk.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

r383: It says it "Fixed output collection for rev-list git command.", but it's unclear what exactly is fixed. Replacement code doesn't look to be better then original, it actually gotta have stdout/stderr read deadlock bug. But it's definitely more verbose and magic-ridden (like use of 2-arg iter, "b" string prefix etc.) Please consider these rhetoric remarks - if such changes are done, it's nice to give more details, though this specific case is minor.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

84 + if settings.AUTH_CROWD_APPLICATION_USER is None:
85 + print >> sys.stderr, ("CROWD_CREDENTIALS_FILE not defined in "
86 + "settings.py.\n")

apparently cut&paste error in msg.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Otherwise, logic looks good, but there really should be expiration of old cached users.

review: Approve
Revision history for this message
Milo Casagrande (milo) wrote :

On Wed, Dec 4, 2013 at 6:18 PM, Paul Sokolovsky
<email address hidden> wrote:
> Review: Approve
>
> Otherwise, logic looks good, but there really should be expiration of old cached users.

Thanks Paul,

I'll work on that and propose another merge.

--
Milo Casagrande | Automation Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCsThanks Paul,

392. By Milo Casagrande

Fixed print statement.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'apps/patchmetrics/bin/get-linaro-membership.py'
--- apps/patchmetrics/bin/get-linaro-membership.py 2013-10-17 15:53:51 +0000
+++ apps/patchmetrics/bin/get-linaro-membership.py 2013-12-05 10:08:30 +0000
@@ -48,7 +48,6 @@
48 linaro = lp.people['linaro']48 linaro = lp.people['linaro']
4949
50 if settings.AUTH_CROWD_APPLICATION_USER:50 if settings.AUTH_CROWD_APPLICATION_USER:
51 print "Using Crowd groups"
52 whitelisted_groups = []51 whitelisted_groups = []
53 if settings.CROWD_GROUPS_WHITELIST is None:52 if settings.CROWD_GROUPS_WHITELIST is None:
54 print "No CROWD_GROUPS_WHITELIST defined, all groups will be used."53 print "No CROWD_GROUPS_WHITELIST defined, all groups will be used."
5554
=== modified file 'apps/patchmetrics/crowd.py'
--- apps/patchmetrics/crowd.py 2013-09-27 16:32:30 +0000
+++ apps/patchmetrics/crowd.py 2013-12-05 10:08:30 +0000
@@ -20,7 +20,6 @@
20import base6420import base64
21import httplib21import httplib
22import urllib22import urllib
23import ConfigParser
24import json23import json
25import types24import types
2625
@@ -182,6 +181,26 @@
182181
183 return user182 return user
184183
184 def is_valid_user(self, email):
185 """Handy function to check if a user exists or not.
186
187 :param email: The user email.
188 :return True or False.
189 """
190 params = {"username": email}
191
192 resource = "/user?{0}".format(urllib.urlencode(params))
193 api_url = "/crowd/rest/usermanagement/1{0}".format(resource)
194
195 valid = True
196 try:
197 self._get(api_url)
198 except CrowdNotFoundException:
199 # In case of other exceptions, raise them.
200 valid = False
201
202 return valid
203
185 def _get_rest_usermanagement(self, resource):204 def _get_rest_usermanagement(self, resource):
186 api_url = "/{0}{1}".format(self._uri, resource)205 api_url = "/{0}{1}".format(self._uri, resource)
187 return self._get(api_url)206 return self._get(api_url)
@@ -211,4 +230,5 @@
211 raise CrowdForbiddenException(230 raise CrowdForbiddenException(
212 "Access forbidden to fulfill the request")231 "Access forbidden to fulfill the request")
213 else:232 else:
214 raise CrowdException("Unknown Crowd status {0}".format(resp.status))233 raise CrowdException(
234 "Unknown Crowd status {0}".format(resp.status))
215235
=== modified file 'apps/patchwork/bin/update-committed-patches.py'
--- apps/patchwork/bin/update-committed-patches.py 2013-10-17 12:22:58 +0000
+++ apps/patchwork/bin/update-committed-patches.py 2013-12-05 10:08:30 +0000
@@ -10,26 +10,47 @@
10# accepted once they reach the master branch.10# accepted once they reach the master branch.
1111
12import _pythonpath12import _pythonpath
13import atexit
13import sys14import sys
15
14from datetime import datetime16from datetime import datetime
15from operator import attrgetter17from django.conf import settings
16
17from lockfile import FileLock, LockFailed, LockTimeout
18
19from django.db.models import Q18from django.db.models import Q
20from dulwich.repo import Repo19from dulwich.repo import Repo
20from lockfile import FileLock, LockFailed, LockTimeout
21from operator import attrgetter
2122
23from patchmetrics.crowd import Crowd
24from patchwork.db import PatchworkDB
22from patchwork.models import Project, State25from patchwork.models import Project, State
23from patchwork.utils import (26from patchwork.utils import (
24 ensure_source_checkout_for_project,27 ensure_source_checkout_for_project,
25 get_commits_to_parse,28 get_commits_to_parse,
26 get_diff_for_commit,29 get_diff_for_linaro_commit,
27 get_patches_matching_commit,30 get_patches_matching_commit,
28 GitException31 GitException
29 )32)
3033
3134
32def main():35def main():
36
37 if settings.AUTH_CROWD_APPLICATION_USER is None:
38 print >> sys.stderr, ("AUTH_CROWD_APPLICATION_USER not defined in "
39 "settings.py.\n")
40 sys.exit(1)
41 else:
42 crwd = Crowd(settings.AUTH_CROWD_APPLICATION_USER,
43 settings.AUTH_CROWD_APPLICATION_PASSWORD,
44 settings.AUTH_CROWD_SERVER_REST_URI)
45
46 if settings.CROWD_USERS_DB_FILE is None:
47 print >> sys.stderr, ("CROWD_USERS_DB_FILE not defined in "
48 "settings.py.\n")
49 sys.exit(1)
50
51 cache_db = PatchworkDB(settings.CROWD_USERS_DB_FILE)
52 atexit.register(cache_db.close)
53
33 projects = Project.objects.exclude(54 projects = Project.objects.exclude(
34 Q(source_tree__isnull=True) | Q(source_tree=''))55 Q(source_tree__isnull=True) | Q(source_tree=''))
3556
@@ -66,7 +87,7 @@
66 for commit_id in commit_ids:87 for commit_id in commit_ids:
67 project.last_seen_commit_ref = commit_id88 project.last_seen_commit_ref = commit_id
68 project.save()89 project.save()
69 diff = get_diff_for_commit(root, commit_id)90 diff = get_diff_for_linaro_commit(root, commit_id, crwd, cache_db)
70 if diff is None:91 if diff is None:
71 # Some commits result in empty diffs (e.g. only changing file92 # Some commits result in empty diffs (e.g. only changing file
72 # permissions); in these cases we just skip them.93 # permissions); in these cases we just skip them.
7394
=== added file 'apps/patchwork/db.py'
--- apps/patchwork/db.py 1970-01-01 00:00:00 +0000
+++ apps/patchwork/db.py 2013-12-05 10:08:30 +0000
@@ -0,0 +1,120 @@
1# Copyright (C) 2013 Linaro
2#
3# Author: Milo Casagrande <milo.casagrande@linaro.org>
4# This file is part of the Patchmetrics package.
5#
6# Patchmetrics is free software; you can redistribute it and/or modify
7# it under the terms of the GNU General Public License as published by
8# the Free Software Foundation; either version 2 of the License, or
9# (at your option) any later version.
10#
11# Patchmetrics is distributed in the hope that it will be useful,
12# but WITHOUT ANY WARRANTY; without even the implied warranty of
13# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14# GNU General Public License for more details.
15#
16# You should have received a copy of the GNU General Public License
17# along with Patchwork; if not, write to the Free Software
18# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
19
20import sqlite3
21
22
23class PatchworkDB(object):
24 """A cache backed by a SQLite DB to store users.
25
26 This is used to provide a users cache in order to speed up the process
27 of analyzing patches from a git repository.
28
29 Users are stored with their email address, and a boolean value indicating
30 if the user is a valid Linaro user. A valid Linaro user is a user that
31 exists in Linaro Login (Crowd).
32
33 There is no user logic here, it is just used as a cache backend.
34 """
35
36 def __init__(self, db_file):
37 self.db_file = db_file
38 self._connection = None
39 self._cursor = None
40
41 self._init_db()
42
43 def _init_db(self):
44 if not self.table_exists('users'):
45 self.cursor.execute('''CREATE TABLE users
46 (id TEXT PRIMARY KEY, valid INTEGER, ts TIMESTAMP)''')
47
48 def close(self):
49 """Closes the DB connection."""
50 # Force a commit, even if we are in autocommit mode.
51 self.save()
52 self.connection.close()
53
54 def save(self):
55 """Performs a commit on the DB."""
56 self.connection.commit()
57
58 @property
59 def connection(self):
60 """The connection to the DB."""
61 if self._connection is None:
62 self._connection = sqlite3.connect(self.db_file)
63 return self._connection
64
65 @property
66 def cursor(self):
67 """The connection cursor."""
68 if self._cursor is None:
69 self._cursor = self.connection.cursor()
70 return self._cursor
71
72 def table_exists(self, table):
73 """Checks if a table exists in the DB.
74
75 :param table: The name of the table to check.
76 """
77 self.cursor.execute('''SELECT name FROM sqlite_master WHERE
78 type='table' AND name=?''', [table])
79
80 exists = False
81 if self.cursor.fetchone():
82 exists = True
83 return exists
84
85 def insert_user(self, email, valid, timestamp):
86 """Inserts a user in the DB.
87
88 :param email: The user email, this is the unique key in the DB.
89 :param valid: If the user is a valid Linaro user.
90 :param timestamp: When the user was added in the cache.
91 """
92 self.cursor.execute('''INSERT INTO users(id, valid, ts)
93 VALUES (?, ?, ?)''', (email, valid, timestamp))
94
95 def update_user(self, email, valid, timestamp):
96 """Updates a user in the DB."""
97 self.cursor.execute('''UPDATE users SET valid=?, ts=? WHERE id=?''',
98 (valid, timestamp, email))
99
100 def get_user(self, email):
101 """Retrieves a user from the DB.
102
103 :param email: The user email.
104 :return A tuple with the data found, None otherwise.
105 """
106 self.cursor.execute('''SELECT * FROM users where id=?''', [email])
107 return self.cursor.fetchone()
108
109 def user_exists(self, email):
110 """Verifies if a user exists in the DB.
111
112 :param email: The user email.
113 :return True or False.
114 """
115 result = self.get_user(email)
116
117 exists = False
118 if result:
119 exists = True
120 return exists
0121
=== modified file 'apps/patchwork/parser.py'
--- apps/patchwork/parser.py 2012-10-18 09:28:17 +0000
+++ apps/patchwork/parser.py 2013-12-05 10:08:30 +0000
@@ -20,7 +20,9 @@
20# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA20# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
2121
2222
23import StringIO
23import re24import re
25import types
2426
25try:27try:
26 import hashlib28 import hashlib
@@ -32,7 +34,14 @@
32_hunk_re = re.compile('^\@\@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? \@\@')34_hunk_re = re.compile('^\@\@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? \@\@')
33_filename_re = re.compile('^(---|\+\+\+) (\S+)')35_filename_re = re.compile('^(---|\+\+\+) (\S+)')
3436
37
35def parse_patch(text):38def parse_patch(text):
39 text_file = None
40 if isinstance(text, types.StringTypes):
41 text_file = StringIO.StringIO(text)
42 else:
43 text_file = text
44
36 patchbuf = ''45 patchbuf = ''
37 commentbuf = ''46 commentbuf = ''
38 buf = ''47 buf = ''
@@ -63,20 +72,15 @@
63 lc = (0, 0)72 lc = (0, 0)
64 hunk = 073 hunk = 0
6574
6675 for line in text_file:
67 for line in text.split('\n'):
68 line += '\n'
69
70 if state == 0:76 if state == 0:
71 if line.startswith('diff ') or line.startswith('===') \77 if line.startswith('diff ') or line.startswith('===') \
72 or line.startswith('Index: '):78 or line.startswith('Index: '):
73 state = 179 state = 1
74 buf += line80 buf += line
75
76 elif line.startswith('--- '):81 elif line.startswith('--- '):
77 state = 282 state = 2
78 buf += line83 buf += line
79
80 else:84 else:
81 commentbuf += line85 commentbuf += line
8286
@@ -89,11 +93,9 @@
89 if line.startswith('+++ '):93 if line.startswith('+++ '):
90 state = 394 state = 3
91 buf += line95 buf += line
92
93 elif hunk:96 elif hunk:
94 state = 197 state = 1
95 buf += line98 buf += line
96
97 else:99 else:
98 state = 0100 state = 0
99 commentbuf += buf + line101 commentbuf += buf + line
@@ -102,7 +104,6 @@
102 elif state == 3:104 elif state == 3:
103 match = _hunk_re.match(line)105 match = _hunk_re.match(line)
104 if match:106 if match:
105
106 def fn(x):107 def fn(x):
107 if not x:108 if not x:
108 return 1109 return 1
@@ -113,16 +114,13 @@
113 state = 4114 state = 4
114 patchbuf += buf + line115 patchbuf += buf + line
115 buf = ''116 buf = ''
116
117 elif line.startswith('--- '):117 elif line.startswith('--- '):
118 patchbuf += buf + line118 patchbuf += buf + line
119 buf = ''119 buf = ''
120 state = 2120 state = 2
121
122 elif hunk:121 elif hunk:
123 state = 1122 state = 1
124 buf += line123 buf += line
125
126 else:124 else:
127 state = 0125 state = 0
128 commentbuf += buf + line126 commentbuf += buf + line
@@ -161,6 +159,7 @@
161159
162 return (patchbuf, commentbuf)160 return (patchbuf, commentbuf)
163161
162
164def hash_patch(str):163def hash_patch(str):
165 # normalise spaces164 # normalise spaces
166 str = str.replace('\r', '')165 str = str.replace('\r', '')
@@ -213,12 +212,12 @@
213 from optparse import OptionParser212 from optparse import OptionParser
214213
215 parser = OptionParser()214 parser = OptionParser()
216 parser.add_option('-p', '--patch', action = 'store_true',215 parser.add_option('-p', '--patch', action='store_true',
217 dest = 'print_patch', help = 'print parsed patch')216 dest='print_patch', help='print parsed patch')
218 parser.add_option('-c', '--comment', action = 'store_true',217 parser.add_option('-c', '--comment', action='store_true',
219 dest = 'print_comment', help = 'print parsed comment')218 dest='print_comment', help='print parsed comment')
220 parser.add_option('-#', '--hash', action = 'store_true',219 parser.add_option('-#', '--hash', action='store_true',
221 dest = 'print_hash', help = 'print patch hash')220 dest='print_hash', help='print patch hash')
222221
223 (options, args) = parser.parse_args()222 (options, args) = parser.parse_args()
224223
225224
=== modified file 'apps/patchwork/utils.py'
--- apps/patchwork/utils.py 2013-10-16 12:29:19 +0000
+++ apps/patchwork/utils.py 2013-12-05 10:08:30 +0000
@@ -17,8 +17,10 @@
17# along with Patchwork; if not, write to the Free Software17# along with Patchwork; if not, write to the Free Software
18# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA18# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
1919
20import StringIO
20import atexit21import atexit
21import chardet22import chardet
23import datetime
22import os24import os
23import re25import re
24import subprocess26import subprocess
@@ -395,6 +397,123 @@
395 return patch397 return patch
396398
397399
400def get_diff_for_linaro_commit(root, commit_id, crowd, db):
401 """Gets a diff from a commit id for a Linaro made change.
402
403 If the author or the committer of the commit is not a Linaro user, then
404 the commit will not be considered.
405
406 This is a slightly different implementation than `get_diff_for_commit`,
407 where here it checks if the user is a valid Linaro account before taking
408 in account the patch.
409
410 :param root: The path on the file system where the repository is stored.
411 :param commit_id: The commmit ID to parse.
412 :param crowd: A Crowd object used to determine if the user is from Linaro.
413 :param db: A PatchworkDB object used as a cache for users.
414 """
415 stdout = ""
416 patch = None
417 author = None
418 committer = None
419
420 # We create a custom view on the commit id diff. We extract the author
421 # and committer emails, and check if they are from Linaro people, otherwise
422 # we do not even consider the commit and move one.
423 # Most of the time patchwork/patchmetrics spends a lot of time calculating
424 # diffs for patches that might not even be from Linaro people.
425 # The first format prints on stdout the author and commiter emails.
426 # The rest of the format string is exactly as what is shown with a simple
427 # `git show COMMIT_ID`.
428 format_string = "%ae%n%ce%n"
429 format_string += "commit %H%n"
430 format_string += "Author: %aN <%aE>%n"
431 format_string += "Date: %ad%n"
432
433 cmd_args = ['git', 'show', '--format=format:{0}'.format(format_string),
434 commit_id]
435 proc = subprocess.Popen(cmd_args, cwd=root, bufsize=4096,
436 stdout=subprocess.PIPE, stderr=DEVNULL)
437
438 while True:
439 for line in iter(proc.stdout.readline, b''):
440 stdout += line
441 if proc.poll() is not None:
442 break
443
444 if proc.returncode != 0:
445 print ("Error retrieving diff for commit {0}".format(commit_id))
446 else:
447 # Since with big diffs a lot of time is spent decoding the output from
448 # the git command, we first read two lines from the output (author and
449 # commiter emails), perform validations on those, and in case one of
450 # them is a valid Linaro user we continue with output decoding and
451 # diff parsing.
452 string_file = StringIO.StringIO(stdout)
453
454 author = string_file.readline().strip().decode('utf-8')
455 committer = string_file.readline().strip().decode('utf-8')
456
457 valid_patch = False
458 if author == committer:
459 valid_patch |= is_linaro_user(author, db, crowd)
460 else:
461 valid_patch |= is_linaro_user(author, db, crowd)
462 valid_patch |= is_linaro_user(committer, db, crowd)
463
464 if valid_patch:
465 try:
466 diff = stdout.decode('utf-8')
467 except UnicodeDecodeError:
468 try:
469 # chardet.detect is rather slow so we only use it when we
470 # fail # to decode from utf-8.
471 encoding = chardet.detect(stdout)['encoding']
472 diff = stdout.decode(encoding)
473 except UnicodeDecodeError:
474 print ("Skipping commit {0} as we can't find the "
475 "appropriate charset to decode "
476 "it".format(commit_id))
477 return None
478
479 # This is a double action, but it is still faster then decoding
480 # everytime the output even when the patch does not come from a
481 # Linaro user.
482 string_file = StringIO.StringIO(diff)
483 # We need to pass the StringIO object pointing to the 3rd line
484 # since the first two have already been read, and they contains
485 # information not needed for the patch.
486 for i in range(0, 2):
487 string_file.readline()
488 patch, _ = parse_patch(string_file)
489
490 return patch
491
492
493def is_linaro_user(user, db, crowd):
494 """Verifies if a user is a valid Linaro account.
495
496 :param user: The email address to verify.
497 :param db: A PatchworkDB instance for the cache backend.
498 :param crowd: A Crowd instance to retrieve user data if necessary.
499 """
500 cached_user = db.get_user(user)
501
502 is_valid = False
503 if cached_user:
504 # TODO: we need to set a timeout and check again even if the user
505 # is cached, it might have been removed from Crowd.
506 is_valid |= cached_user[1]
507 else:
508 # XXX: The first time this runs, it might hit the Crowd server hard,
509 # since none of the users will be cached.
510 valid_user = crowd.is_valid_user(user)
511 is_valid |= valid_user
512 db.insert_user(user, valid_user, datetime.datetime.now())
513
514 return is_valid
515
516
398def get_patches_matching_commit(project, commit_diff, commit_msg,517def get_patches_matching_commit(project, commit_diff, commit_msg,
399 commit_author):518 commit_author):
400 """Return the Patch objects that are likely matches to the given commit.519 """Return the Patch objects that are likely matches to the given commit.
401520
=== modified file 'apps/settings.py'
--- apps/settings.py 2013-10-23 19:18:06 +0000
+++ apps/settings.py 2013-12-05 10:08:30 +0000
@@ -147,6 +147,9 @@
147# File with a list of groups that should be created in the DB. If other groups147# File with a list of groups that should be created in the DB. If other groups
148# are taken out from Crowd, and they are not here, they will be skipped.148# are taken out from Crowd, and they are not here, they will be skipped.
149CROWD_GROUPS_WHITELIST = None149CROWD_GROUPS_WHITELIST = None
150# Full path to a DB file.
151# This is used to have a cache of which users exists in the Crowd database.
152CROWD_USERS_DB_FILE = None
150153
151# Where to store the git repos that we scan to identify which patches have154# Where to store the git repos that we scan to identify which patches have
152# been committed.155# been committed.

Subscribers

People subscribed via source and target branches