Merge lp:~milo/linaro-patchmetrics/update-patches-fix into lp:linaro-patchmetrics

Proposed by Milo Casagrande
Status: Merged
Merged at revision: 400
Proposed branch: lp:~milo/linaro-patchmetrics/update-patches-fix
Merge into: lp:linaro-patchmetrics
Prerequisite: lp:~milo/linaro-patchmetrics/crowd-teams
Diff against target: 517 lines (+309/-28)
7 files modified
apps/patchmetrics/bin/get-linaro-membership.py (+0/-1)
apps/patchmetrics/crowd.py (+22/-2)
apps/patchwork/bin/update-committed-patches.py (+28/-7)
apps/patchwork/db.py (+120/-0)
apps/patchwork/parser.py (+17/-18)
apps/patchwork/utils.py (+119/-0)
apps/settings.py (+3/-0)
To merge this branch: bzr merge lp:~milo/linaro-patchmetrics/update-patches-fix
Reviewer Review Type Date Requested Status
Paul Sokolovsky Approve
Review via email: mp+186357@code.launchpad.net

Description of the change

This MP aims at fixing small issues experienced when running the update-committed-patches.py script from patchwork.
For general information about the changes, please refer to the commit message.

What was not captured in that message is:
- Some PEP8 fixes
- New settings introduced used to define a file path where to store the SQLite DB

To post a comment you must log in.
383. By Milo Casagrande

Fixed output collection for rev-list git command.

384. By Milo Casagrande

Fixed missing exception.

385. By Milo Casagrande

Fixed formatting.

386. By Milo Casagrande

Merged from trunk.

387. By Milo Casagrande

Removed unused imports, added comment.

388. By Milo Casagrande

Merged from trunk.

389. By Milo Casagrande

Updated to use new Crowd code.

390. By Milo Casagrande

Merged from trunk.

391. By Milo Casagrande

Merged from trunk.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

r383: It says it "Fixed output collection for rev-list git command.", but it's unclear what exactly is fixed. Replacement code doesn't look to be better then original, it actually gotta have stdout/stderr read deadlock bug. But it's definitely more verbose and magic-ridden (like use of 2-arg iter, "b" string prefix etc.) Please consider these rhetoric remarks - if such changes are done, it's nice to give more details, though this specific case is minor.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

84 + if settings.AUTH_CROWD_APPLICATION_USER is None:
85 + print >> sys.stderr, ("CROWD_CREDENTIALS_FILE not defined in "
86 + "settings.py.\n")

apparently cut&paste error in msg.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Otherwise, logic looks good, but there really should be expiration of old cached users.

review: Approve
Revision history for this message
Milo Casagrande (milo) wrote :

On Wed, Dec 4, 2013 at 6:18 PM, Paul Sokolovsky
<email address hidden> wrote:
> Review: Approve
>
> Otherwise, logic looks good, but there really should be expiration of old cached users.

Thanks Paul,

I'll work on that and propose another merge.

--
Milo Casagrande | Automation Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCsThanks Paul,

392. By Milo Casagrande

Fixed print statement.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'apps/patchmetrics/bin/get-linaro-membership.py'
2--- apps/patchmetrics/bin/get-linaro-membership.py 2013-10-17 15:53:51 +0000
3+++ apps/patchmetrics/bin/get-linaro-membership.py 2013-12-05 10:08:30 +0000
4@@ -48,7 +48,6 @@
5 linaro = lp.people['linaro']
6
7 if settings.AUTH_CROWD_APPLICATION_USER:
8- print "Using Crowd groups"
9 whitelisted_groups = []
10 if settings.CROWD_GROUPS_WHITELIST is None:
11 print "No CROWD_GROUPS_WHITELIST defined, all groups will be used."
12
13=== modified file 'apps/patchmetrics/crowd.py'
14--- apps/patchmetrics/crowd.py 2013-09-27 16:32:30 +0000
15+++ apps/patchmetrics/crowd.py 2013-12-05 10:08:30 +0000
16@@ -20,7 +20,6 @@
17 import base64
18 import httplib
19 import urllib
20-import ConfigParser
21 import json
22 import types
23
24@@ -182,6 +181,26 @@
25
26 return user
27
28+ def is_valid_user(self, email):
29+ """Handy function to check if a user exists or not.
30+
31+ :param email: The user email.
32+ :return True or False.
33+ """
34+ params = {"username": email}
35+
36+ resource = "/user?{0}".format(urllib.urlencode(params))
37+ api_url = "/crowd/rest/usermanagement/1{0}".format(resource)
38+
39+ valid = True
40+ try:
41+ self._get(api_url)
42+ except CrowdNotFoundException:
43+ # In case of other exceptions, raise them.
44+ valid = False
45+
46+ return valid
47+
48 def _get_rest_usermanagement(self, resource):
49 api_url = "/{0}{1}".format(self._uri, resource)
50 return self._get(api_url)
51@@ -211,4 +230,5 @@
52 raise CrowdForbiddenException(
53 "Access forbidden to fulfill the request")
54 else:
55- raise CrowdException("Unknown Crowd status {0}".format(resp.status))
56+ raise CrowdException(
57+ "Unknown Crowd status {0}".format(resp.status))
58
59=== modified file 'apps/patchwork/bin/update-committed-patches.py'
60--- apps/patchwork/bin/update-committed-patches.py 2013-10-17 12:22:58 +0000
61+++ apps/patchwork/bin/update-committed-patches.py 2013-12-05 10:08:30 +0000
62@@ -10,26 +10,47 @@
63 # accepted once they reach the master branch.
64
65 import _pythonpath
66+import atexit
67 import sys
68+
69 from datetime import datetime
70-from operator import attrgetter
71-
72-from lockfile import FileLock, LockFailed, LockTimeout
73-
74+from django.conf import settings
75 from django.db.models import Q
76 from dulwich.repo import Repo
77+from lockfile import FileLock, LockFailed, LockTimeout
78+from operator import attrgetter
79
80+from patchmetrics.crowd import Crowd
81+from patchwork.db import PatchworkDB
82 from patchwork.models import Project, State
83 from patchwork.utils import (
84 ensure_source_checkout_for_project,
85 get_commits_to_parse,
86- get_diff_for_commit,
87+ get_diff_for_linaro_commit,
88 get_patches_matching_commit,
89 GitException
90- )
91+)
92
93
94 def main():
95+
96+ if settings.AUTH_CROWD_APPLICATION_USER is None:
97+ print >> sys.stderr, ("AUTH_CROWD_APPLICATION_USER not defined in "
98+ "settings.py.\n")
99+ sys.exit(1)
100+ else:
101+ crwd = Crowd(settings.AUTH_CROWD_APPLICATION_USER,
102+ settings.AUTH_CROWD_APPLICATION_PASSWORD,
103+ settings.AUTH_CROWD_SERVER_REST_URI)
104+
105+ if settings.CROWD_USERS_DB_FILE is None:
106+ print >> sys.stderr, ("CROWD_USERS_DB_FILE not defined in "
107+ "settings.py.\n")
108+ sys.exit(1)
109+
110+ cache_db = PatchworkDB(settings.CROWD_USERS_DB_FILE)
111+ atexit.register(cache_db.close)
112+
113 projects = Project.objects.exclude(
114 Q(source_tree__isnull=True) | Q(source_tree=''))
115
116@@ -66,7 +87,7 @@
117 for commit_id in commit_ids:
118 project.last_seen_commit_ref = commit_id
119 project.save()
120- diff = get_diff_for_commit(root, commit_id)
121+ diff = get_diff_for_linaro_commit(root, commit_id, crwd, cache_db)
122 if diff is None:
123 # Some commits result in empty diffs (e.g. only changing file
124 # permissions); in these cases we just skip them.
125
126=== added file 'apps/patchwork/db.py'
127--- apps/patchwork/db.py 1970-01-01 00:00:00 +0000
128+++ apps/patchwork/db.py 2013-12-05 10:08:30 +0000
129@@ -0,0 +1,120 @@
130+# Copyright (C) 2013 Linaro
131+#
132+# Author: Milo Casagrande <milo.casagrande@linaro.org>
133+# This file is part of the Patchmetrics package.
134+#
135+# Patchmetrics is free software; you can redistribute it and/or modify
136+# it under the terms of the GNU General Public License as published by
137+# the Free Software Foundation; either version 2 of the License, or
138+# (at your option) any later version.
139+#
140+# Patchmetrics is distributed in the hope that it will be useful,
141+# but WITHOUT ANY WARRANTY; without even the implied warranty of
142+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
143+# GNU General Public License for more details.
144+#
145+# You should have received a copy of the GNU General Public License
146+# along with Patchwork; if not, write to the Free Software
147+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
148+
149+import sqlite3
150+
151+
152+class PatchworkDB(object):
153+ """A cache backed by a SQLite DB to store users.
154+
155+ This is used to provide a users cache in order to speed up the process
156+ of analyzing patches from a git repository.
157+
158+ Users are stored with their email address, and a boolean value indicating
159+ if the user is a valid Linaro user. A valid Linaro user is a user that
160+ exists in Linaro Login (Crowd).
161+
162+ There is no user logic here, it is just used as a cache backend.
163+ """
164+
165+ def __init__(self, db_file):
166+ self.db_file = db_file
167+ self._connection = None
168+ self._cursor = None
169+
170+ self._init_db()
171+
172+ def _init_db(self):
173+ if not self.table_exists('users'):
174+ self.cursor.execute('''CREATE TABLE users
175+ (id TEXT PRIMARY KEY, valid INTEGER, ts TIMESTAMP)''')
176+
177+ def close(self):
178+ """Closes the DB connection."""
179+ # Force a commit, even if we are in autocommit mode.
180+ self.save()
181+ self.connection.close()
182+
183+ def save(self):
184+ """Performs a commit on the DB."""
185+ self.connection.commit()
186+
187+ @property
188+ def connection(self):
189+ """The connection to the DB."""
190+ if self._connection is None:
191+ self._connection = sqlite3.connect(self.db_file)
192+ return self._connection
193+
194+ @property
195+ def cursor(self):
196+ """The connection cursor."""
197+ if self._cursor is None:
198+ self._cursor = self.connection.cursor()
199+ return self._cursor
200+
201+ def table_exists(self, table):
202+ """Checks if a table exists in the DB.
203+
204+ :param table: The name of the table to check.
205+ """
206+ self.cursor.execute('''SELECT name FROM sqlite_master WHERE
207+ type='table' AND name=?''', [table])
208+
209+ exists = False
210+ if self.cursor.fetchone():
211+ exists = True
212+ return exists
213+
214+ def insert_user(self, email, valid, timestamp):
215+ """Inserts a user in the DB.
216+
217+ :param email: The user email, this is the unique key in the DB.
218+ :param valid: If the user is a valid Linaro user.
219+ :param timestamp: When the user was added in the cache.
220+ """
221+ self.cursor.execute('''INSERT INTO users(id, valid, ts)
222+ VALUES (?, ?, ?)''', (email, valid, timestamp))
223+
224+ def update_user(self, email, valid, timestamp):
225+ """Updates a user in the DB."""
226+ self.cursor.execute('''UPDATE users SET valid=?, ts=? WHERE id=?''',
227+ (valid, timestamp, email))
228+
229+ def get_user(self, email):
230+ """Retrieves a user from the DB.
231+
232+ :param email: The user email.
233+ :return A tuple with the data found, None otherwise.
234+ """
235+ self.cursor.execute('''SELECT * FROM users where id=?''', [email])
236+ return self.cursor.fetchone()
237+
238+ def user_exists(self, email):
239+ """Verifies if a user exists in the DB.
240+
241+ :param email: The user email.
242+ :return True or False.
243+ """
244+ result = self.get_user(email)
245+
246+ exists = False
247+ if result:
248+ exists = True
249+ return exists
250
251=== modified file 'apps/patchwork/parser.py'
252--- apps/patchwork/parser.py 2012-10-18 09:28:17 +0000
253+++ apps/patchwork/parser.py 2013-12-05 10:08:30 +0000
254@@ -20,7 +20,9 @@
255 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
256
257
258+import StringIO
259 import re
260+import types
261
262 try:
263 import hashlib
264@@ -32,7 +34,14 @@
265 _hunk_re = re.compile('^\@\@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? \@\@')
266 _filename_re = re.compile('^(---|\+\+\+) (\S+)')
267
268+
269 def parse_patch(text):
270+ text_file = None
271+ if isinstance(text, types.StringTypes):
272+ text_file = StringIO.StringIO(text)
273+ else:
274+ text_file = text
275+
276 patchbuf = ''
277 commentbuf = ''
278 buf = ''
279@@ -63,20 +72,15 @@
280 lc = (0, 0)
281 hunk = 0
282
283-
284- for line in text.split('\n'):
285- line += '\n'
286-
287+ for line in text_file:
288 if state == 0:
289 if line.startswith('diff ') or line.startswith('===') \
290 or line.startswith('Index: '):
291 state = 1
292 buf += line
293-
294 elif line.startswith('--- '):
295 state = 2
296 buf += line
297-
298 else:
299 commentbuf += line
300
301@@ -89,11 +93,9 @@
302 if line.startswith('+++ '):
303 state = 3
304 buf += line
305-
306 elif hunk:
307 state = 1
308 buf += line
309-
310 else:
311 state = 0
312 commentbuf += buf + line
313@@ -102,7 +104,6 @@
314 elif state == 3:
315 match = _hunk_re.match(line)
316 if match:
317-
318 def fn(x):
319 if not x:
320 return 1
321@@ -113,16 +114,13 @@
322 state = 4
323 patchbuf += buf + line
324 buf = ''
325-
326 elif line.startswith('--- '):
327 patchbuf += buf + line
328 buf = ''
329 state = 2
330-
331 elif hunk:
332 state = 1
333 buf += line
334-
335 else:
336 state = 0
337 commentbuf += buf + line
338@@ -161,6 +159,7 @@
339
340 return (patchbuf, commentbuf)
341
342+
343 def hash_patch(str):
344 # normalise spaces
345 str = str.replace('\r', '')
346@@ -213,12 +212,12 @@
347 from optparse import OptionParser
348
349 parser = OptionParser()
350- parser.add_option('-p', '--patch', action = 'store_true',
351- dest = 'print_patch', help = 'print parsed patch')
352- parser.add_option('-c', '--comment', action = 'store_true',
353- dest = 'print_comment', help = 'print parsed comment')
354- parser.add_option('-#', '--hash', action = 'store_true',
355- dest = 'print_hash', help = 'print patch hash')
356+ parser.add_option('-p', '--patch', action='store_true',
357+ dest='print_patch', help='print parsed patch')
358+ parser.add_option('-c', '--comment', action='store_true',
359+ dest='print_comment', help='print parsed comment')
360+ parser.add_option('-#', '--hash', action='store_true',
361+ dest='print_hash', help='print patch hash')
362
363 (options, args) = parser.parse_args()
364
365
366=== modified file 'apps/patchwork/utils.py'
367--- apps/patchwork/utils.py 2013-10-16 12:29:19 +0000
368+++ apps/patchwork/utils.py 2013-12-05 10:08:30 +0000
369@@ -17,8 +17,10 @@
370 # along with Patchwork; if not, write to the Free Software
371 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
372
373+import StringIO
374 import atexit
375 import chardet
376+import datetime
377 import os
378 import re
379 import subprocess
380@@ -395,6 +397,123 @@
381 return patch
382
383
384+def get_diff_for_linaro_commit(root, commit_id, crowd, db):
385+ """Gets a diff from a commit id for a Linaro made change.
386+
387+ If the author or the committer of the commit is not a Linaro user, then
388+ the commit will not be considered.
389+
390+ This is a slightly different implementation than `get_diff_for_commit`,
391+ where here it checks if the user is a valid Linaro account before taking
392+ in account the patch.
393+
394+ :param root: The path on the file system where the repository is stored.
395+ :param commit_id: The commmit ID to parse.
396+ :param crowd: A Crowd object used to determine if the user is from Linaro.
397+ :param db: A PatchworkDB object used as a cache for users.
398+ """
399+ stdout = ""
400+ patch = None
401+ author = None
402+ committer = None
403+
404+ # We create a custom view on the commit id diff. We extract the author
405+ # and committer emails, and check if they are from Linaro people, otherwise
406+ # we do not even consider the commit and move one.
407+ # Most of the time patchwork/patchmetrics spends a lot of time calculating
408+ # diffs for patches that might not even be from Linaro people.
409+ # The first format prints on stdout the author and commiter emails.
410+ # The rest of the format string is exactly as what is shown with a simple
411+ # `git show COMMIT_ID`.
412+ format_string = "%ae%n%ce%n"
413+ format_string += "commit %H%n"
414+ format_string += "Author: %aN <%aE>%n"
415+ format_string += "Date: %ad%n"
416+
417+ cmd_args = ['git', 'show', '--format=format:{0}'.format(format_string),
418+ commit_id]
419+ proc = subprocess.Popen(cmd_args, cwd=root, bufsize=4096,
420+ stdout=subprocess.PIPE, stderr=DEVNULL)
421+
422+ while True:
423+ for line in iter(proc.stdout.readline, b''):
424+ stdout += line
425+ if proc.poll() is not None:
426+ break
427+
428+ if proc.returncode != 0:
429+ print ("Error retrieving diff for commit {0}".format(commit_id))
430+ else:
431+ # Since with big diffs a lot of time is spent decoding the output from
432+ # the git command, we first read two lines from the output (author and
433+ # commiter emails), perform validations on those, and in case one of
434+ # them is a valid Linaro user we continue with output decoding and
435+ # diff parsing.
436+ string_file = StringIO.StringIO(stdout)
437+
438+ author = string_file.readline().strip().decode('utf-8')
439+ committer = string_file.readline().strip().decode('utf-8')
440+
441+ valid_patch = False
442+ if author == committer:
443+ valid_patch |= is_linaro_user(author, db, crowd)
444+ else:
445+ valid_patch |= is_linaro_user(author, db, crowd)
446+ valid_patch |= is_linaro_user(committer, db, crowd)
447+
448+ if valid_patch:
449+ try:
450+ diff = stdout.decode('utf-8')
451+ except UnicodeDecodeError:
452+ try:
453+ # chardet.detect is rather slow so we only use it when we
454+ # fail # to decode from utf-8.
455+ encoding = chardet.detect(stdout)['encoding']
456+ diff = stdout.decode(encoding)
457+ except UnicodeDecodeError:
458+ print ("Skipping commit {0} as we can't find the "
459+ "appropriate charset to decode "
460+ "it".format(commit_id))
461+ return None
462+
463+ # This is a double action, but it is still faster then decoding
464+ # everytime the output even when the patch does not come from a
465+ # Linaro user.
466+ string_file = StringIO.StringIO(diff)
467+ # We need to pass the StringIO object pointing to the 3rd line
468+ # since the first two have already been read, and they contains
469+ # information not needed for the patch.
470+ for i in range(0, 2):
471+ string_file.readline()
472+ patch, _ = parse_patch(string_file)
473+
474+ return patch
475+
476+
477+def is_linaro_user(user, db, crowd):
478+ """Verifies if a user is a valid Linaro account.
479+
480+ :param user: The email address to verify.
481+ :param db: A PatchworkDB instance for the cache backend.
482+ :param crowd: A Crowd instance to retrieve user data if necessary.
483+ """
484+ cached_user = db.get_user(user)
485+
486+ is_valid = False
487+ if cached_user:
488+ # TODO: we need to set a timeout and check again even if the user
489+ # is cached, it might have been removed from Crowd.
490+ is_valid |= cached_user[1]
491+ else:
492+ # XXX: The first time this runs, it might hit the Crowd server hard,
493+ # since none of the users will be cached.
494+ valid_user = crowd.is_valid_user(user)
495+ is_valid |= valid_user
496+ db.insert_user(user, valid_user, datetime.datetime.now())
497+
498+ return is_valid
499+
500+
501 def get_patches_matching_commit(project, commit_diff, commit_msg,
502 commit_author):
503 """Return the Patch objects that are likely matches to the given commit.
504
505=== modified file 'apps/settings.py'
506--- apps/settings.py 2013-10-23 19:18:06 +0000
507+++ apps/settings.py 2013-12-05 10:08:30 +0000
508@@ -147,6 +147,9 @@
509 # File with a list of groups that should be created in the DB. If other groups
510 # are taken out from Crowd, and they are not here, they will be skipped.
511 CROWD_GROUPS_WHITELIST = None
512+# Full path to a DB file.
513+# This is used to have a cache of which users exists in the Crowd database.
514+CROWD_USERS_DB_FILE = None
515
516 # Where to store the git repos that we scan to identify which patches have
517 # been committed.

Subscribers

People subscribed via source and target branches