Merge lp:~milo/linaro-patchmetrics/update-patches-fix into lp:linaro-patchmetrics
- update-patches-fix
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 400 |
Proposed branch: | lp:~milo/linaro-patchmetrics/update-patches-fix |
Merge into: | lp:linaro-patchmetrics |
Prerequisite: | lp:~milo/linaro-patchmetrics/crowd-teams |
Diff against target: |
517 lines (+309/-28) 7 files modified
apps/patchmetrics/bin/get-linaro-membership.py (+0/-1) apps/patchmetrics/crowd.py (+22/-2) apps/patchwork/bin/update-committed-patches.py (+28/-7) apps/patchwork/db.py (+120/-0) apps/patchwork/parser.py (+17/-18) apps/patchwork/utils.py (+119/-0) apps/settings.py (+3/-0) |
To merge this branch: | bzr merge lp:~milo/linaro-patchmetrics/update-patches-fix |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Paul Sokolovsky | Approve | ||
Review via email:
|
Commit message
Description of the change
This MP aims at fixing small issues experienced when running the update-
For general information about the changes, please refer to the commit message.
What was not captured in that message is:
- Some PEP8 fixes
- New settings introduced used to define a file path where to store the SQLite DB
- 383. By Milo Casagrande
-
Fixed output collection for rev-list git command.
- 384. By Milo Casagrande
-
Fixed missing exception.
- 385. By Milo Casagrande
-
Fixed formatting.
- 386. By Milo Casagrande
-
Merged from trunk.
- 387. By Milo Casagrande
-
Removed unused imports, added comment.
- 388. By Milo Casagrande
-
Merged from trunk.
- 389. By Milo Casagrande
-
Updated to use new Crowd code.
- 390. By Milo Casagrande
-
Merged from trunk.
- 391. By Milo Casagrande
-
Merged from trunk.

Paul Sokolovsky (pfalcon) wrote : | # |

Paul Sokolovsky (pfalcon) wrote : | # |
84 + if settings.
85 + print >> sys.stderr, ("CROWD_
86 + "settings.py.\n")
apparently cut&paste error in msg.

Paul Sokolovsky (pfalcon) wrote : | # |
Otherwise, logic looks good, but there really should be expiration of old cached users.

Milo Casagrande (milo) wrote : | # |
On Wed, Dec 4, 2013 at 6:18 PM, Paul Sokolovsky
<email address hidden> wrote:
> Review: Approve
>
> Otherwise, logic looks good, but there really should be expiration of old cached users.
Thanks Paul,
I'll work on that and propose another merge.
--
Milo Casagrande | Automation Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCsThanks Paul,
- 392. By Milo Casagrande
-
Fixed print statement.
Preview Diff
1 | === modified file 'apps/patchmetrics/bin/get-linaro-membership.py' |
2 | --- apps/patchmetrics/bin/get-linaro-membership.py 2013-10-17 15:53:51 +0000 |
3 | +++ apps/patchmetrics/bin/get-linaro-membership.py 2013-12-05 10:08:30 +0000 |
4 | @@ -48,7 +48,6 @@ |
5 | linaro = lp.people['linaro'] |
6 | |
7 | if settings.AUTH_CROWD_APPLICATION_USER: |
8 | - print "Using Crowd groups" |
9 | whitelisted_groups = [] |
10 | if settings.CROWD_GROUPS_WHITELIST is None: |
11 | print "No CROWD_GROUPS_WHITELIST defined, all groups will be used." |
12 | |
13 | === modified file 'apps/patchmetrics/crowd.py' |
14 | --- apps/patchmetrics/crowd.py 2013-09-27 16:32:30 +0000 |
15 | +++ apps/patchmetrics/crowd.py 2013-12-05 10:08:30 +0000 |
16 | @@ -20,7 +20,6 @@ |
17 | import base64 |
18 | import httplib |
19 | import urllib |
20 | -import ConfigParser |
21 | import json |
22 | import types |
23 | |
24 | @@ -182,6 +181,26 @@ |
25 | |
26 | return user |
27 | |
28 | + def is_valid_user(self, email): |
29 | + """Handy function to check if a user exists or not. |
30 | + |
31 | + :param email: The user email. |
32 | + :return True or False. |
33 | + """ |
34 | + params = {"username": email} |
35 | + |
36 | + resource = "/user?{0}".format(urllib.urlencode(params)) |
37 | + api_url = "/crowd/rest/usermanagement/1{0}".format(resource) |
38 | + |
39 | + valid = True |
40 | + try: |
41 | + self._get(api_url) |
42 | + except CrowdNotFoundException: |
43 | + # In case of other exceptions, raise them. |
44 | + valid = False |
45 | + |
46 | + return valid |
47 | + |
48 | def _get_rest_usermanagement(self, resource): |
49 | api_url = "/{0}{1}".format(self._uri, resource) |
50 | return self._get(api_url) |
51 | @@ -211,4 +230,5 @@ |
52 | raise CrowdForbiddenException( |
53 | "Access forbidden to fulfill the request") |
54 | else: |
55 | - raise CrowdException("Unknown Crowd status {0}".format(resp.status)) |
56 | + raise CrowdException( |
57 | + "Unknown Crowd status {0}".format(resp.status)) |
58 | |
59 | === modified file 'apps/patchwork/bin/update-committed-patches.py' |
60 | --- apps/patchwork/bin/update-committed-patches.py 2013-10-17 12:22:58 +0000 |
61 | +++ apps/patchwork/bin/update-committed-patches.py 2013-12-05 10:08:30 +0000 |
62 | @@ -10,26 +10,47 @@ |
63 | # accepted once they reach the master branch. |
64 | |
65 | import _pythonpath |
66 | +import atexit |
67 | import sys |
68 | + |
69 | from datetime import datetime |
70 | -from operator import attrgetter |
71 | - |
72 | -from lockfile import FileLock, LockFailed, LockTimeout |
73 | - |
74 | +from django.conf import settings |
75 | from django.db.models import Q |
76 | from dulwich.repo import Repo |
77 | +from lockfile import FileLock, LockFailed, LockTimeout |
78 | +from operator import attrgetter |
79 | |
80 | +from patchmetrics.crowd import Crowd |
81 | +from patchwork.db import PatchworkDB |
82 | from patchwork.models import Project, State |
83 | from patchwork.utils import ( |
84 | ensure_source_checkout_for_project, |
85 | get_commits_to_parse, |
86 | - get_diff_for_commit, |
87 | + get_diff_for_linaro_commit, |
88 | get_patches_matching_commit, |
89 | GitException |
90 | - ) |
91 | +) |
92 | |
93 | |
94 | def main(): |
95 | + |
96 | + if settings.AUTH_CROWD_APPLICATION_USER is None: |
97 | + print >> sys.stderr, ("AUTH_CROWD_APPLICATION_USER not defined in " |
98 | + "settings.py.\n") |
99 | + sys.exit(1) |
100 | + else: |
101 | + crwd = Crowd(settings.AUTH_CROWD_APPLICATION_USER, |
102 | + settings.AUTH_CROWD_APPLICATION_PASSWORD, |
103 | + settings.AUTH_CROWD_SERVER_REST_URI) |
104 | + |
105 | + if settings.CROWD_USERS_DB_FILE is None: |
106 | + print >> sys.stderr, ("CROWD_USERS_DB_FILE not defined in " |
107 | + "settings.py.\n") |
108 | + sys.exit(1) |
109 | + |
110 | + cache_db = PatchworkDB(settings.CROWD_USERS_DB_FILE) |
111 | + atexit.register(cache_db.close) |
112 | + |
113 | projects = Project.objects.exclude( |
114 | Q(source_tree__isnull=True) | Q(source_tree='')) |
115 | |
116 | @@ -66,7 +87,7 @@ |
117 | for commit_id in commit_ids: |
118 | project.last_seen_commit_ref = commit_id |
119 | project.save() |
120 | - diff = get_diff_for_commit(root, commit_id) |
121 | + diff = get_diff_for_linaro_commit(root, commit_id, crwd, cache_db) |
122 | if diff is None: |
123 | # Some commits result in empty diffs (e.g. only changing file |
124 | # permissions); in these cases we just skip them. |
125 | |
126 | === added file 'apps/patchwork/db.py' |
127 | --- apps/patchwork/db.py 1970-01-01 00:00:00 +0000 |
128 | +++ apps/patchwork/db.py 2013-12-05 10:08:30 +0000 |
129 | @@ -0,0 +1,120 @@ |
130 | +# Copyright (C) 2013 Linaro |
131 | +# |
132 | +# Author: Milo Casagrande <milo.casagrande@linaro.org> |
133 | +# This file is part of the Patchmetrics package. |
134 | +# |
135 | +# Patchmetrics is free software; you can redistribute it and/or modify |
136 | +# it under the terms of the GNU General Public License as published by |
137 | +# the Free Software Foundation; either version 2 of the License, or |
138 | +# (at your option) any later version. |
139 | +# |
140 | +# Patchmetrics is distributed in the hope that it will be useful, |
141 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
142 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
143 | +# GNU General Public License for more details. |
144 | +# |
145 | +# You should have received a copy of the GNU General Public License |
146 | +# along with Patchwork; if not, write to the Free Software |
147 | +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
148 | + |
149 | +import sqlite3 |
150 | + |
151 | + |
152 | +class PatchworkDB(object): |
153 | + """A cache backed by a SQLite DB to store users. |
154 | + |
155 | + This is used to provide a users cache in order to speed up the process |
156 | + of analyzing patches from a git repository. |
157 | + |
158 | + Users are stored with their email address, and a boolean value indicating |
159 | + if the user is a valid Linaro user. A valid Linaro user is a user that |
160 | + exists in Linaro Login (Crowd). |
161 | + |
162 | + There is no user logic here, it is just used as a cache backend. |
163 | + """ |
164 | + |
165 | + def __init__(self, db_file): |
166 | + self.db_file = db_file |
167 | + self._connection = None |
168 | + self._cursor = None |
169 | + |
170 | + self._init_db() |
171 | + |
172 | + def _init_db(self): |
173 | + if not self.table_exists('users'): |
174 | + self.cursor.execute('''CREATE TABLE users |
175 | + (id TEXT PRIMARY KEY, valid INTEGER, ts TIMESTAMP)''') |
176 | + |
177 | + def close(self): |
178 | + """Closes the DB connection.""" |
179 | + # Force a commit, even if we are in autocommit mode. |
180 | + self.save() |
181 | + self.connection.close() |
182 | + |
183 | + def save(self): |
184 | + """Performs a commit on the DB.""" |
185 | + self.connection.commit() |
186 | + |
187 | + @property |
188 | + def connection(self): |
189 | + """The connection to the DB.""" |
190 | + if self._connection is None: |
191 | + self._connection = sqlite3.connect(self.db_file) |
192 | + return self._connection |
193 | + |
194 | + @property |
195 | + def cursor(self): |
196 | + """The connection cursor.""" |
197 | + if self._cursor is None: |
198 | + self._cursor = self.connection.cursor() |
199 | + return self._cursor |
200 | + |
201 | + def table_exists(self, table): |
202 | + """Checks if a table exists in the DB. |
203 | + |
204 | + :param table: The name of the table to check. |
205 | + """ |
206 | + self.cursor.execute('''SELECT name FROM sqlite_master WHERE |
207 | + type='table' AND name=?''', [table]) |
208 | + |
209 | + exists = False |
210 | + if self.cursor.fetchone(): |
211 | + exists = True |
212 | + return exists |
213 | + |
214 | + def insert_user(self, email, valid, timestamp): |
215 | + """Inserts a user in the DB. |
216 | + |
217 | + :param email: The user email, this is the unique key in the DB. |
218 | + :param valid: If the user is a valid Linaro user. |
219 | + :param timestamp: When the user was added in the cache. |
220 | + """ |
221 | + self.cursor.execute('''INSERT INTO users(id, valid, ts) |
222 | + VALUES (?, ?, ?)''', (email, valid, timestamp)) |
223 | + |
224 | + def update_user(self, email, valid, timestamp): |
225 | + """Updates a user in the DB.""" |
226 | + self.cursor.execute('''UPDATE users SET valid=?, ts=? WHERE id=?''', |
227 | + (valid, timestamp, email)) |
228 | + |
229 | + def get_user(self, email): |
230 | + """Retrieves a user from the DB. |
231 | + |
232 | + :param email: The user email. |
233 | + :return A tuple with the data found, None otherwise. |
234 | + """ |
235 | + self.cursor.execute('''SELECT * FROM users where id=?''', [email]) |
236 | + return self.cursor.fetchone() |
237 | + |
238 | + def user_exists(self, email): |
239 | + """Verifies if a user exists in the DB. |
240 | + |
241 | + :param email: The user email. |
242 | + :return True or False. |
243 | + """ |
244 | + result = self.get_user(email) |
245 | + |
246 | + exists = False |
247 | + if result: |
248 | + exists = True |
249 | + return exists |
250 | |
251 | === modified file 'apps/patchwork/parser.py' |
252 | --- apps/patchwork/parser.py 2012-10-18 09:28:17 +0000 |
253 | +++ apps/patchwork/parser.py 2013-12-05 10:08:30 +0000 |
254 | @@ -20,7 +20,9 @@ |
255 | # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
256 | |
257 | |
258 | +import StringIO |
259 | import re |
260 | +import types |
261 | |
262 | try: |
263 | import hashlib |
264 | @@ -32,7 +34,14 @@ |
265 | _hunk_re = re.compile('^\@\@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? \@\@') |
266 | _filename_re = re.compile('^(---|\+\+\+) (\S+)') |
267 | |
268 | + |
269 | def parse_patch(text): |
270 | + text_file = None |
271 | + if isinstance(text, types.StringTypes): |
272 | + text_file = StringIO.StringIO(text) |
273 | + else: |
274 | + text_file = text |
275 | + |
276 | patchbuf = '' |
277 | commentbuf = '' |
278 | buf = '' |
279 | @@ -63,20 +72,15 @@ |
280 | lc = (0, 0) |
281 | hunk = 0 |
282 | |
283 | - |
284 | - for line in text.split('\n'): |
285 | - line += '\n' |
286 | - |
287 | + for line in text_file: |
288 | if state == 0: |
289 | if line.startswith('diff ') or line.startswith('===') \ |
290 | or line.startswith('Index: '): |
291 | state = 1 |
292 | buf += line |
293 | - |
294 | elif line.startswith('--- '): |
295 | state = 2 |
296 | buf += line |
297 | - |
298 | else: |
299 | commentbuf += line |
300 | |
301 | @@ -89,11 +93,9 @@ |
302 | if line.startswith('+++ '): |
303 | state = 3 |
304 | buf += line |
305 | - |
306 | elif hunk: |
307 | state = 1 |
308 | buf += line |
309 | - |
310 | else: |
311 | state = 0 |
312 | commentbuf += buf + line |
313 | @@ -102,7 +104,6 @@ |
314 | elif state == 3: |
315 | match = _hunk_re.match(line) |
316 | if match: |
317 | - |
318 | def fn(x): |
319 | if not x: |
320 | return 1 |
321 | @@ -113,16 +114,13 @@ |
322 | state = 4 |
323 | patchbuf += buf + line |
324 | buf = '' |
325 | - |
326 | elif line.startswith('--- '): |
327 | patchbuf += buf + line |
328 | buf = '' |
329 | state = 2 |
330 | - |
331 | elif hunk: |
332 | state = 1 |
333 | buf += line |
334 | - |
335 | else: |
336 | state = 0 |
337 | commentbuf += buf + line |
338 | @@ -161,6 +159,7 @@ |
339 | |
340 | return (patchbuf, commentbuf) |
341 | |
342 | + |
343 | def hash_patch(str): |
344 | # normalise spaces |
345 | str = str.replace('\r', '') |
346 | @@ -213,12 +212,12 @@ |
347 | from optparse import OptionParser |
348 | |
349 | parser = OptionParser() |
350 | - parser.add_option('-p', '--patch', action = 'store_true', |
351 | - dest = 'print_patch', help = 'print parsed patch') |
352 | - parser.add_option('-c', '--comment', action = 'store_true', |
353 | - dest = 'print_comment', help = 'print parsed comment') |
354 | - parser.add_option('-#', '--hash', action = 'store_true', |
355 | - dest = 'print_hash', help = 'print patch hash') |
356 | + parser.add_option('-p', '--patch', action='store_true', |
357 | + dest='print_patch', help='print parsed patch') |
358 | + parser.add_option('-c', '--comment', action='store_true', |
359 | + dest='print_comment', help='print parsed comment') |
360 | + parser.add_option('-#', '--hash', action='store_true', |
361 | + dest='print_hash', help='print patch hash') |
362 | |
363 | (options, args) = parser.parse_args() |
364 | |
365 | |
366 | === modified file 'apps/patchwork/utils.py' |
367 | --- apps/patchwork/utils.py 2013-10-16 12:29:19 +0000 |
368 | +++ apps/patchwork/utils.py 2013-12-05 10:08:30 +0000 |
369 | @@ -17,8 +17,10 @@ |
370 | # along with Patchwork; if not, write to the Free Software |
371 | # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
372 | |
373 | +import StringIO |
374 | import atexit |
375 | import chardet |
376 | +import datetime |
377 | import os |
378 | import re |
379 | import subprocess |
380 | @@ -395,6 +397,123 @@ |
381 | return patch |
382 | |
383 | |
384 | +def get_diff_for_linaro_commit(root, commit_id, crowd, db): |
385 | + """Gets a diff from a commit id for a Linaro made change. |
386 | + |
387 | + If the author or the committer of the commit is not a Linaro user, then |
388 | + the commit will not be considered. |
389 | + |
390 | + This is a slightly different implementation than `get_diff_for_commit`, |
391 | + where here it checks if the user is a valid Linaro account before taking |
392 | + in account the patch. |
393 | + |
394 | + :param root: The path on the file system where the repository is stored. |
395 | + :param commit_id: The commmit ID to parse. |
396 | + :param crowd: A Crowd object used to determine if the user is from Linaro. |
397 | + :param db: A PatchworkDB object used as a cache for users. |
398 | + """ |
399 | + stdout = "" |
400 | + patch = None |
401 | + author = None |
402 | + committer = None |
403 | + |
404 | + # We create a custom view on the commit id diff. We extract the author |
405 | + # and committer emails, and check if they are from Linaro people, otherwise |
406 | + # we do not even consider the commit and move one. |
407 | + # Most of the time patchwork/patchmetrics spends a lot of time calculating |
408 | + # diffs for patches that might not even be from Linaro people. |
409 | + # The first format prints on stdout the author and commiter emails. |
410 | + # The rest of the format string is exactly as what is shown with a simple |
411 | + # `git show COMMIT_ID`. |
412 | + format_string = "%ae%n%ce%n" |
413 | + format_string += "commit %H%n" |
414 | + format_string += "Author: %aN <%aE>%n" |
415 | + format_string += "Date: %ad%n" |
416 | + |
417 | + cmd_args = ['git', 'show', '--format=format:{0}'.format(format_string), |
418 | + commit_id] |
419 | + proc = subprocess.Popen(cmd_args, cwd=root, bufsize=4096, |
420 | + stdout=subprocess.PIPE, stderr=DEVNULL) |
421 | + |
422 | + while True: |
423 | + for line in iter(proc.stdout.readline, b''): |
424 | + stdout += line |
425 | + if proc.poll() is not None: |
426 | + break |
427 | + |
428 | + if proc.returncode != 0: |
429 | + print ("Error retrieving diff for commit {0}".format(commit_id)) |
430 | + else: |
431 | + # Since with big diffs a lot of time is spent decoding the output from |
432 | + # the git command, we first read two lines from the output (author and |
433 | + # commiter emails), perform validations on those, and in case one of |
434 | + # them is a valid Linaro user we continue with output decoding and |
435 | + # diff parsing. |
436 | + string_file = StringIO.StringIO(stdout) |
437 | + |
438 | + author = string_file.readline().strip().decode('utf-8') |
439 | + committer = string_file.readline().strip().decode('utf-8') |
440 | + |
441 | + valid_patch = False |
442 | + if author == committer: |
443 | + valid_patch |= is_linaro_user(author, db, crowd) |
444 | + else: |
445 | + valid_patch |= is_linaro_user(author, db, crowd) |
446 | + valid_patch |= is_linaro_user(committer, db, crowd) |
447 | + |
448 | + if valid_patch: |
449 | + try: |
450 | + diff = stdout.decode('utf-8') |
451 | + except UnicodeDecodeError: |
452 | + try: |
453 | + # chardet.detect is rather slow so we only use it when we |
454 | + # fail # to decode from utf-8. |
455 | + encoding = chardet.detect(stdout)['encoding'] |
456 | + diff = stdout.decode(encoding) |
457 | + except UnicodeDecodeError: |
458 | + print ("Skipping commit {0} as we can't find the " |
459 | + "appropriate charset to decode " |
460 | + "it".format(commit_id)) |
461 | + return None |
462 | + |
463 | + # This is a double action, but it is still faster then decoding |
464 | + # everytime the output even when the patch does not come from a |
465 | + # Linaro user. |
466 | + string_file = StringIO.StringIO(diff) |
467 | + # We need to pass the StringIO object pointing to the 3rd line |
468 | + # since the first two have already been read, and they contains |
469 | + # information not needed for the patch. |
470 | + for i in range(0, 2): |
471 | + string_file.readline() |
472 | + patch, _ = parse_patch(string_file) |
473 | + |
474 | + return patch |
475 | + |
476 | + |
477 | +def is_linaro_user(user, db, crowd): |
478 | + """Verifies if a user is a valid Linaro account. |
479 | + |
480 | + :param user: The email address to verify. |
481 | + :param db: A PatchworkDB instance for the cache backend. |
482 | + :param crowd: A Crowd instance to retrieve user data if necessary. |
483 | + """ |
484 | + cached_user = db.get_user(user) |
485 | + |
486 | + is_valid = False |
487 | + if cached_user: |
488 | + # TODO: we need to set a timeout and check again even if the user |
489 | + # is cached, it might have been removed from Crowd. |
490 | + is_valid |= cached_user[1] |
491 | + else: |
492 | + # XXX: The first time this runs, it might hit the Crowd server hard, |
493 | + # since none of the users will be cached. |
494 | + valid_user = crowd.is_valid_user(user) |
495 | + is_valid |= valid_user |
496 | + db.insert_user(user, valid_user, datetime.datetime.now()) |
497 | + |
498 | + return is_valid |
499 | + |
500 | + |
501 | def get_patches_matching_commit(project, commit_diff, commit_msg, |
502 | commit_author): |
503 | """Return the Patch objects that are likely matches to the given commit. |
504 | |
505 | === modified file 'apps/settings.py' |
506 | --- apps/settings.py 2013-10-23 19:18:06 +0000 |
507 | +++ apps/settings.py 2013-12-05 10:08:30 +0000 |
508 | @@ -147,6 +147,9 @@ |
509 | # File with a list of groups that should be created in the DB. If other groups |
510 | # are taken out from Crowd, and they are not here, they will be skipped. |
511 | CROWD_GROUPS_WHITELIST = None |
512 | +# Full path to a DB file. |
513 | +# This is used to have a cache of which users exists in the Crowd database. |
514 | +CROWD_USERS_DB_FILE = None |
515 | |
516 | # Where to store the git repos that we scan to identify which patches have |
517 | # been committed. |
r383: It says it "Fixed output collection for rev-list git command.", but it's unclear what exactly is fixed. Replacement code doesn't look to be better then original, it actually gotta have stdout/stderr read deadlock bug. But it's definitely more verbose and magic-ridden (like use of 2-arg iter, "b" string prefix etc.) Please consider these rhetoric remarks - if such changes are done, it's nice to give more details, though this specific case is minor.