Merge lp:~allenap/maas/faster-import-checks into lp:~maas-committers/maas/trunk

Proposed by Gavin Panella
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 5705
Proposed branch: lp:~allenap/maas/faster-import-checks
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 83 lines (+48/-10)
1 file modified
utilities/check-imports (+48/-10)
To merge this branch: bzr merge lp:~allenap/maas/faster-import-checks
Reviewer Review Type Date Requested Status
Blake Rouse (community) Approve
Review via email: mp+316824@code.launchpad.net

Commit message

Use a multiprocessing pool to make import checks faster.

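As background, the change follows the standard multiprocessing-pool pattern: farm the per-file work out to worker processes and collect the results in the parent. Below is a minimal, self-contained sketch of that pattern; `check_file` and the filenames are hypothetical stand-ins, not names from this branch:

    import multiprocessing

    def check_file(filename):
        # Hypothetical per-file check; the real script parses each file
        # and tests its imports against a rule.
        return filename, filename.endswith(".py")

    if __name__ == "__main__":
        filenames = ["a.py", "b.txt", "c.py"]
        # Worker processes spread the per-file work across CPU cores,
        # trading extra total CPU time for less wall-clock time.
        with multiprocessing.Pool() as pool:
            for filename, ok in pool.map(check_file, filenames):
                print(filename, "ok" if ok else "FAIL")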
Revision history for this message
Gavin Panella (allenap) wrote :

For me this reduces wall clock time from ~6s to ~2.5s, at the expense of more CPU time.

Revision history for this message
Gavin Panella (allenap) wrote :

I realised an optimisation: it's now down to ~1.5s for me, with overall CPU time reduced from my earlier attempt (though still higher than with no multiprocessing at all).

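The optimisation, visible in the `_batch`/`_expand` helpers in the diff below, is to hand each pooled call a batch of filenames rather than a single file, amortising the pickling overhead of each inter-process call. `_batch` itself relies on the two-argument form of `iter()`, which calls a callable repeatedly until it returns the sentinel. A standalone sketch of that idiom (`batched` is a renamed illustration of the same logic):

    from itertools import islice

    def batched(objects, size):
        # Yield lists of up to `size` items from any iterable.
        objects = iter(objects)
        take = lambda: list(islice(objects, size))
        # iter(callable, sentinel) calls `take` until it returns the
        # sentinel, here the empty list, i.e. until `objects` is exhausted.
        return iter(take, [])

    print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]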
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Nice improvement. Looks good.

review: Approve

Preview Diff

=== modified file 'utilities/check-imports'
--- utilities/check-imports	2017-02-09 11:34:44 +0000
+++ utilities/check-imports	2017-02-09 14:17:28 +0000
@@ -4,7 +4,11 @@
 import ast
 from collections import Iterable
 from glob import iglob
-from itertools import chain
+from itertools import (
+    chain,
+    islice,
+)
+import multiprocessing
 from pathlib import Path
 import re
 import sys
@@ -432,6 +436,7 @@
 
 
 def extract(node):
+    """Extract all imports from the given AST node."""
     for node in ast.walk(node):
         if isinstance(node, ast.Import):
             for alias in node.names:
@@ -446,16 +451,49 @@
         pass  # Not an import.
 
 
+def _batch(objects, size):
+    """Generate batches of `size` elements from `objects`.
+
+    Each batch is a list of `size` elements exactly, except for the last which
+    may contain fewer than `size` elements.
+    """
+    objects = iter(objects)
+    batch = lambda: list(islice(objects, size))
+    return iter(batch, [])
+
+
+def _expand(checks):
+    """Generate `(rule, batch-of-filenames)` for the given checks.
+
+    It batches filenames to reduce the serialise/unserialise overhead when
+    calling out to a pooled process.
+    """
+    for filenames, rule in checks:
+        rule.compile()  # Compile or it's slow.
+        for filenames in _batch(filenames, 100):
+            yield rule, filenames
+
+
+def _scan1(rule, filename):
+    """Scan one file and check against the given rule."""
+    with tokenize.open(filename) as fd:
+        module = ast.parse(fd.read())
+    imports = set(extract(module))
+    allowed = set(filter(rule.check, imports))
+    denied = imports.difference(allowed)
+    return filename, allowed, denied
+
+
+def _scan(rule, filenames):
+    """Scan the files and check against the given rule."""
+    return [_scan1(rule, filename) for filename in filenames]
+
+
 def scan(checks):
-    for files, rule in checks:
-        rule.compile()
-        for filename in sorted(files):
-            with tokenize.open(filename) as fd:
-                module = ast.parse(fd.read())
-            imports = set(extract(module))
-            allowed = set(filter(rule.check, imports))
-            denied = imports.difference(allowed)
-            yield filename, allowed, denied
+    """Scan many files and check against the given rules."""
+    with multiprocessing.Pool() as pool:
+        for results in pool.starmap(_scan, _expand(checks)):
+            yield from results
 
 
 if sys.stdout.isatty():
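For illustration, `scan` can be driven as below. `SimpleRule` is a hypothetical stand-in exposing the `compile()`/`check()` interface the script expects; the real rule objects are defined elsewhere in check-imports:

    import re

    class SimpleRule:
        # Hypothetical rule: allow imports whose names match a regex.
        def __init__(self, pattern):
            self.pattern = pattern
            self._compiled = None

        def compile(self):
            self._compiled = re.compile(self.pattern)

        def check(self, name):
            return self._compiled.match(name) is not None

    checks = [(["utilities/check-imports"], SimpleRule(r"(ast|re|sys)\b"))]
    for filename, allowed, denied in scan(checks):
        if denied:
            print(filename, "imports outside the allowed set:", sorted(denied))

Note that the rule objects are pickled when sent to the pooled workers, which is why `_expand` compiles each rule once in the parent rather than per file in the children.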