Merge lp:~justin-fathomdb/nova/termie-bot into lp:~hudson-openstack/nova/trunk

Proposed by justinsb
Status: Work in progress
Proposed branch: lp:~justin-fathomdb/nova/termie-bot
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 334 lines (+329/-0)
1 file modified
contrib/rules/do_checks.py (+329/-0)
To merge this branch: bzr merge lp:~justin-fathomdb/nova/termie-bot
Reviewer Review Type Date Requested Status
Vish Ishaya (community) Approve
Brian Lamar (community) Abstain
termie (community) Needs Fixing
Nova Core security contacts Pending
Review via email: mp+55031@code.launchpad.net

Description of the change

Simple style-checker that can check some of our style rules, and is easy to extend

To post a comment you must log in.
Revision history for this message
Thierry Carrez (ttx) wrote :

I like the idea, but does this need to be shipped in the core code ? It looks a bit developer-only to me.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Sorry Thierry, I don't really understand. Are you saying that contrib is
shipped, and so it's in the wrong place? Where should it be if so?

Otherwise, we have lots of code for developers-only - this is like pylintrc
or the unit tests. Eventually, we might be able to make this part of the
pre-merge checks.

Revision history for this message
Thierry Carrez (ttx) wrote :

Maybe I'm overzealous in trying to keep the nova core code focused :)

My point is that the termie-bot is generally useful for openstack projects and is not nova-specific, so maybe it could live outside the nova source tree. Like pep8.

That said, I don't really care, so if the nova-core consensus is that it should live in core code, i'm fine with it.

Revision history for this message
justinsb (justin-fathomdb) wrote :

I think we have different style rules in nova than in some of the other
projects; let's start this here and see if other projects want to grab and
adapt it to their own styles. We can pull it out in future!

Revision history for this message
Jay Pipes (jaypipes) wrote :

89 +class TwoLinesAfterLog(SimpleRegexChecker):
90 + def _build_regex(self):
91 + self.regex = '\nLOG(\s)*=[^\n]*\n[^\n][^\n]'
92 + # We matched a leading \n
93 + self.skip_chars = 1

I know Andy has stated this on your reviews, but this isn't actually part of pep8 nor the HACKING style guide. This:

LOG = logging.getLogger(...)
FLAGS = flags.FLAGS

class Blah():

is perfectly acceptable according to both pep8 and HACKING, as far as I read it. In fact, a brief search through many Nova files shows tons of examples like the above.

-jay

Revision history for this message
justinsb (justin-fathomdb) wrote :

Jay - I'm guessing termie would say it comes under this:

"- thou shalt put two newlines twixt toplevel code (funcs, classes, etc)"

But half the point of this is to get consistency by codifying these rules, as well as to let code reviewers focus on higher level concerns (like the pep8 checker does).

And, yes, this checker does indeed flag a large number of issues in the current code.

Revision history for this message
Jay Pipes (jaypipes) wrote :

On Mon, Mar 28, 2011 at 12:50 PM, justinsb <email address hidden> wrote:
> Jay - I'm guessing termie would say it comes under this:
>
> "- thou shalt put two newlines twixt toplevel code (funcs, classes, etc)"
>
> But half the point of this is to get consistency by codifying these rules, as well as to let code reviewers focus on higher level concerns (like the pep8 checker does).

If that were what termie meant, I don't think he would have written
half the code in Nova to group top-level class constants together...

Do you REALLY want to see every module-level constant separated by 2 newlines?

-jay

Revision history for this message
termie (termie) wrote :

> On Mon, Mar 28, 2011 at 12:50 PM, justinsb <email address hidden> wrote:
> > Jay - I'm guessing termie would say it comes under this:
> >
> > "- thou shalt put two newlines twixt toplevel code (funcs, classes, etc)"
> >
> > But half the point of this is to get consistency by codifying these rules,
> as well as to let code reviewers focus on higher level concerns (like the pep8
> checker does).
>
> If that were what termie meant, I don't think he would have written
> half the code in Nova to group top-level class constants together...
>
> Do you REALLY want to see every module-level constant separated by 2 newlines?
>

Yes, though it is more specifically different types of objects, LOG and FLAGS are pretty different where as FLAGS and flags.DEFINE_boolean are not. We shouldn't actually have many constants, however, as most actual constants are defined as FLAGS which allows them to be described.

I can go either way on LOG specifically, I don't actually like how it is used to begin with, but my interpretation would put two lines between it and FLAGS.

> -jay

Revision history for this message
termie (termie) wrote :

> Maybe I'm overzealous in trying to keep the nova core code focused :)
>
> My point is that the termie-bot is generally useful for openstack projects and
> is not nova-specific, so maybe it could live outside the nova source tree.
> Like pep8.
>
> That said, I don't really care, so if the nova-core consensus is that it
> should live in core code, i'm fine with it.

I think the definition of what is "core" will shift greatly after the next summit, so I think contrib or tools is a fine place for this at the moment.

Revision history for this message
Jay Pipes (jaypipes) wrote :

On Mon, Mar 28, 2011 at 1:16 PM, termie <email address hidden> wrote:
>> Maybe I'm overzealous in trying to keep the nova core code focused :)
>>
>> My point is that the termie-bot is generally useful for openstack projects and
>> is not nova-specific, so maybe it could live outside the nova source tree.
>> Like pep8.
>>
>> That said, I don't really care, so if the nova-core consensus is that it
>> should live in core code, i'm fine with it.
>
> I think the definition of what is "core" will shift greatly after the next summit, so I think contrib or tools is a fine place for this at the moment.

Sorry, could you explain this? Statements like the above make it seem
like things have been decided by people, and I'm not familiar with
these decisions.

-jay

Revision history for this message
Jay Pipes (jaypipes) wrote :

On Mon, Mar 28, 2011 at 1:16 PM, termie <email address hidden> wrote:
>> On Mon, Mar 28, 2011 at 12:50 PM, justinsb <email address hidden> wrote:
>> > Jay - I'm guessing termie would say it comes under this:
>> >
>> > "- thou shalt put two newlines twixt toplevel code (funcs, classes, etc)"
>> >
>> > But half the point of this is to get consistency by codifying these rules,
>> as well as to let code reviewers focus on higher level concerns (like the pep8
>> checker does).
>>
>> If that were what termie meant, I don't think he would have written
>> half the code in Nova to group top-level class constants together...
>>
>> Do you REALLY want to see every module-level constant separated by 2 newlines?
>>
>
> Yes, though it is more specifically different types of objects, LOG and FLAGS are pretty different where as FLAGS and flags.DEFINE_boolean are not. We shouldn't actually have many constants, however, as most actual constants are defined as FLAGS which allows them to be described.
>
> I can go either way on LOG specifically, I don't actually like how it is used to begin with, but my interpretation would put two lines between it and FLAGS.

I agree that separating different "types of objects" is beneficial.
But LOG and FLAGS are both merely aliases for objects somewhere else.
So their pretty much the same thing in my book.

Something to discuss at the summit, I guess, since it's clear from
looking at the Nova code that the HACKING text is interpreted
(ignored?) differently by different devs.

-jay

Revision history for this message
termie (termie) wrote :

- copyright and license declarations
- i'd put the CHECKER stuff at the bottom of the file if it needs to be built after stuff. Alternatively you can make a small decorator for the class that automatically adds the class and define the CHECKER global at the top
- you need an if __name__ == '__main__': at the bottom so that people can import this without executing it
- should we make the output look more like pylint?

nova/api/openstack/_id_translator.py:24:19: E261 at least two spaces before inline comment

and it also has the description underneath, though that is an easy extension to make after the fact (just add a method that prints the appropriate rule)

- i can imagine some issues with the 'raise' vs 'raise e' check, while it is usually the case we have some explicit cases right now where we need to 'raise e'
- the docstring for RethrowExceptionsCarefully needs some adjustment :p

review: Needs Fixing
Revision history for this message
justinsb (justin-fathomdb) wrote :

>
> Something to discuss at the summit, I guess, since it's clear from
> looking at the Nova code that the HACKING text is interpreted
> (ignored?) differently by different devs.

Absolutely... If pep8 finds a violation, it takes me < 1 minute to fix it.
 With our rules that pep8 doesn't fix, if I have to go through the push /
propose / "needs fixing' review / move to WIP / fix / push / comment / move
to 'Needs Review' process, it probably takes 10 minutes, and that's ignoring
the reviewers time (unnecessary email to nova-core), and the fact that some
of these violations are likely overlooked. The more we can get into the
automated category, the better in my book - that's what I'm trying to
achieve here.

There's a secondary issue which is that we don't yet agree on all the rules.
 When do we get our PTLs? - hopefully they can just make these decisions by
dictat.

Revision history for this message
termie (termie) wrote :

> On Mon, Mar 28, 2011 at 1:16 PM, termie <email address hidden> wrote:
> >> Maybe I'm overzealous in trying to keep the nova core code focused :)
> >>
> >> My point is that the termie-bot is generally useful for openstack projects
> and
> >> is not nova-specific, so maybe it could live outside the nova source tree.
> >> Like pep8.
> >>
> >> That said, I don't really care, so if the nova-core consensus is that it
> >> should live in core code, i'm fine with it.
> >
> > I think the definition of what is "core" will shift greatly after the next
> summit, so I think contrib or tools is a fine place for this at the moment.
>
> Sorry, could you explain this? Statements like the above make it seem
> like things have been decided by people, and I'm not familiar with
> these decisions.
>

Just talking about a push for openstack-common, which will probably start separating out what is considered nova core, what is openstack common (not required to be used by all but available to all) and what are things built on top of those libraries. Where this stuff ends up after that discussion is up for grabs, either it will remain here or move somewhere else, but here is a good enough place until the discussion occurs.

> -jay

Revision history for this message
Brian Lamar (blamar) wrote :

Does this validate for itself, out of curiosity?

review: Needs Information
Revision history for this message
justinsb (justin-fathomdb) wrote :

I thought it did, but I can't remember if I ran it on the final version.
Obviously it should :-)

Revision history for this message
justinsb (justin-fathomdb) wrote :

I think this is looking pretty good for a v1:

I added a docstring checker to check some of our rules on docstrings

I added a format flag so that we can output in 'pretty', 'pep8', or 'pylint' style. Pylint should be parseable by the Violations Jenkins plugin (hopefully!).

I left in the RethrowExceptionsCarefully rule, because it seems to catch some real issues. We may have to use the trick that the glance guys found to rethrow an exception. We should try not to lose the stack traces if we're going to be able to support this in production.

Revision history for this message
justinsb (justin-fathomdb) wrote :

termie: Sorry missed one of your review points. I don't think I really understand this one?

> - i'd put the CHECKER stuff at the bottom of the file if it needs to be built after stuff. Alternatively you can make a small decorator for the class that automatically adds the class and define the CHECKER global at the top

I though Python defined things top-to-bottom, so I think CHECKER_BASE_CLASSES does work. Is your concern that the position is fragile?

Revision history for this message
termie (termie) wrote :

we discussed this a bit in irc, any further thoughts on python-vs-java philosophy?

Revision history for this message
justinsb (justin-fathomdb) wrote :

I think the Java philosophy is better in this case, and you think the
opposite! Let's ask the PTL mediate this one. If we want it before the
result, let's just get Vish to make the call.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Vish - I think we need your help as PTL. The barrier is a difference of opinion between termie and myself over how to discover classes that perform checks.

I want to find checks the "Java way", using typing. We create a checker by deriving from Checker, perhaps indirectly. If we want to know if a class is a checker, we check the type.

Termie wants to be more "Pythonic". A Checker simply implements the correct method e.g. run_check. We catch the missing-method exception and assume that the class isn't a checker.

I feel that the Pythonic way doesn't get us very much, but we have to give up help from our compiler & toolset. For example, if we were to rename the 'run_check' method, we'd need to change every Checker. Or a Checker implementation might fat-finger the method name. In both cases, the Java approach gives us lots of help - runtime & static analysis checkers - while the Pythonic approach offers us no help at all. I just don't see what we get in return.

In general, this is what we see throughout the code base, where we don't have a well-defined contract, so different implementations don't work homogeneously and we have lots of bugs that we wouldn't have if we allowed our toolset to help us.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

I have to say that I find all the logic around BASE_CLASSES a bit ugly. I don't really see the disadvantage of switching to pythonic style here. You can still make all of them subclasses, but it seems a lot easier to go:

try:
   obj.do_check()
except NotImplemented, AttributeError:
   pass

then doing all of the type checking. Getting a list of candidate classes i am less concerned with, but the global variable with skipped classes is a little odd. Since classes could be defined in other files, one possibility:

flags.DEFINE_list('checkers', ['xxx','yyy'], ....)
for checker in checkers:
   obj = utils.import_object(checker) # or equivalent

I'm a big fan of using inheritance for polymorphism and shared code, but basing runtime checks on class information should be avoided where possible.

Revision history for this message
justinsb (justin-fathomdb) wrote :

OK, your call as PTL. I hate any code that hides exceptions, but it's
not me that has to support it...

I'll make it Pythonic.

Revision history for this message
Brian Lamar (blamar) wrote :

I agree that hiding errors are bad, but we can just as easily replace that 'pass' with an informative log statement or relevant raise if needed. I'm pretty sure Vish was just trying to show the general form.

Revision history for this message
justinsb (justin-fathomdb) wrote :

The whole point is that we can't do anything other than swallow the
exception, because we don't know if the class is supposed to be a
Checker or not...

Revision history for this message
Brian Lamar (blamar) :
review: Abstain
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Only 2000+ termie violations!

review: Approve
Revision history for this message
Todd Willey (xtoddx) wrote :

I'd like to have it bitch at me if I don't give it paths. Other than that I think it is good to go.

Unmerged revisions

911. By justinsb

Made discovery of checkers Pythonic

910. By justinsb

Merged with trunk

909. By justinsb

Remove regex for finding docstrings, replace with "my first parser" level code.

The regex was hanging on certain inputs, and also got confused by embedded quotes in docstrings

908. By justinsb

Fixed the checker to be checker-compliant

907. By justinsb

Code style cleanups

906. By justinsb

pep8 fixes

905. By justinsb

Check that multi-line docstrings end in a blank line

904. By justinsb

Fix pep8

903. By justinsb

Basic docstring checking

902. By justinsb

Added pep8, pylint, pretty output formats

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== added directory 'contrib/rules'
=== added file 'contrib/rules/do_checks.py'
--- contrib/rules/do_checks.py 1970-01-01 00:00:00 +0000
+++ contrib/rules/do_checks.py 2011-05-20 05:15:12 +0000
@@ -0,0 +1,329 @@
1#!/usr/bin/env python
2# vim: tabstop=4 shiftwidth=4 softtabstop=4
3
4# Copyright 2011 Justin Santa Barbara
5# All Rights Reserved.
6#
7# Licensed under the Apache License, Version 2.0 (the "License"); you may
8# not use this file except in compliance with the License. You may obtain
9# a copy of the License at
10#
11# http://www.apache.org/licenses/LICENSE-2.0
12#
13# Unless required by applicable law or agreed to in writing, software
14# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
15# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
16# License for the specific language governing permissions and limitations
17# under the License.
18
19
20import gflags
21import imp
22import os
23import re
24import sys
25
26import __main__
27
28
29FLAGS = gflags.FLAGS
30gflags.DEFINE_enum('format', 'pretty', ['pretty', 'pylint', 'pep8'],
31 'Output format to use')
32
33
34class SourceFile(object):
35 """A source code file."""
36
37 def __init__(self, path):
38 self.path = path
39
40 f = open(path, 'r')
41 code = f.read()
42 code = code.replace('\r\n', '\n')
43 code = code.replace('\n\r', '\n')
44 self.code = code
45 self.lines = code.splitlines()
46 f.close()
47 self.module = None
48
49 def get_module(self):
50 # NOTE(justinsb): This would be nice, but causes errors because the
51 # code is actually executed (e.g. flags)
52 if self.module is None:
53 mod_name, _ext = os.path.splitext(os.path.split(self.path)[-1])
54 self.module = imp.load_source(mod_name, self.path)
55 return self.module
56
57 def map_to_line_index(self, index):
58 line_index = 0
59 for i in xrange(index):
60 if self.code[i] == '\n':
61 line_index = line_index + 1
62 return line_index
63
64
65class Violation(object):
66 """Records a rule infraction."""
67
68 def __init__(self, src, index, tag, message):
69 self.src = src
70 self.index = index
71 self.tag = tag
72 self.message = message
73
74 def print_warning(self):
75 if FLAGS.format == 'pylint':
76 self.print_pylint()
77 elif FLAGS.format == 'pep8':
78 self.print_pep8()
79 else:
80 self.print_pretty()
81
82 def print_pylint(self):
83 line = self.src.map_to_line_index(self.index)
84 level = 'C'
85 description = self.tag + ' ' + self.message
86 print "%s:%s: [%s] %s" % (self.src.path, line + 1, level, description)
87
88 def print_pep8(self):
89 line = self.src.map_to_line_index(self.index)
90 level = 'E'
91 column = 1
92 errnum = self.tag
93 description = self.message
94 print ("%s:%s:%s: %s%s %s" %
95 (self.src.path, line + 1, column, level, errnum, description))
96
97 def print_pretty(self):
98 line = self.src.map_to_line_index(self.index)
99 print "%s %s %s" % (self.tag, self.src.path, line + 1)
100 if self.message != self.tag:
101 print "\t%s" % (self.message)
102 for i in xrange(line - 2, line + 3):
103 if i < 0 or i >= len(self.src.lines):
104 continue
105 if i == line:
106 prefix = "* "
107 else:
108 prefix = " "
109 print prefix + self.src.lines[i]
110
111
112class Checker(object):
113 """Base class for all rule-checkers."""
114
115 def __init__(self, src):
116 self.src = src
117
118 def do_check(self):
119 raise NotImplementedError()
120
121 def _build_error(self, index, message=None):
122 tag = self.__class__.__name__
123
124 if message is None:
125 message = self.__class__.__doc__
126 if message is None:
127 message = tag
128 return Violation(self.src, index, tag, message)
129
130
131class DocStringChecker(Checker):
132 """Checks docstrings against our style."""
133
134 def do_check(self):
135 errors = []
136
137 # NOTE(justinsb): We hit a bug in re module: the below regex hangs
138 # on some inputs. In addition, it gets confused by quotes in docs.
139 #regex = ':(\s)*"""((\s|[^"])*?)"""'
140 #matches = re.finditer(regex, self.src.code)
141
142 code = self.src.code
143 start_pos = 0
144
145 while True:
146 docstring_start = code.find('"""', start_pos)
147 if docstring_start == -1:
148 break
149
150 docstring_end = code.find('"""', docstring_start + 3)
151 if docstring_end == -1:
152 break
153 docstring_end += 3
154 docstring = code[docstring_start:docstring_end]
155
156 errors += self._check_docstring(docstring_start, docstring)
157
158 start_pos = docstring_end + 3
159
160 return errors
161
162 def _check_docstring(self, index, text):
163 errors = []
164 assert text.startswith('"""')
165 assert text.endswith('"""')
166 text = text[3:]
167 text = text[:-3]
168
169 lines = text.splitlines()
170
171 if len(lines) == 0:
172 message = "Docstrings should not be empty"
173 errors.append(self._build_error(index, message=message))
174
175 else:
176 if lines[0].startswith(' '):
177 message = "Docstrings do not start with whitespace"
178 errors.append(self._build_error(index, message=message))
179
180 if lines[0].endswith(' '):
181 message = "Docstrings should not end with whitespace"
182 errors.append(self._build_error(index, message=message))
183
184 if len(lines) == 1:
185 # Single line docstring
186 if not lines[0].endswith('.'):
187 message = "A single line docstring ends in a full stop (.)"
188 errors.append(self._build_error(index, message=message))
189
190 if len(lines) > 1:
191 # Multi-line docstring
192 if not lines[0].endswith('.'):
193 message = ("The first line of a multi-line docstring ends in "
194 "a full stop (.)")
195 errors.append(self._build_error(index, message=message))
196
197 if lines[1]:
198 message = ("The second line of a multi-line docstring must "
199 "be blank")
200 errors.append(self._build_error(index, message=message))
201
202 if lines[-2] or lines[-1].strip():
203 message = ("Leave a blank line at the end of a multi-line "
204 "docstring")
205 errors.append(self._build_error(index, message=message))
206
207 # TODO(justinsb): On a class docstring, there should be a blank line
208 # after the closing """
209
210 return errors
211
212
213class SimpleRegexChecker(Checker):
214 """Base class for checkers that find violations based on regex."""
215
216 def __init__(self, src, regex=None, skip_chars=0):
217 super(SimpleRegexChecker, self).__init__(src)
218 self.regex = regex
219 self.skip_chars = skip_chars
220
221 def _build_regex(self):
222 raise NotImplementedError()
223
224 def do_check(self):
225 if not self.regex:
226 self._build_regex()
227
228 errors = []
229 matches = re.finditer(self.regex, self.src.code)
230 for match in matches:
231 index = match.start()
232
233 index = index + self.skip_chars
234
235 errors.append(self._build_error(index))
236
237 return errors
238
239
240class TwoLinesAfterLog(SimpleRegexChecker):
241 """After a LOG statement, leave two blank lines before the FLAGS/code."""
242
243 def _build_regex(self):
244 self.regex = '\nLOG(\s)*=[^\n]*\n[^\n][^\n]'
245 # We matched a leading \n
246 self.skip_chars = 1
247
248
249class RethrowExceptionsCarefully(SimpleRegexChecker):
250 """When rethrowing an exception, do 'raise', not 'raise e'.
251
252 This preserves the stack trace. Note that there are some complexities
253 with eventlet, because eventlet can wipe the 'current' exception.
254
255 """
256
257 def _build_regex(self):
258 self.regex = '\n[ ]*raise([ ]+)([a-z]+)\n'
259 self.skip_chars = 1
260
261
262class NoWhiteSpaceLeadingDocString(SimpleRegexChecker):
263 """Don't start your docstrings with whitespace."""
264
265 def _build_regex(self):
266 self.regex = '\):\n(\s)*"""\s'
267 self.skip_chars = 3
268
269
270class AvoidAggressivePunctuation(SimpleRegexChecker):
271 """Punctuation that includes repeated question marks can upset people."""
272
273 def _build_regex(self):
274 self.regex = '\?\?'
275
276
277class OneSpaceInTodoOrNotes(SimpleRegexChecker):
278 """Use one space in your TODO and NOTE comments."""
279
280 def _build_regex(self):
281 self.regex = '#(TODO|NOTE)'
282
283
284class ClaimYourTodoAndNotes(SimpleRegexChecker):
285 """Put your name in brackets after a TODO/NOTE."""
286
287 def _build_regex(self):
288 self.regex = '(TODO|NOTE)(\s*):'
289
290
291def check_file(src_path):
292 src = SourceFile(src_path)
293
294 for x in dir(__main__):
295 s = getattr(__main__, x)
296 if not hasattr(s, 'do_check'):
297 continue
298
299 try:
300 checker = s(src)
301 errors = checker.do_check()
302 for error in errors:
303 error.print_warning()
304 except NotImplementedError, AttributeError:
305 pass
306
307
308def check_dir(dir_path):
309 for child_name in os.listdir(dir_path):
310 child_path = os.path.join(dir_path, child_name)
311 if os.path.isfile(child_path):
312 if child_path.endswith(".py"):
313 check_file(child_path)
314 else:
315 check_dir(child_path)
316
317
318def do_main():
319 args = FLAGS(sys.argv)
320
321 for path in args[1:]:
322 if os.path.isdir(path):
323 check_dir(path)
324 else:
325 check_file(path)
326
327
328if __name__ == '__main__':
329 do_main()