Merge lp:~jameinel/bzr/2.0.4-pyrex-propagation into lp:bzr/2.0

Proposed by John A Meinel
Status: Merged
Approved by: Andrew Bennetts
Approved revision: not available
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.0.4-pyrex-propagation
Merge into: lp:bzr/2.0
Diff against target: 322 lines (+83/-21)
12 files modified
NEWS (+5/-0)
bzrlib/_annotator_pyx.pyx (+8/-1)
bzrlib/_btree_serializer_pyx.pyx (+2/-2)
bzrlib/_chk_map_pyx.pyx (+1/-1)
bzrlib/_dirstate_helpers_pyx.pyx (+11/-9)
bzrlib/_groupcompress_pyx.pyx (+1/-1)
bzrlib/_knit_load_data_pyx.pyx (+2/-1)
bzrlib/_known_graph_pyx.pyx (+1/-1)
bzrlib/_rio_pyx.pyx (+1/-1)
bzrlib/_walkdirs_win32.pyx (+4/-4)
bzrlib/tests/test_selftest.py (+2/-0)
bzrlib/tests/test_source.py (+45/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.0.4-pyrex-propagation
Reviewer Review Type Date Requested Status
Andrew Bennetts Approve
Review via email: mp+16832@code.launchpad.net
To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

This addresses Andrew's follow up about my earlier fix.

It adds a 'test_source' test that all 'cdef' functions have thought about how they want to handle exceptions. Either they

1) Return an object
2) Return a class defined in local scope (which is an object)
3) Have an 'except ?' clause
4) Add "# cannot_raise" (useful for pure C functions that know they cannot raise)

I ran this over the .pyx files and fixed them all to conform. It does catch
+ cdef int _gather_result_for_consistency(self, result) except -1:

and _knit_load_data's
- cdef void validate(self):
+ cdef int validate(self) except -1:

Otherwise, all other cases are 'cannot_raise'. I expect to find a few more in the 2.1 branch, which will require more work than just merging 2.0 => bzr.dev once this lands. I'll create a branch for it soon.

Revision history for this message
Andrew Bennetts (spiv) wrote :

This looks good to me. I worry a little bit that the # cannot_raise annotations might become stale over time, but this seems to be as good as we can do without improving Pyrex itself.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Review: Approve
> This looks good to me. I worry a little bit that the # cannot_raise annotations might become stale over time, but this seems to be as good as we can do without improving Pyrex itself.

So I have a block that checks if you have both a "cannot_raise" and a
"except :" clause.

But certainly it is possible that *today* I add a 'cannot_raise' and
then in the future someone adds a line that calls into python code, and
silently traps the exception. I think, as you say, the only way forward
is to improve Pyrex/Cython. I should try asking Cython what they think
it would take to implement this.

cannot_raise such that it can only call other cannot_raise functions,
and no pure-python functions. They already have 'nogil' which does some
of that. I wonder if we could cheat and use 'nogil' to get that
functionality.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktElrAACgkQJdeBCYSNAAOW5QCffahehGhbJeqh8Vz5U9nNBpME
K/gAoNXM/lGOhy+aBj86tRFscLZeUn1S
=+VOe
-----END PGP SIGNATURE-----

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-01-05 00:01:54 +0000
3+++ NEWS 2010-01-06 17:31:11 +0000
4@@ -61,6 +61,11 @@
5 Testing
6 *******
7
8+* We have a new ``test_source`` that ensures all pyrex ``cdef`` functions
9+ handle exceptions somehow. (Possibly by setting ``# cannot_raise``
10+ rather than an ``except ?:`` clause.) This should help prevent bugs like
11+ bug #495023. (John Arbash Meinel)
12+
13
14 bzr 2.0.3
15 #########
16
17=== modified file 'bzrlib/_annotator_pyx.pyx'
18--- bzrlib/_annotator_pyx.pyx 2009-07-08 17:09:03 +0000
19+++ bzrlib/_annotator_pyx.pyx 2010-01-06 17:31:11 +0000
20@@ -83,7 +83,14 @@
21 return 0
22
23
24-cdef PyObject *_next_tuple_entry(object tpl, Py_ssize_t *pos):
25+cdef PyObject *_next_tuple_entry(object tpl, Py_ssize_t *pos): # cannot_raise
26+ """Return the next entry from this tuple.
27+
28+ :param tpl: The tuple we are investigating, *must* be a PyTuple
29+ :param pos: The last item we found. Will be updated to the new position.
30+
31+ This cannot raise an exception, as it does no error checking.
32+ """
33 pos[0] = pos[0] + 1
34 if pos[0] >= PyTuple_GET_SIZE(tpl):
35 return NULL
36
37=== modified file 'bzrlib/_btree_serializer_pyx.pyx'
38--- bzrlib/_btree_serializer_pyx.pyx 2009-10-01 20:50:36 +0000
39+++ bzrlib/_btree_serializer_pyx.pyx 2010-01-06 17:31:11 +0000
40@@ -1,4 +1,4 @@
41-# Copyright (C) 2008 Canonical Ltd
42+# Copyright (C) 2008, 2009 Canonical Ltd
43 #
44 # This program is free software; you can redistribute it and/or modify
45 # it under the terms of the GNU General Public License as published by
46@@ -54,7 +54,7 @@
47
48
49 # TODO: Find some way to import this from _dirstate_helpers
50-cdef void* _my_memrchr(void *s, int c, size_t n):
51+cdef void* _my_memrchr(void *s, int c, size_t n): # cannot_raise
52 # memrchr seems to be a GNU extension, so we have to implement it ourselves
53 # It is not present in any win32 standard library
54 cdef char *pos
55
56=== modified file 'bzrlib/_chk_map_pyx.pyx'
57--- bzrlib/_chk_map_pyx.pyx 2009-06-22 12:52:39 +0000
58+++ bzrlib/_chk_map_pyx.pyx 2010-01-06 17:31:11 +0000
59@@ -65,7 +65,7 @@
60 _unknown = None
61
62 # We shouldn't just copy this from _dirstate_helpers_pyx
63-cdef void* _my_memrchr(void *s, int c, size_t n):
64+cdef void* _my_memrchr(void *s, int c, size_t n): # cannot_raise
65 # memrchr seems to be a GNU extension, so we have to implement it ourselves
66 cdef char *pos
67 cdef char *start
68
69=== modified file 'bzrlib/_dirstate_helpers_pyx.pyx'
70--- bzrlib/_dirstate_helpers_pyx.pyx 2010-01-04 20:06:30 +0000
71+++ bzrlib/_dirstate_helpers_pyx.pyx 2010-01-06 17:31:11 +0000
72@@ -119,7 +119,7 @@
73 # void *memrchr(void *s, int c, size_t len)
74
75
76-cdef void* _my_memrchr(void *s, int c, size_t n):
77+cdef void* _my_memrchr(void *s, int c, size_t n): # cannot_raise
78 # memrchr seems to be a GNU extension, so we have to implement it ourselves
79 cdef char *pos
80 cdef char *start
81@@ -165,7 +165,7 @@
82 return PyString_FromStringAndSize(s, size)
83
84
85-cdef int _is_aligned(void *ptr):
86+cdef int _is_aligned(void *ptr): # cannot_raise
87 """Is this pointer aligned to an integer size offset?
88
89 :return: 1 if this pointer is aligned, 0 otherwise.
90@@ -173,7 +173,7 @@
91 return ((<intptr_t>ptr) & ((sizeof(int))-1)) == 0
92
93
94-cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2):
95+cdef int _cmp_by_dirs(char *path1, int size1, char *path2, int size2): # cannot_raise
96 cdef unsigned char *cur1
97 cdef unsigned char *cur2
98 cdef unsigned char *end1
99@@ -295,7 +295,7 @@
100
101
102 cdef int _cmp_path_by_dirblock_intern(char *path1, int path1_len,
103- char *path2, int path2_len):
104+ char *path2, int path2_len): # cannot_raise
105 """Compare two paths by what directory they are in.
106
107 see ``_cmp_path_by_dirblock`` for details.
108@@ -768,7 +768,7 @@
109 state._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
110
111
112-cdef int minikind_from_mode(int mode):
113+cdef int minikind_from_mode(int mode): # cannot_raise
114 # in order of frequency:
115 if S_ISREG(mode):
116 return c"f"
117@@ -915,7 +915,8 @@
118 return link_or_sha1
119
120
121-cdef char _minikind_from_string(object string):
122+# TODO: Do we want to worry about exceptions here?
123+cdef char _minikind_from_string(object string) except? -1:
124 """Convert a python string to a char."""
125 return PyString_AsString(string)[0]
126
127@@ -953,7 +954,7 @@
128 raise KeyError(PyString_FromStringAndSize(_minikind, 1))
129
130
131-cdef int _versioned_minikind(char minikind):
132+cdef int _versioned_minikind(char minikind): # cannot_raise
133 """Return non-zero if minikind is in fltd"""
134 return (minikind == c'f' or
135 minikind == c'd' or
136@@ -1373,7 +1374,7 @@
137 def iter_changes(self):
138 return self
139
140- cdef void _gather_result_for_consistency(self, result):
141+ cdef int _gather_result_for_consistency(self, result) except -1:
142 """Check a result we will yield to make sure we are consistent later.
143
144 This gathers result's parents into a set to output later.
145@@ -1381,7 +1382,7 @@
146 :param result: A result tuple.
147 """
148 if not self.partial or not result[0]:
149- return
150+ return 0
151 self.seen_ids.add(result[0])
152 new_path = result[1][1]
153 if new_path:
154@@ -1391,6 +1392,7 @@
155 # Add the root directory which parent_directories does not
156 # provide.
157 self.search_specific_file_parents.add('')
158+ return 0
159
160 cdef int _update_current_block(self) except -1:
161 if (self.block_index < len(self.state._dirblocks) and
162
163=== modified file 'bzrlib/_groupcompress_pyx.pyx'
164--- bzrlib/_groupcompress_pyx.pyx 2009-06-10 03:56:49 +0000
165+++ bzrlib/_groupcompress_pyx.pyx 2010-01-06 17:31:11 +0000
166@@ -276,7 +276,7 @@
167
168
169 cdef unsigned char *_decode_copy_instruction(unsigned char *bytes,
170- unsigned char cmd, unsigned int *offset, unsigned int *length):
171+ unsigned char cmd, unsigned int *offset, unsigned int *length): # cannot_raise
172 """Decode a copy instruction from the next few bytes.
173
174 A copy instruction is a variable number of bytes, so we will parse the
175
176=== modified file 'bzrlib/_knit_load_data_pyx.pyx'
177--- bzrlib/_knit_load_data_pyx.pyx 2009-06-22 12:52:39 +0000
178+++ bzrlib/_knit_load_data_pyx.pyx 2010-01-06 17:31:11 +0000
179@@ -97,11 +97,12 @@
180 self.end_str = NULL
181 self.history_len = 0
182
183- cdef void validate(self):
184+ cdef int validate(self) except -1:
185 if not PyDict_CheckExact(self.cache):
186 raise TypeError('kndx._cache must be a python dict')
187 if not PyList_CheckExact(self.history):
188 raise TypeError('kndx._history must be a python list')
189+ return 0
190
191 cdef object process_options(self, char *option_str, char *end):
192 """Process the options string into a list."""
193
194=== modified file 'bzrlib/_known_graph_pyx.pyx'
195--- bzrlib/_known_graph_pyx.pyx 2009-09-02 13:32:52 +0000
196+++ bzrlib/_known_graph_pyx.pyx 2010-01-06 17:31:11 +0000
197@@ -591,7 +591,7 @@
198 self._revno_first, self._revno_second, self._revno_last,
199 self.is_first_child, self.seen_by_child)
200
201- cdef int has_pending_parents(self):
202+ cdef int has_pending_parents(self): # cannot_raise
203 if self.left_pending_parent is not None or self.pending_parents:
204 return 1
205 return 0
206
207=== modified file 'bzrlib/_rio_pyx.pyx'
208--- bzrlib/_rio_pyx.pyx 2009-05-15 02:36:03 +0000
209+++ bzrlib/_rio_pyx.pyx 2010-01-06 17:31:11 +0000
210@@ -49,7 +49,7 @@
211
212 from bzrlib.rio import Stanza
213
214-cdef int _valid_tag_char(char c):
215+cdef int _valid_tag_char(char c): # cannot_raise
216 return (c == c'_' or c == c'-' or
217 (c >= c'a' and c <= c'z') or
218 (c >= c'A' and c <= c'Z') or
219
220=== modified file 'bzrlib/_walkdirs_win32.pyx'
221--- bzrlib/_walkdirs_win32.pyx 2009-03-23 14:59:43 +0000
222+++ bzrlib/_walkdirs_win32.pyx 2010-01-06 17:31:11 +0000
223@@ -109,7 +109,7 @@
224 wcslen(data.cFileName))
225
226
227-cdef int _get_mode_bits(WIN32_FIND_DATAW *data):
228+cdef int _get_mode_bits(WIN32_FIND_DATAW *data): # cannot_raise
229 cdef int mode_bits
230
231 mode_bits = 0100666 # writeable file, the most common
232@@ -121,13 +121,13 @@
233 return mode_bits
234
235
236-cdef __int64 _get_size(WIN32_FIND_DATAW *data):
237+cdef __int64 _get_size(WIN32_FIND_DATAW *data): # cannot_raise
238 # Pyrex casts a DWORD into a PyLong anyway, so it is safe to do << 32
239 # on a DWORD
240 return ((<__int64>data.nFileSizeHigh) << 32) + data.nFileSizeLow
241
242
243-cdef double _ftime_to_timestamp(FILETIME *ft):
244+cdef double _ftime_to_timestamp(FILETIME *ft): # cannot_raise
245 """Convert from a FILETIME struct into a floating point timestamp.
246
247 The fields of a FILETIME structure are the hi and lo part
248@@ -147,7 +147,7 @@
249 return (val * 1.0e-7) - 11644473600.0
250
251
252-cdef int _should_skip(WIN32_FIND_DATAW *data):
253+cdef int _should_skip(WIN32_FIND_DATAW *data): # cannot_raise
254 """Is this '.' or '..' so we should skip it?"""
255 if (data.cFileName[0] != c'.'):
256 return 0
257
258=== modified file 'bzrlib/tests/test_selftest.py'
259--- bzrlib/tests/test_selftest.py 2009-10-26 22:03:28 +0000
260+++ bzrlib/tests/test_selftest.py 2010-01-06 17:31:11 +0000
261@@ -892,6 +892,8 @@
262 result.report_unsupported(test, feature)
263 output = result_stream.getvalue()[prefix:]
264 lines = output.splitlines()
265+ # XXX: This is a timing dependent test. I've had it fail because it
266+ # took 6ms to evaluate... :(
267 self.assertEqual(lines, ['NODEP 0ms',
268 " The feature 'Feature' is not available."])
269
270
271=== modified file 'bzrlib/tests/test_source.py'
272--- bzrlib/tests/test_source.py 2009-09-10 03:44:53 +0000
273+++ bzrlib/tests/test_source.py 2010-01-06 17:31:11 +0000
274@@ -369,3 +369,48 @@
275 self.fail(
276 "these files contain an assert statement and should not:\n%s"
277 % '\n'.join(badfiles))
278+
279+ def test_extension_exceptions(self):
280+ """Extension functions should propagate exceptions.
281+
282+ Either they should return an object, have an 'except' clause, or have a
283+ "# cannot_raise" to indicate that we've audited them and defined them as not
284+ raising exceptions.
285+ """
286+ both_exc_and_no_exc = []
287+ missing_except = []
288+ class_re = re.compile(r'^(cdef\s+)?class (\w+).*:', re.MULTILINE)
289+ except_re = re.compile(r'cdef\s*' # start with cdef
290+ r'([\w *]*?)\s*' # this is the return signature
291+ r'(\w+)\s*\(' # the function name
292+ r'[^)]*\)\s*' # parameters
293+ r'(.*)\s*:' # the except clause
294+ r'\s*(#\s*cannot[- _]raise)?' # cannot raise comment
295+ )
296+ for fname, text in self.get_source_file_contents(
297+ extensions=('.pyx',)):
298+ known_classes = set([m[1] for m in class_re.findall(text)])
299+ cdefs = except_re.findall(text)
300+ for sig, func, exc_clause, no_exc_comment in cdefs:
301+ if not sig or sig in known_classes:
302+ sig = 'object'
303+ if exc_clause and no_exc_comment:
304+ both_exc_and_no_exc.append((fname, func))
305+ if sig != 'object' and not (exc_clause or no_exc_comment):
306+ missing_except.append((fname, func))
307+ error_msg = []
308+ if both_exc_and_no_exc:
309+ error_msg.append('The following functions had "cannot raise" comments'
310+ ' but did have an except clause set:')
311+ for fname, func in both_exc_and_no_exc:
312+ error_msg.append('%s:%s' % (fname, func))
313+ error_msg.extend(('', ''))
314+ if missing_except:
315+ error_msg.append('The following functions have fixed return types,'
316+ ' but no except clause.')
317+ error_msg.append('Either add an except or append "# cannot_raise".')
318+ for fname, func in missing_except:
319+ error_msg.append('%s:%s' % (fname, func))
320+ error_msg.extend(('', ''))
321+ if error_msg:
322+ self.fail('\n'.join(error_msg))

Subscribers

People subscribed via source and target branches