Merge lp:~gz/bzr/lazy_import_threadsafe_396819 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6426
Proposed branch: lp:~gz/bzr/lazy_import_threadsafe_396819
Merge into: lp:bzr
Diff against target: 433 lines (+85/-114)
4 files modified
bzrlib/builtins.py (+11/-2)
bzrlib/lazy_import.py (+45/-58)
bzrlib/tests/test_lazy_import.py (+24/-54)
doc/en/release-notes/bzr-2.5.txt (+5/-0)
To merge this branch: bzr merge lp:~gz/bzr/lazy_import_threadsafe_396819
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Martin von Gagern (community) Approve
Review via email: mp+87525@code.launchpad.net

Commit message

Make lazy imports proxy by default so that concurrent resolution is robust

Description of the change

Switch the default behaviour of lazy imports to do object proxying rather than raise an exception, which is the simplest and safest way of ensuring concurrent usage doesn't break. What this loses us is some detection of badly written code where a proxied object is aliased in more than one location, so copies persist even after the original is replaced by the resolved object.

This is a version of Martin von Gagern's branch, with some minor cleanups, and the complication of tracking the last replacement removed. This loses part of the compensation for changing the error detection, which was to flip proxying back to disallowed for selftest. Now imports that happen before that point will not be flagged. This is a minor additional loss, as selftest is already very limited in how it can track lazy imports, given it will only do one path and running other commands may take another, with different import order and behaviours.

The proposal doesn't mark without prerequisite set as the diff against trunk is quite different, but the changes against MvG's branch are also of interest:

<https://code.launchpad.net/~gagern/bzr/bug396819-lazy_import-threadsafe/+merge/73475>

There's a lot of prior discussion, but the short version is that without locks (which we don't want, and are dangerous when combined with python imports), the only way to support concurrent resolution of lazy objects is to accept that sometimes the factory will be called more than once, and not to treat that as an error. Detecting coding mistakes has to be left to other mechanisms than always raising an exception when a lazy object is used multiple times.

To post a comment you must log in.
Revision history for this message
Martin von Gagern (gagern) wrote :

Had a look at the individual changesets; they all look good to me.

Why ScopeReplacer._should_proxy instead of object.__getattribute__(self, '_should_proxy')? The latter would allow overriding, although we don't need it yet. Performance shouldn't be an issue here, as this will be called rarely. Is it for the sake of readability?

Diff below shows merge problems, so you should probably pull from trunk and merge manually.

I'm setting review to Approve in LP, although I'm no core dev. Hope you won't consider this presumptuous of me.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

> Had a look at the individual changesets; they all look good to me.

Thank you for going over this Martin!

> Why ScopeReplacer._should_proxy instead of object.__getattribute__(self,
> '_should_proxy')? The latter would allow overriding, although we don't need it
> yet. Performance shouldn't be an issue here, as this will be called rarely. Is
> it for the sake of readability?

Simply because it's now not overridden in a subclass, but is changed on the class object itself, so I think this is a clearer spelling.

> Diff below shows merge problems, so you should probably pull from trunk and
> merge manually.

Thanks, will do, need to remember to add release notes too.

> I'm setting review to Approve in LP, although I'm no core dev. Hope you won't
> consider this presumptuous of me.

That's entirely appropriate, feel free to review any other merge proposals in the same manner as well. :)

Revision history for this message
Vincent Ladeuil (vila) wrote :

@gagern: You're definitely welcome to approve any proposal ! And in this specific case, you're even more welcome !

The approval rules is that 2 votes are required to land, core devs getting an implicit approval when submitting. Here, your vote gives mgz a confirmation that you agree with his modifications to your former work and this definitely has weight.

> Switch the default behaviour of lazy imports to do object proxying rather
  than raise an exception, which is the simplest and safest way of ensuring
  concurrent usage doesn't break. What this loses us is some detection of
  badly written code where a proxied object is aliased in more than one
  location, so copies persist even after the original is replaced by the
  resolved object.

Right, this is especially true if symbols (instead of modules) are lazily
imported, but this has been a hot topic for a long time and the only new
material I can add to the discussion is from
http://docs.python.org/howto/doanddont.html:

,----
| from module import name1, name2
|
| This is a “don’t” which is much weaker than the previous “don’t”s but is
| still something you should not do if you don’t have good reasons to do
| that. The reason it is usually a bad idea is because you suddenly have an
| object which lives in two separate namespaces. When the binding in one
| namespace changes, the binding in the other will not, so there will be a
| discrepancy between them. This happens when, for example, one module is
| reloaded, or changes the definition of a function at runtime.
`----

Two separate namespaces indeed which bite us anyway when monkey-patching.

So, you're not making things worse here and in fact, I think allowing
multiple replacements is just the right thing to do. I can't remember a case
where the expections raised really revealed a bug. I *can* remember multiple
cases where the fix was to *not* import lazily instead which is hardly a
win.

Anyway, under the hood, the module is still read from disk only once IIUC so
the lazy loading is achieved in any case.

380 + in order win the race, setting up the original caller to lose.

in order *to* ?

Other than that, if the tests are still passing, this seems clearer than
what we had.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/builtins.py'
2--- bzrlib/builtins.py 2011-12-30 14:36:46 +0000
3+++ bzrlib/builtins.py 2012-01-05 10:16:30 +0000
4@@ -22,8 +22,8 @@
5
6 import bzrlib.bzrdir
7
8-from bzrlib.lazy_import import lazy_import
9-lazy_import(globals(), """
10+from bzrlib import lazy_import
11+lazy_import.lazy_import(globals(), """
12 import cStringIO
13 import errno
14 import sys
15@@ -4004,6 +4004,15 @@
16 load_list=None, debugflag=None, starting_with=None, subunit=False,
17 parallel=None, lsprof_tests=False,
18 sync=False):
19+
20+ # During selftest, disallow proxying, as it can cause severe
21+ # performance penalties and is only needed for thread
22+ # safety. The selftest command is assumed to not use threads
23+ # too heavily. The call should be as early as possible, as
24+ # error reporting for past duplicate imports won't have useful
25+ # backtraces.
26+ lazy_import.disallow_proxying()
27+
28 from bzrlib import tests
29
30 if testspecs_list is not None:
31
32=== modified file 'bzrlib/lazy_import.py'
33--- bzrlib/lazy_import.py 2011-12-19 14:40:24 +0000
34+++ bzrlib/lazy_import.py 2012-01-05 10:16:30 +0000
35@@ -53,10 +53,11 @@
36
37 __slots__ = ('_scope', '_factory', '_name', '_real_obj')
38
39- # Setting this to True will allow you to do x = y, and still access members
40- # from both variables. This should not normally be enabled, but is useful
41- # when building documentation.
42- _should_proxy = False
43+ # If you to do x = y, setting this to False will disallow access to
44+ # members from the second variable (i.e. x). This should normally
45+ # be enabled for reasons of thread safety and documentation, but
46+ # will be disabled during the selftest command to check for abuse.
47+ _should_proxy = True
48
49 def __init__(self, scope, factory, name):
50 """Create a temporary object in the specified scope.
51@@ -73,75 +74,61 @@
52 object.__setattr__(self, '_real_obj', None)
53 scope[name] = self
54
55- def _replace(self):
56- """Actually replace self with other in the given scope"""
57+ def _resolve(self):
58+ """Return the real object for which this is a placeholder"""
59 name = object.__getattribute__(self, '_name')
60- try:
61+ real_obj = object.__getattribute__(self, '_real_obj')
62+ if real_obj is None:
63+ # No obj generated previously, so generate from factory and scope.
64 factory = object.__getattribute__(self, '_factory')
65 scope = object.__getattribute__(self, '_scope')
66- except AttributeError, e:
67- # Because ScopeReplacer objects only replace a single
68- # item, passing them to another variable before they are
69- # replaced would cause them to keep getting replaced
70- # (only they are replacing the wrong variable). So we
71- # make it forbidden, and try to give a good error.
72+ obj = factory(self, scope, name)
73+ if obj is self:
74+ raise errors.IllegalUseOfScopeReplacer(name, msg="Object tried"
75+ " to replace itself, check it's not using its own scope.")
76+
77+ # Check if another thread has jumped in while obj was generated.
78+ real_obj = object.__getattribute__(self, '_real_obj')
79+ if real_obj is None:
80+ # Still no prexisting obj, so go ahead and assign to scope and
81+ # return. There is still a small window here where races will
82+ # not be detected, but safest to avoid additional locking.
83+ object.__setattr__(self, '_real_obj', obj)
84+ scope[name] = obj
85+ return obj
86+
87+ # Raise if proxying is disabled as obj has already been generated.
88+ if not ScopeReplacer._should_proxy:
89 raise errors.IllegalUseOfScopeReplacer(
90- name, msg="Object already cleaned up, did you assign it"
91- " to another variable?",
92- extra=e)
93- obj = factory(self, scope, name)
94- if obj is self:
95- raise errors.IllegalUseOfScopeReplacer(name, msg="Object tried"
96- " to replace itself, check it's not using its own scope.")
97- if ScopeReplacer._should_proxy:
98- object.__setattr__(self, '_real_obj', obj)
99- scope[name] = obj
100- return obj
101-
102- def _cleanup(self):
103- """Stop holding on to all the extra stuff"""
104- try:
105- del self._factory
106- except AttributeError:
107- # Oops, we just lost a race with another caller of _cleanup. Just
108- # ignore it.
109- pass
110-
111- try:
112- del self._scope
113- except AttributeError:
114- # Another race loss. See above.
115- pass
116-
117- # We keep _name, so that we can report errors
118- # del self._name
119+ name, msg="Object already replaced, did you assign it"
120+ " to another variable?")
121+ return real_obj
122
123 def __getattribute__(self, attr):
124- obj = object.__getattribute__(self, '_real_obj')
125- if obj is None:
126- _replace = object.__getattribute__(self, '_replace')
127- obj = _replace()
128- _cleanup = object.__getattribute__(self, '_cleanup')
129- _cleanup()
130+ obj = object.__getattribute__(self, '_resolve')()
131 return getattr(obj, attr)
132
133 def __setattr__(self, attr, value):
134- obj = object.__getattribute__(self, '_real_obj')
135- if obj is None:
136- _replace = object.__getattribute__(self, '_replace')
137- obj = _replace()
138- _cleanup = object.__getattribute__(self, '_cleanup')
139- _cleanup()
140+ obj = object.__getattribute__(self, '_resolve')()
141 return setattr(obj, attr, value)
142
143 def __call__(self, *args, **kwargs):
144- _replace = object.__getattribute__(self, '_replace')
145- obj = _replace()
146- _cleanup = object.__getattribute__(self, '_cleanup')
147- _cleanup()
148+ obj = object.__getattribute__(self, '_resolve')()
149 return obj(*args, **kwargs)
150
151
152+def disallow_proxying():
153+ """Disallow lazily imported modules to be used as proxies.
154+
155+ Calling this function might cause problems with concurrent imports
156+ in multithreaded environments, but will help detecting wasteful
157+ indirection, so it should be called when executing unit tests.
158+
159+ Only lazy imports that happen after this call are affected.
160+ """
161+ ScopeReplacer._should_proxy = False
162+
163+
164 class ImportReplacer(ScopeReplacer):
165 """This is designed to replace only a portion of an import list.
166
167
168=== modified file 'bzrlib/tests/test_lazy_import.py'
169--- bzrlib/tests/test_lazy_import.py 2011-12-19 12:14:20 +0000
170+++ bzrlib/tests/test_lazy_import.py 2012-01-05 10:16:30 +0000
171@@ -39,10 +39,6 @@
172 def use_actions(actions):
173 InstrumentedReplacer.actions = actions
174
175- def _replace(self):
176- InstrumentedReplacer.actions.append('_replace')
177- return lazy_import.ScopeReplacer._replace(self)
178-
179 def __getattribute__(self, attr):
180 InstrumentedReplacer.actions.append(('__getattribute__', attr))
181 return lazy_import.ScopeReplacer.__getattribute__(self, attr)
182@@ -62,10 +58,6 @@
183 InstrumentedImportReplacer.actions.append(('_import', name))
184 return lazy_import.ImportReplacer._import(self, scope, name)
185
186- def _replace(self):
187- InstrumentedImportReplacer.actions.append('_replace')
188- return lazy_import.ScopeReplacer._replace(self)
189-
190 def __getattribute__(self, attr):
191 InstrumentedImportReplacer.actions.append(('__getattribute__', attr))
192 return lazy_import.ScopeReplacer.__getattribute__(self, attr)
193@@ -144,7 +136,6 @@
194 self.assertIsInstance(test_obj1, TestClass)
195 self.assertEqual('foo', test_obj1.foo(2))
196 self.assertEqual([('__getattribute__', 'foo'),
197- '_replace',
198 'factory',
199 'init',
200 ('foo', 1),
201@@ -243,7 +234,6 @@
202 self.assertEqual('class_member', test_class1.class_member)
203 self.assertEqual(test_class1, TestClass)
204 self.assertEqual([('__getattribute__', 'class_member'),
205- '_replace',
206 'factory',
207 ], actions)
208
209@@ -273,7 +263,6 @@
210 self.assertIsInstance(obj, TestClass)
211 self.assertEqual('class_member', obj.class_member)
212 self.assertEqual([('__call__', (), {}),
213- '_replace',
214 'factory',
215 'init',
216 ], actions)
217@@ -306,7 +295,6 @@
218
219 self.assertEqual((1,2,'3'), val)
220 self.assertEqual([('__call__', (1,2), {'c':'3'}),
221- '_replace',
222 'factory',
223 'func',
224 ], actions)
225@@ -368,14 +356,12 @@
226 getattr, test_obj3, 'foo')
227
228 self.assertEqual([('__getattribute__', 'foo'),
229- '_replace',
230 'factory',
231 'init',
232 ('foo', 1),
233 ('foo', 2),
234 ('foo', 3),
235 ('__getattribute__', 'foo'),
236- '_replace',
237 ], actions)
238
239 def test_enable_proxying(self):
240@@ -426,7 +412,6 @@
241 object.__getattribute__(test_obj5, '__class__'))
242
243 self.assertEqual([('__getattribute__', 'foo'),
244- '_replace',
245 'factory',
246 'init',
247 ('foo', 1),
248@@ -464,7 +449,6 @@
249 e = self.assertRaises(errors.IllegalUseOfScopeReplacer, test_obj7)
250 self.assertIn("replace itself", e.msg)
251 self.assertEqual([('__call__', (), {}),
252- '_replace',
253 'factory'], actions)
254
255
256@@ -599,7 +583,6 @@
257 self.assertEqual('x', root1.func1('x'))
258
259 self.assertEqual([('__getattribute__', 'var1'),
260- '_replace',
261 ('_import', 'root1'),
262 ('import', self.root_name, [], 0),
263 ], self.actions)
264@@ -625,7 +608,6 @@
265 self.assertEqual('y', mod1.func2('y'))
266
267 self.assertEqual([('__getattribute__', 'var2'),
268- '_replace',
269 ('_import', 'mod1'),
270 ('import', mod_path, [], 0),
271 ], self.actions)
272@@ -650,7 +632,6 @@
273 self.assertEqual('y', mod2.func2('y'))
274
275 self.assertEqual([('__getattribute__', 'var2'),
276- '_replace',
277 ('_import', 'mod2'),
278 ('import', self.root_name, [self.mod_name], 0),
279 ], self.actions)
280@@ -683,11 +664,9 @@
281
282 mod_path = self.root_name + '.' + self.mod_name
283 self.assertEqual([('__getattribute__', 'var1'),
284- '_replace',
285 ('_import', 'root3'),
286 ('import', self.root_name, [], 0),
287 ('__getattribute__', 'var2'),
288- '_replace',
289 ('_import', 'mod3'),
290 ('import', mod_path, [], 0),
291 ], self.actions)
292@@ -727,11 +706,9 @@
293
294 mod_path = self.root_name + '.' + self.mod_name
295 self.assertEqual([('__getattribute__', 'mod4'),
296- '_replace',
297 ('_import', 'root4'),
298 ('import', self.root_name, [], 0),
299 ('__getattribute__', 'var2'),
300- '_replace',
301 ('_import', 'mod4'),
302 ('import', mod_path, [], 0),
303 ], self.actions)
304@@ -792,23 +769,18 @@
305 submodb_path = sub_path + '.' + self.submodb_name
306
307 self.assertEqual([('__getattribute__', 'mod5'),
308- '_replace',
309 ('_import', 'root5'),
310 ('import', self.root_name, [], 0),
311 ('__getattribute__', 'submoda5'),
312- '_replace',
313 ('_import', 'sub5'),
314 ('import', sub_path, [], 0),
315 ('__getattribute__', 'var2'),
316- '_replace',
317 ('_import', 'mod5'),
318 ('import', mod_path, [], 0),
319 ('__getattribute__', 'var4'),
320- '_replace',
321 ('_import', 'submoda5'),
322 ('import', submoda_path, [], 0),
323 ('__getattribute__', 'var5'),
324- '_replace',
325 ('_import', 'submodb5'),
326 ('import', submodb_path, [], 0),
327 ], self.actions)
328@@ -1104,7 +1076,6 @@
329 self.assertEqual('x', root6.func1('x'))
330
331 self.assertEqual([('__getattribute__', 'var1'),
332- '_replace',
333 ('_import', 'root6'),
334 ('import', self.root_name, [], 0),
335 ], self.actions)
336@@ -1140,7 +1111,6 @@
337 submoda_path = sub_path + '.' + self.submoda_name
338
339 self.assertEqual([('__getattribute__', 'var4'),
340- '_replace',
341 ('_import', 'submoda7'),
342 ('import', submoda_path, [], 0),
343 ], self.actions)
344@@ -1165,7 +1135,6 @@
345 self.assertEqual(1, root8.func1(1))
346
347 self.assertEqual([('__getattribute__', 'var1'),
348- '_replace',
349 ('_import', 'root8'),
350 ('import', self.root_name, [], 0),
351 ], self.actions)
352@@ -1175,41 +1144,42 @@
353 """The ScopeReplacer should be reentrant.
354
355 Invoking a replacer while an invocation was already on-going leads to a
356- race to see which invocation will be the first to delete the _factory and
357- _scope attributes. The loosing caller used to see AttributeErrors (bug
358- 702914).
359+ race to see which invocation will be the first to call _replace.
360+ The losing caller used to see an exception (bugs 396819 and 702914).
361
362- These tests set up a tracer that stops at the moment just before one of
363- the attributes is being deleted and starts another call to the
364- functionality in question (__call__, __getattribute__, __setattr_) in
365- order win the race, setting up the originall caller to loose.
366+ These tests set up a tracer that stops at a suitable moment (upon
367+ entry of a specified method) and starts another call to the
368+ functionality in question (__call__, __getattribute__, __setattr_)
369+ in order to win the race, setting up the original caller to lose.
370 """
371
372 def tracer(self, frame, event, arg):
373+ if event != 'call':
374+ return self.tracer
375 # Grab the name of the file that contains the code being executed.
376- filename = frame.f_globals["__file__"]
377+ code = frame.f_code
378+ filename = code.co_filename
379 # Convert ".pyc" and ".pyo" file names to their ".py" equivalent.
380 filename = re.sub(r'\.py[co]$', '.py', filename)
381+ function_name = code.co_name
382 # If we're executing a line of code from the right module...
383- if event == 'line' and 'lazy_import.py' in filename:
384- line = linecache.getline(filename, frame.f_lineno)
385- # ...and the line of code is the one we're looking for...
386- if 'del self._factory' in line:
387- # We don't need to trace any more.
388- sys.settrace(None)
389- # Run another racer. This one will "win" the race, deleting
390- # the attributes. When the first racer resumes it will loose
391- # the race, generating an AttributeError.
392- self.racer()
393+ if (filename.endswith('lazy_import.py') and
394+ function_name == self.method_to_trace):
395+ # We don't need to trace any more.
396+ sys.settrace(None)
397+ # Run another racer. This one will "win" the race.
398+ self.racer()
399 return self.tracer
400
401- def run_race(self, racer):
402+ def run_race(self, racer, method_to_trace='_resolve'):
403+ self.overrideAttr(lazy_import.ScopeReplacer, '_should_proxy', True)
404 self.racer = racer
405+ self.method_to_trace = method_to_trace
406 sys.settrace(self.tracer)
407- self.racer() # Should not raise an AttributeError
408- # Make sure the tracer actually found the code it was looking for. If
409- # not, maybe the code was refactored in such a way that these tests
410- # aren't needed any more.
411+ self.racer() # Should not raise any exception
412+ # Make sure the tracer actually found the code it was
413+ # looking for. If not, maybe the code was refactored in
414+ # such a way that these tests aren't needed any more.
415 self.assertEqual(None, sys.gettrace())
416
417 def test_call(self):
418
419=== modified file 'doc/en/release-notes/bzr-2.5.txt'
420--- doc/en/release-notes/bzr-2.5.txt 2012-01-04 16:02:14 +0000
421+++ doc/en/release-notes/bzr-2.5.txt 2012-01-05 10:16:30 +0000
422@@ -82,6 +82,11 @@
423 of revisions whose ancestry is not obviously on the same developement
424 line. (Vincent Ladeuil, #904744)
425
426+* Make lazy imports resilient when resolved concurrently from multiple
427+ threads. Now the stand-in object will behave as a proxy for the real object
428+ after the initial access, rather than throwing. Assigning the object to
429+ multiple names should still be avoided. (Martin von Gagern, #396819)
430+
431 * Not setting ``gpg_signing_key`` or setting it to ``default`` will use the
432 user email (obtained from the ``email`` configuration option or its
433 default value). (Vincent Ladeuil, Jelmer Vernooij, #904550)