Merge lp:~gz/bzr/transport_post_connect_hook into lp:bzr
Status: | Superseded |
---|---|
Proposed branch: | lp:~gz/bzr/transport_post_connect_hook |
Merge into: | lp:bzr |
Prerequisite: | lp:~spiv/bzr/hooks-refactoring |
Diff against target: | 280 lines (+62/-108), 6 files modified: bzrlib/hooks.py (+1/-0), bzrlib/tests/__init__.py (+14/-9), bzrlib/tests/test_transport.py (+23/-0), bzrlib/tests/transport_util.py (+4/-98), bzrlib/transport/__init__.py (+18/-1), bzrlib/transport/remote.py (+2/-0) |
To merge this branch: | bzr merge lp:~gz/bzr/transport_post_connect_hook |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Vincent Ladeuil | Approve | ||
Review via email: mp+38431@code.launchpad.net |
This proposal has been superseded by a proposal from 2011-12-14.
Commit message
Description of the change
Creates a new post_connect hook for transports and uses it in the testing framework to ensure transports are disconnected when the test finishes, so we don't leak resources. This replaces the current hack that overrides get_transport, which was incomplete and was causing other problems. See the mailing list thread for more:
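A minimal, self-contained sketch of the idea, assuming a hypothetical `Hooks` dict and a toy `Transport` rather than bzrlib's real classes: `_set_connection()` fires `post_connect` with the transport as its only argument, and a test base class registers a callback that schedules `disconnect()` as a test cleanup.

```python
import unittest


class Hooks(dict):
    """Hypothetical stand-in for bzrlib.hooks.Hooks: hook name -> callbacks."""

    def install_named_hook(self, hook_name, a_callable, name):
        # 'name' is a human-readable label in bzrlib; it is ignored here.
        self[hook_name].append(a_callable)


class Transport(object):
    """Toy transport; only the connection bookkeeping is modelled."""

    hooks = Hooks(post_connect=[])  # shared by every instance and subclass

    def __init__(self, base):
        self.base = base
        self.connection = None
        self.disconnect_calls = 0

    def _set_connection(self, connection, credentials=None):
        self.connection = connection
        # Fire post_connect with the transport as the sole argument.
        for hook in self.hooks["post_connect"]:
            hook(self)

    def disconnect(self):
        self.disconnect_calls += 1
        self.connection = None


class LeakFreeTestCase(unittest.TestCase):
    """Schedules disconnect() for every transport that connects in a test."""

    def setUp(self):
        super(LeakFreeTestCase, self).setUp()
        Transport.hooks.install_named_hook(
            "post_connect", self._add_disconnect_cleanup, None)
        # Assumption: uninstall explicitly here; this sketch has no
        # framework that saves and restores hooks between tests.
        self.addCleanup(Transport.hooks["post_connect"].remove,
                        self._add_disconnect_cleanup)

    def _add_disconnect_cleanup(self, transport):
        self.addCleanup(transport.disconnect)


class ExampleTest(LeakFreeTestCase):
    def test_connect(self):
        self.transport = Transport("sftp://example.com/")
        self.transport._set_connection(object())
```

Cleanups run last-in, first-out, so the transport is disconnected before the hook itself is uninstalled.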
<https:/
I'm not really sure how ready this is, but putting up a merge proposal seems as good a way of getting feedback as bugging people on IRC.
The hook is thrown together based on lp:~parthm/bzr/173274-export-hooks; if there are opinions on how it should be written differently, I'd like to know.
Todo:
- NEWS entry. I'm leaving this until landing actually happens, to lessen the pain of flux.
- Other documentation changes?
- Some tests that don't suck. Suggestions welcome.
Martin Packman (gz) wrote:
So, very nearly this exact hook exists in bzrlib.
Martin Packman (gz) wrote:
Just having a hook turned out to be insufficient, and it results in worse hangs than the old get_transport hack, as can be seen from the nearly four-hour babune runtime last night:
<http://
The problem is that RemoteTransport classes don't use _set_connection and generally behave rather differently. Adding a hook point in __init__, where a new medium is built, gets us to a leak-free 40-minute runtime:
<http://
However, the semantics are not quite right there. Even if we don't have to worry about remote reconnections (do we?), the post_connect hook fires before the real connection, and potentially twice in the RemoteHTTPTransport case. Combined with the existing confusion over how many times disconnect gets called, this suggests the hook probably isn't sane enough to be generally useful. We do want the leak problem fixed ASAP, though, so if anyone has any clever ideas...
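One hypothetical way for a test-side callback to tolerate the hook firing more than once for the same transport (the RemoteHTTPTransport case, or reconnections) is to remember which transports already have a cleanup scheduled. `DisconnectScheduler` is an illustrative name, not part of bzrlib; the patch itself simply accepts duplicate `disconnect()` calls.

```python
class DisconnectScheduler(object):
    """post_connect callback that schedules disconnect() once per transport.

    Hypothetical helper: add_cleanup would typically be a test's addCleanup.
    """

    def __init__(self, add_cleanup):
        self._add_cleanup = add_cleanup
        # id(transport) -> transport; keeping the reference alive means an
        # id cannot be reused by a different object while we remember it.
        self._scheduled = {}

    def __call__(self, transport):
        key = id(transport)
        if key in self._scheduled:
            return False  # already scheduled: a repeated hook call is a no-op
        self._scheduled[key] = transport
        self._add_cleanup(transport.disconnect)
        return True
```

In a test's `setUp` this might be installed as `Transport.hooks.install_named_hook('post_connect', DisconnectScheduler(self.addCleanup), None)`. Keying on `id()` rather than hashing the transport avoids depending on how a transport class defines equality.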
Vincent Ladeuil (vila) wrote:
>>>>> Martin [gz] <email address hidden> writes:
> Just having a hook turned out to be insufficient, and results in
> worse hangs than the old get_transport hack as can be seen from
> the nearly four hour babune runtime last night:
> <http://
> The problem is that RemoteTransport classes don't use
> _set_connection and generally behave rather differently. Throwing
> in a hook point in __init__ when a new medium is built gets us to
> a leak-free 40 minute runtime:
> <http://
Ok, so this validates the approach of calling disconnect() for all
transports that have been able to connect to their server, and ensures
all code paths are covered. As such, I'm tempted to accept the patch as
is, pending a better solution in a follow-on.
From a design point of view, I think this highlights the divergence
between the smart transport and the others, which dates from the time
we implemented connection sharing.
The smart transport has a medium object that implements
_ensure_
_SharedConnection object which is used as a data container by the
transport, but the transport object implements connect_xxx() and
disconnect() while calling _set_connection() when the connection is
established (including reconnections).
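To make the contrast concrete, here is a simplified, hypothetical model of the non-smart path: clones share one `_SharedConnection`-style data container, `_set_connection()` runs on every (re)connection and fires the hook with the transport, and a callback can reach the connection through that transport. Apart from `_set_connection`, `_get_connection` and `_from_transport`, which appear in bzrlib, the class and attribute names are illustrative.

```python
class SharedConnection(object):
    """Plain data container; transports only store and read its fields."""

    def __init__(self):
        self.connection = None
        self.credentials = None


class ConnectedLikeTransport(object):
    post_connect = []  # hypothetical class-level hook list

    def __init__(self, base, _from_transport=None):
        self.base = base
        if _from_transport is None:
            self._shared_connection = SharedConnection()
        else:
            # A clone reuses its parent's container, hence its connection.
            self._shared_connection = _from_transport._shared_connection

    def clone(self, base):
        return ConnectedLikeTransport(base, _from_transport=self)

    def _set_connection(self, connection, credentials=None):
        # Called on the first connection and again on every reconnection.
        self._shared_connection.connection = connection
        self._shared_connection.credentials = credentials
        for hook in self.post_connect:
            hook(self)

    def _get_connection(self):
        return self._shared_connection.connection
```

Only the transport that actually (re)connects triggers the hook; its clones see the new connection through the shared container without a hook call of their own.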
So while the post_connect hook can be implemented for the transport
objects, it can't be properly implemented for smart transports, where
the transport object is not available at the medium level when the
connection occurs (in theory we could pass it down, but I'm worried
about reference cycles there).
This hints at a connection hook instead of a transport hook, but that
doesn't play well with the transport objects, which so far treat the
connection as an opaque attribute.
> However the semantics are not quite correct there. Even if we
> don't have to worry about remote reconnections (do we?) the
> post_connect hook is happening before the real connection, and
> potentially twice in the RemoteHTTPTransport case. Combined with
> the existing confusion over how many times disconnect gets called,
I think the confusion comes from the fact that disconnect() should be
implemented at the transport level, since it is defined as closing the
connection even if other transports are sharing it (as opposed to
closing the connection when the *last* transport using it asks for
that).
In the test suite, all transports sharing a connection call
disconnect() to ensure we don't get leaks, but that's the expected
behaviour.
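The semantics described above, where `disconnect()` on any transport closes the shared connection even though other transports still reference it, can be sketched with a hypothetical model. Clearing the reference after closing is a design choice made here so that the later `disconnect()` calls from the other sharers become no-ops; the patch instead tolerates repeated calls, which its docstring calls suboptimal but seemingly harmless.

```python
class FakeConnection(object):
    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


class SharedConnection(object):
    def __init__(self, connection):
        self.connection = connection


class SharingTransport(object):
    def __init__(self, shared):
        self._shared_connection = shared

    def disconnect(self):
        # Closes the connection for *every* transport sharing it, not only
        # when the last user lets go.
        connection = self._shared_connection.connection
        if connection is not None:
            connection.close()
            # Assumption: forget the closed connection so repeat calls no-op.
            self._shared_connection.connection = None
```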
> it suggests this hook probably isn't sane enough to be generally
> useful.
Indeed, it lies for the smart transports as it's called *before* the
connection occurs.
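A hypothetical model of why the hook "lies" for smart transports: the medium connects lazily on first use and holds no reference to its transport, so a hook fired from the transport's `__init__` (as in the remote.py hunk) runs before anything is connected. A medium-level callback fires at the right moment but only sees the medium, which is the connection-hook versus transport-hook tension described above. `LazyMedium`, `send` and `on_connect` are illustrative names, not bzrlib API.

```python
class LazyMedium(object):
    """Hypothetical smart medium: opens its connection on first use."""

    def __init__(self, on_connect=None):
        self.connected = False
        # A connection-level callback; it receives the medium, not a transport.
        self._on_connect = on_connect

    def send(self, data):
        if not self.connected:
            self.connected = True  # stands in for actually opening a socket
            if self._on_connect is not None:
                self._on_connect(self)
        return data


class RemoteLikeTransport(object):
    post_connect = []  # hypothetical transport-level hook list

    def __init__(self, on_medium_connect=None):
        self.medium = LazyMedium(on_medium_connect)
        # Fired as soon as the medium is built: nothing is connected yet.
        for hook in self.post_connect:
            hook(self)
```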
> We do want the leak problem fixed asap though, so if anyone has
> any clever ideas...
One alternative would be to add a 'created' hook for transports which,
for the tests, will also call disconnect(). From your hack_transport
branch we know that calling disconnect() for all created transports is
enough to fix the leaks ev...
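The "created" hook alternative could look roughly like this, sketched with a hypothetical `created` hook list on a toy base class: because every transport, remote or not, runs the base `__init__`, the hook fires for all of them without medium-level plumbing, and the test side schedules `disconnect()` unconditionally. It assumes, as this sketch's base class does, that `disconnect()` is safe to call on a transport that never connected.

```python
class Transport(object):
    created = []  # hypothetical 'created' hook point; callbacks get the transport

    def __init__(self, base):
        self.base = base
        self.disconnect_calls = 0
        for hook in self.created:
            hook(self)

    def disconnect(self):
        # Base class: nothing to close, so this only records the call.
        self.disconnect_calls += 1


class RemoteLikeTransport(Transport):
    def __init__(self, base):
        # The hook fires from the base __init__, before any medium exists;
        # that is fine here because it only schedules a later disconnect().
        super(RemoteLikeTransport, self).__init__(base)
        self.medium = None
```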
Preview Diff
```diff
=== modified file 'bzrlib/hooks.py'
--- bzrlib/hooks.py 2010-09-28 07:34:29 +0000
+++ bzrlib/hooks.py 2010-10-15 14:01:50 +0000
@@ -80,6 +80,7 @@
     ('bzrlib.smart.client', '_SmartClient.hooks', 'SmartClientHooks'),
     ('bzrlib.smart.server', 'SmartTCPServer.hooks', 'SmartServerHooks'),
     ('bzrlib.status', 'hooks', 'StatusHooks'),
+    ('bzrlib.transport', 'Transport.hooks', 'TransportHooks'),
     ('bzrlib.version_info_formats.format_rio', 'RioVersionInfoBuilder.hooks',
         'RioVersionInfoBuilderHooks'),
     ('bzrlib.merge_directive', 'BaseMergeDirective.hooks',
```
```diff
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2010-10-15 11:20:45 +0000
+++ bzrlib/tests/__init__.py 2010-10-15 14:01:50 +0000
@@ -2444,15 +2444,20 @@
 
     def setUp(self):
         super(TestCaseWithMemoryTransport, self).setUp()
-        # Ensure that ConnectedTransport doesn't leak sockets
-        def get_transport_with_cleanup(*args, **kwargs):
-            t = orig_get_transport(*args, **kwargs)
-            if isinstance(t, _mod_transport.ConnectedTransport):
-                self.addCleanup(t.disconnect)
-            return t
-
-        orig_get_transport = self.overrideAttr(_mod_transport, 'get_transport',
-                                               get_transport_with_cleanup)
+
+        def _add_disconnect_cleanup(transport):
+            """Schedule disconnection of given transport at test cleanup
+
+            This needs to happen for all connected transports or leaks occur.
+
+            Note reconnections may mean we call disconnect multiple times per
+            transport which is suboptimal but seems harmless.
+            """
+            self.addCleanup(transport.disconnect)
+
+        _mod_transport.Transport.hooks.install_named_hook('post_connect',
+            _add_disconnect_cleanup, None)
+
         self._make_test_root()
         self.addCleanup(os.chdir, os.getcwdu())
         self.makeAndChdirToTestDir()
```
```diff
=== modified file 'bzrlib/tests/test_transport.py'
--- bzrlib/tests/test_transport.py 2010-10-08 07:17:16 +0000
+++ bzrlib/tests/test_transport.py 2010-10-15 14:01:50 +0000
@@ -465,6 +465,29 @@
         server.stop_server()
 
 
+class TestHooks(tests.TestCase):
+    """Basic tests for transport hooks"""
+
+    def _get_connected_transport(self):
+        return transport.ConnectedTransport("bogus:nowhere")
+
+    def test_transporthooks_initialisation(self):
+        """Check all expected transport hook points are set up"""
+        hookpoint = transport.TransportHooks()
+        self.assertTrue("post_connect" in hookpoint,
+            "post_connect not in %s" % (hookpoint,))
+
+    def test_post_connect(self):
+        """Ensure the post_connect hook is called when _set_transport is"""
+        calls = []
+        transport.Transport.hooks.install_named_hook("post_connect",
+            calls.append, None)
+        t = self._get_connected_transport()
+        self.assertLength(0, calls)
+        t._set_connection("connection", "auth")
+        self.assertEqual(calls, [t])
+
+
 class PathFilteringDecoratorTransportTest(tests.TestCase):
     """Pathfilter decoration specific tests."""
 
```
```diff
=== modified file 'bzrlib/tests/transport_util.py'
--- bzrlib/tests/transport_util.py 2010-02-23 07:43:11 +0000
+++ bzrlib/tests/transport_util.py 2010-10-15 14:01:50 +0000
@@ -14,128 +14,34 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
-import bzrlib.hooks
 from bzrlib.tests import features
 
 # SFTPTransport offers better performances but relies on paramiko, if paramiko
 # is not available, we fallback to FtpTransport
 if features.paramiko.available():
     from bzrlib.tests import test_sftp_transport
-    from bzrlib.transport import sftp
+    from bzrlib.transport import sftp, Transport
     _backing_scheme = 'sftp'
     _backing_transport_class = sftp.SFTPTransport
     _backing_test_class = test_sftp_transport.TestCaseWithSFTPServer
 else:
-    from bzrlib.transport import ftp
+    from bzrlib.transport import ftp, Transport
     from bzrlib.tests import test_ftp_transport
     _backing_scheme = 'ftp'
     _backing_transport_class = ftp.FtpTransport
     _backing_test_class = test_ftp_transport.TestCaseWithFTPServer
 
-from bzrlib.transport import (
-    ConnectedTransport,
-    get_transport,
-    register_transport,
-    register_urlparse_netloc_protocol,
-    unregister_transport,
-    _unregister_urlparse_netloc_protocol,
-    )
-
-
-
-class TransportHooks(bzrlib.hooks.Hooks):
-    """Dict-mapping hook name to a list of callables for transport hooks"""
-
-    def __init__(self):
-        super(TransportHooks, self).__init__()
-        # Invoked when the transport has just created a new connection.
-        # The api signature is (transport, connection, credentials)
-        self['_set_connection'] = []
-
-_hooked_scheme = 'hooked'
-
-def _change_scheme_in(url, actual, desired):
-    if not url.startswith(actual + '://'):
-        raise AssertionError('url "%r" does not start with "%r]"'
-                             % (url, actual))
-    return desired + url[len(actual):]
-
-
-class InstrumentedTransport(_backing_transport_class):
-    """Instrumented transport class to test commands behavior"""
-
-    hooks = TransportHooks()
-
-    def __init__(self, base, _from_transport=None):
-        if not base.startswith(_hooked_scheme + '://'):
-            raise ValueError(base)
-        # We need to trick the backing transport class about the scheme used
-        # We'll do the reverse when we need to talk to the backing server
-        fake_base = _change_scheme_in(base, _hooked_scheme, _backing_scheme)
-        super(InstrumentedTransport, self).__init__(
-            fake_base, _from_transport=_from_transport)
-        # The following is needed to minimize the effects of our trick above
-        # while retaining the best compatibility.
-        self._scheme = _hooked_scheme
-        base = self._unsplit_url(self._scheme,
-                                 self._user, self._password,
-                                 self._host, self._port,
-                                 self._path)
-        super(ConnectedTransport, self).__init__(base)
-
-
-class ConnectionHookedTransport(InstrumentedTransport):
-    """Transport instrumented to inspect connections"""
-
-    def _set_connection(self, connection, credentials):
-        """Called when a new connection is created """
-        super(ConnectionHookedTransport, self)._set_connection(connection,
-                                                               credentials)
-        for hook in self.hooks['_set_connection']:
-            hook(self, connection, credentials)
-
 
 class TestCaseWithConnectionHookedTransport(_backing_test_class):
 
     def setUp(self):
-        register_urlparse_netloc_protocol(_hooked_scheme)
-        register_transport(_hooked_scheme, ConnectionHookedTransport)
-        self.addCleanup(unregister_transport, _hooked_scheme,
-                        ConnectionHookedTransport)
-        self.addCleanup(_unregister_urlparse_netloc_protocol, _hooked_scheme)
         super(TestCaseWithConnectionHookedTransport, self).setUp()
         self.reset_connections()
-        # Add the 'hooked' url to the permitted url list.
-        # XXX: See TestCase.start_server. This whole module shouldn't need to
-        # exist - a bug has been filed on that. once its cleanedup/removed, the
-        # standard test support code will work and permit the server url
-        # correctly.
-        url = self.get_url()
-        t = get_transport(url)
-        if t.base.endswith('work/'):
-            t = t.clone('../..')
-        self.permit_url(t.base)
-
-    def get_url(self, relpath=None):
-        super_self = super(TestCaseWithConnectionHookedTransport, self)
-        url = super_self.get_url(relpath)
-        # Replace the backing scheme by our own (see
-        # InstrumentedTransport.__init__)
-        url = _change_scheme_in(url, _backing_scheme, _hooked_scheme)
-        return url
 
     def start_logging_connections(self):
-        self.overrideAttr(InstrumentedTransport, 'hooks', TransportHooks())
-        # We preserved the hooks class attribute. Now we install our hook.
-        ConnectionHookedTransport.hooks.install_named_hook(
-            '_set_connection', self._collect_connection, None)
+        Transport.hooks.install_named_hook('post_connect',
+            self.connections.append, None)
 
     def reset_connections(self):
         self.connections = []
 
-    def _collect_connection(self, transport, connection, credentials):
-        # Note: uncomment the following line and use 'bt' under pdb, that will
-        # identify all the connections made including the extraneous ones.
-        # import pdb; pdb.set_trace()
-        self.connections.append(connection)
-
```
```diff
=== modified file 'bzrlib/transport/__init__.py'
--- bzrlib/transport/__init__.py 2010-06-18 15:38:20 +0000
+++ bzrlib/transport/__init__.py 2010-10-15 14:01:50 +0000
@@ -52,7 +52,10 @@
 from bzrlib.trace import (
     mutter,
     )
-from bzrlib import registry
+from bzrlib import (
+    hooks,
+    registry,
+    )
 
 
 # a dictionary of open file streams. Keys are absolute paths, values are
@@ -267,6 +270,16 @@
         self.transport.append_bytes(self.relpath, bytes)
 
 
+class TransportHooks(hooks.Hooks):
+    """Mapping of hook names to registered callbacks for transport hooks"""
+    def __init__(self):
+        super(TransportHooks, self).__init__()
+        self.create_hook(hooks.HookPoint("post_connect",
+            "Called after a new connection is established or a reconnect "
+            "occurs. The connected transport instance is the sole argument "
+            "passed.", (2, 3), None))
+
+
 class Transport(object):
     """This class encapsulates methods for retrieving or putting a file
     from/to a storage location.
@@ -291,6 +304,8 @@
     # where the biggest benefit between combining reads and
     # and seeking is. Consider a runtime auto-tune.
     _bytes_to_read_before_seek = 0
+
+    hooks = TransportHooks()
 
     def __init__(self, base):
         super(Transport, self).__init__()
@@ -1492,6 +1507,8 @@
         """
         self._shared_connection.connection = connection
         self._shared_connection.credentials = credentials
+        for hook in self.hooks["post_connect"]:
+            hook(self)
 
     def _get_connection(self):
         """Returns the transport specific connection object."""
```
```diff
=== modified file 'bzrlib/transport/remote.py'
--- bzrlib/transport/remote.py 2010-06-18 15:38:20 +0000
+++ bzrlib/transport/remote.py 2010-10-15 14:01:50 +0000
@@ -111,6 +111,8 @@
             if 'hpss' in debug.debug_flags:
                 trace.mutter('hpss: Built a new medium: %s',
                              medium.__class__.__name__)
+            for hook in self.hooks["post_connect"]:
+                hook(self)
             self._shared_connection = transport._SharedConnection(medium,
                                                                   credentials,
                                                                   self.base)
```
```diff
+        for hook in self.hooks[ "post_connect" ]:
+            # GZ 2010-10-14: Should the hook be passed the new connection and
+            # credentials too or does opaque really mean that?
+            hook(self)
```
The hook already receives 'self', so it can access the connection and credentials if needed.
But they are specific to each transport class...
The tests are ok. We know the hook is heavily exercised or we get leaks anyway.
If you really, really want to add tests, you can check what happens when several transports share a connection, but even that sounds like overkill.
Write the NEWS entry; we'll see what is available when your patch lands. Don't forget the hooks-help.txt file.
I'll ping people on the pre-requisites.