Merge lp:~maxb/bzr/2.2-close-ssh-subprocess-socket into lp:bzr/2.2

Proposed by Max Bowsher
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the source branch.
Merged at revision: 5110
Proposed branch: lp:~maxb/bzr/2.2-close-ssh-subprocess-socket
Merge into: lp:bzr/2.2
Diff against target: 91 lines (+21/-18)
2 files modified
NEWS (+3/-0)
bzrlib/transport/ssh.py (+18/-18)
To merge this branch: bzr merge lp:~maxb/bzr/2.2-close-ssh-subprocess-socket
Reviewer Review Type Date Requested Status
John A Meinel Approve
Andrew Bennetts Approve
Review via email: mp+40493@code.launchpad.net

Commit message

Close leaked socket to SSH subprocesses

Description of the change

This change fixes a bug introduced in 2.2b4 when bzr was made to use a socketpair to talk to ssh subprocesses if supported on the platform. The resultant socket was never actually closed - I've not observed this breaking anything in bzr, but it causes dput sftp uploads (which use bzrlib) to hang on completion.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote :

The code looks reasonable. I wonder if to be pedantically correct we should also be attempting to close the other side of the socketpair too, in case the subprocess doesn't close it for us? That would mean SubprocessVendor._connect should subproc_sock into SSHSubprocessConnection so it can pass it along to your improved _close_ssh_proc function with the other half. This may actually explain some test suite problems — I think this may actually be an fd leak. So please extend your patch to do this.

That brings me to my other suggestion: I think we should strive to write at least one direct unit test for this code. It's now had multiple bugs, and I think it's proven that the indirect test coverage so far isn't enough. Off the top of my head a test that constructs a SSHSubprocessConnection directly may be a reasonable way to do it, but perhaps instantiating a custom SubprocessVendor would be better? So if you can think of a way to write some direct tests for the fd/process management in ssh.py, please do. I won't block this patch for lack of tests as it's a clear improvement, but I'd love to see some :)

Finally, add a NEWS entry! :)

review: Approve
Revision history for this message
Max Bowsher (maxb) wrote :

NEWS entry added.

Yes, for full correctness we would close the other end of the socketpair within bzrlib - but no need to defer that to _close_ssh_proc time, we can close it as soon as we have handed it off to the subprocess - done.

As for testing, I can't see any meaningful way to test this short of spinning up a test ssh server and trying to put a file across a transport. Unfortunately that's a outside my knowledge of how to achieve it, without a fairly large tangent of research, right now.

Revision history for this message
Andrew Bennetts (spiv) wrote :

Max Bowsher wrote:
> NEWS entry added.
>
> Yes, for full correctness we would close the other end of the socketpair
> within bzrlib - but no need to defer that to _close_ssh_proc time, we can
> close it as soon as we have handed it off to the subprocess - done.

Great, thank you!

> As for testing, I can't see any meaningful way to test this short of spinning
> up a test ssh server and trying to put a file across a transport.
> Unfortunately that's a outside my knowledge of how to achieve it, without a
> fairly large tangent of research, right now.

That's a broader test than I was thinking of — we don't need to run SSH, any
subprocess would do (as an extreme the test might substitute some sort of Test
Double, http://www.martinfowler.com/bliki/TestDouble.html, instead of a
subprocess.Popen instance, if you wanted to unit test SSHSubprocessConnection
and/or _close_ssh_proc in isolation from SubprocessVendor). /bin/echo would
work, for instance (where available).

So maybe something like:

    def fd_count(self):
        return len(os.listdir('/proc/self/fd'))

    def test_ssh_subprocess_fd_closing(self):
        class EchoSubprocessVendor(SubprocessVendor):
            def _get_vendor_specific_argv(self, *args):
                return ['/bin/echo']
        vendor = EchoSubprocessVendor()
        initial_fd_count = self.fd_count()
        ssh_conn = vendor.connect_ssh(
            'dummy user', 'dummy password', 'dummy host', 1, 'dummy command')
        ssh_conn.close()
        self.assertEqual(initial_fd_count, self.fd_count())

Perhaps that's a bit too hacky (it would obviously need some guards to skip on
platforms without /bin/echo and /proc/self/fd), but that's the sort of thing I
had in mind. You could also perhaps just try asserting directly that
ssh_conn's proc is terminated and _sock is closed, etc. But certainly I'm
thinking of tests entirely under the transport layer.

-Andrew.

Revision history for this message
Max Bowsher (maxb) wrote :

Counting fds by means of /proc doesn't feel very nice to me. Does it risk creating issues with attempts to run tests in parallel, or can we guarantee that any such attempt will always be in forked processes? It's also problematically platform-specific, especially when testing code which already has two platform-specific codepaths (with/without socketpair support). I think it would be enough to just run a python subprocess that read stdin until eof, then exited, and let the test merely confirm that this happens without hanging or raising.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/9/2010 6:38 PM, Max Bowsher wrote:
> Max Bowsher has proposed merging lp:~maxb/bzr/2.2-close-ssh-subprocess-socket into lp:bzr/2.2.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #659590 sftp upload hangs after all files successfully uploaded
> https://bugs.launchpad.net/bugs/659590
>
>
> This change fixes a bug introduced in 2.2b4 when bzr was made to use a socketpair to talk to ssh subprocesses if supported on the platform. The resultant socket was never actually closed - I've not observed this breaking anything in bzr, but it causes dput sftp uploads (which use bzrlib) to hang on completion.

 review: approve

This looks good to me.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkza2WgACgkQJdeBCYSNAAPezgCfUCcNnF0ncxH7T0wieulAcX0k
MmcAoJx+DsepUUfF2GflA00FL4sxVWTC
=XYc0
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-11-08 06:10:41 +0000
3+++ NEWS 2010-11-10 08:46:44 +0000
4@@ -49,6 +49,9 @@
5 installed.
6 (Martin [gz], Gary van der Merwe, #632465)
7
8+* Close leaked socket to SSH subprocesses, which caused dput sftp uploads
9+ to hang. (Max Bowsher, #659590)
10+
11 Improvements
12 ************
13
14
15=== modified file 'bzrlib/transport/ssh.py'
16--- bzrlib/transport/ssh.py 2010-09-09 07:31:02 +0000
17+++ bzrlib/transport/ssh.py 2010-11-10 08:46:44 +0000
18@@ -361,13 +361,14 @@
19 # This platform doesn't support socketpair(), so just use ordinary
20 # pipes instead.
21 stdin = stdout = subprocess.PIPE
22- sock = None
23+ my_sock, subproc_sock = None, None
24 else:
25 stdin = stdout = subproc_sock
26- sock = my_sock
27 proc = subprocess.Popen(argv, stdin=stdin, stdout=stdout,
28 **os_specific_subprocess_params())
29- return SSHSubprocessConnection(proc, sock=sock)
30+ if subproc_sock is not None:
31+ subproc_sock.close()
32+ return SSHSubprocessConnection(proc, sock=my_sock)
33
34 def connect_sftp(self, username, password, host, port):
35 try:
36@@ -644,25 +645,24 @@
37 import weakref
38 _subproc_weakrefs = set()
39
40-def _close_ssh_proc(proc):
41+def _close_ssh_proc(proc, sock):
42 """Carefully close stdin/stdout and reap the SSH process.
43
44 If the pipes are already closed and/or the process has already been
45 wait()ed on, that's ok, and no error is raised. The goal is to do our best
46 to clean up (whether or not a clean up was already tried).
47 """
48- dotted_names = ['stdin.close', 'stdout.close', 'wait']
49- for dotted_name in dotted_names:
50- attrs = dotted_name.split('.')
51- try:
52- obj = proc
53- for attr in attrs:
54- obj = getattr(obj, attr)
55- except AttributeError:
56- # It's ok for proc.stdin or proc.stdout to be None.
57- continue
58- try:
59- obj()
60+ funcs = []
61+ for closeable in (proc.stdin, proc.stdout, sock):
62+ # We expect that either proc (a subprocess.Popen) will have stdin and
63+ # stdout streams to close, or that we will have been passed a socket to
64+ # close, with the option not in use being None.
65+ if closeable is not None:
66+ funcs.append(closeable.close)
67+ funcs.append(proc.wait)
68+ for func in funcs:
69+ try:
70+ func()
71 except OSError:
72 # It's ok for the pipe to already be closed, or the process to
73 # already be finished.
74@@ -707,7 +707,7 @@
75 # to avoid leaving processes lingering indefinitely.
76 def terminate(ref):
77 _subproc_weakrefs.remove(ref)
78- _close_ssh_proc(proc)
79+ _close_ssh_proc(proc, sock)
80 _subproc_weakrefs.add(weakref.ref(self, terminate))
81
82 def send(self, data):
83@@ -723,7 +723,7 @@
84 return os.read(self.proc.stdout.fileno(), count)
85
86 def close(self):
87- _close_ssh_proc(self.proc)
88+ _close_ssh_proc(self.proc, self._sock)
89
90 def get_sock_or_pipes(self):
91 if self._sock is not None:

Subscribers

People subscribed via source and target branches