Merge lp:~jameinel/bzr/2.5-unending-sendall-1047309 into lp:bzr/2.5

Proposed by John A Meinel
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 6510
Proposed branch: lp:~jameinel/bzr/2.5-unending-sendall-1047309
Merge into: lp:bzr/2.5
Diff against target: 68 lines (+34/-1)
3 files modified
bzrlib/osutils.py (+5/-1)
bzrlib/tests/test_osutils.py (+22/-0)
doc/en/release-notes/bzr-2.5.txt (+7/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.5-unending-sendall-1047309
Reviewer Review Type Date Requested Status
Richard Wilbur Needs Fixing
Martin Packman (community) Approve
Review via email: mp+123268@code.launchpad.net

Commit message

Detect if we are not progressing during osutils.send_all() and fail rather than hang. (bug #1047309)

Description of the change

See the associated bug #1047309.

Basically, with the new ConnectionReset code coming into bzr-2.5, it seems that if Paramiko gets disconnected, it is unable to send any content, but doesn't raise an error. (it returns 0 bytes sent, but no exception is raised.)

As a conservative fix, this allows us to get a series of up to 5 0-bytes-sent, but any more than that in a row, and we just treat it as a ECONNRESET. Looking at the man pages, it seems send in non-blocking socket shouldn't ever give 0 bytes sent, but I preferred to be a little cautious.

If this causes fallout, we could tweak the retries, or even do sleep-based backoff.

To post a comment you must log in.
Revision history for this message
Martin Packman (gz) wrote :

Did you find any answer on if it's valid for send() to return 0? As far as I'm aware it's not, it's either EINTR or a partial write of more than one byte, though I can't find anything explicit either.

For implementations like paramiko that just emulate the interface, coping with this is I guess required, though it would be nice to fix upstream to do something more sane. I'd prefer raising an error straight off, but understand the motivation behind retying several times, as that would previously have worked and regressions are bad. I'd much prefer not to use this heuristic if at possible though, instead quirking on the socket type or something that makes it explict this is a paramiko issue only.

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/7/2012 7:54 PM, Martin Packman wrote:
> Review: Needs Information
>
> Did you find any answer on if it's valid for send() to return 0? As
> far as I'm aware it's not, it's either EINTR or a partial write of
> more than one byte, though I can't find anything explicit either.
>
> For implementations like paramiko that just emulate the interface,
> coping with this is I guess required, though it would be nice to
> fix upstream to do something more sane. I'd prefer raising an error
> straight off, but understand the motivation behind retying several
> times, as that would previously have worked and regressions are
> bad. I'd much prefer not to use this heuristic if at possible
> though, instead quirking on the socket type or something that makes
> it explict this is a paramiko issue only.
>

Well, we could go with immediately erroring, and see how bad the
fallout is...

I think this is a reasonable compromise which means we won't have to
think about it again.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlBLLYMACgkQJdeBCYSNAAPMXwCdHdkRL+i8RzI7PckS7yFce62M
1NIAnjiQ3YrIyh7xgkifk5JIScAN05YS
=zHgy
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

Won't have to think about, apart from when someone writing code wants to know if they should use socket.sendall or osutils.sendall and what the hell is going on... So, I don't really like the code but am somewhat sold on it being the least risky option for fixing this bug.

Suggested tweaks before landing:
* Remove note about making _max_no_content_sends configurable, internal hacks like this shouldn't be exposed
* Add comment/docstring to osutils.sendall about why it's checking for 0 from send
* Maybe add mutter if we get 0 returned from send which is probably the only way we'll find out if it gets hit in the wild
* Make test_send_minimal_progress clear that this is not behaviour we've actually observed

review: Approve
Revision history for this message
Richard Wilbur (richard-wilbur) wrote :

The documentation for paramiko.Channel.send: "Returns the number of bytes sent, or 0 if the channel stream is closed."[1]

Python 2.7 socket library documentation for socket.send: "Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data. For further information on this concept, consult the Socket Programming HOWTO."[2]

Interestingly the Socket Programming HOWTO (referenced above) has an example which suggests you might get a 0 return value[3], but only if the connection is broken:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")

Linux documentation for socket send[4]:
RETURN VALUE
       On success, these calls return the number of characters sent. On
       error, -1 is returned, and errno is set appropriately.

The consensus seems to be that a socket send should never return 0 unless there is an error. Paramiko looks like it thinks that is a normal way to signal a closed connection. The python socket programming HOWTO implies that the only reason for a 0 return value is also a closed connection.

@John, I agree with your proposition that we should throw an exception immediately when send returns 0.
@Martin, this would alleviate the need for _max_no_content_sends. Hence,
  1. I am in favour of removing _max_no_content_sends
  2. Comment/docstring for osutils.sendall would be fine if it isn't obvious from the exception we throw
  3.,4. Agree

References:
[1] http://www.lag.net/paramiko/docs/paramiko.Channel-class.html#send
[2] http://docs.python.org/library/socket.html
[3] http://docs.python.org/howto/sockets.html#socket-howto
[4] info send

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

Code has been simplified to just treat 0 bytes as closed connection.

Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2012-02-28 04:58:14 +0000
+++ bzrlib/osutils.py 2012-09-11 07:29:18 +0000
@@ -2140,8 +2140,12 @@
2140 if e.args[0] != errno.EINTR:2140 if e.args[0] != errno.EINTR:
2141 raise2141 raise
2142 else:2142 else:
2143 if sent == 0:
2144 raise errors.ConnectionReset('Sending to %s returned 0 bytes'
2145 % (sock,))
2143 sent_total += sent2146 sent_total += sent
2144 report_activity(sent, 'write')2147 if report_activity is not None:
2148 report_activity(sent, 'write')
21452149
21462150
2147def connect_socket(address):2151def connect_socket(address):
21482152
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py 2012-02-28 04:58:14 +0000
+++ bzrlib/tests/test_osutils.py 2012-09-11 07:29:18 +0000
@@ -819,6 +819,28 @@
819 self.assertEqual(None, osutils.safe_file_id(None))819 self.assertEqual(None, osutils.safe_file_id(None))
820820
821821
822
823class TestSendAll(tests.TestCase):
824
825 def test_send_with_no_progress(self):
826 # See https://bugs.launchpad.net/bzr/+bug/1047309
827 # It seems that paramiko can get into a state where it doesn't error,
828 # but it returns 0 bytes sent for requests over and over again.
829 class NoSendingSocket(object):
830 def __init__(self):
831 self.call_count = 0
832 def send(self, bytes):
833 self.call_count += 1
834 if self.call_count > 100:
835 # Prevent the test suite from hanging
836 raise RuntimeError('too many calls')
837 return 0
838 sock = NoSendingSocket()
839 self.assertRaises(errors.ConnectionReset,
840 osutils.send_all, sock, 'content')
841 self.assertEqual(1, sock.call_count)
842
843
822class TestPosixFuncs(tests.TestCase):844class TestPosixFuncs(tests.TestCase):
823 """Test that the posix version of normpath returns an appropriate path845 """Test that the posix version of normpath returns an appropriate path
824 when used with 2 leading slashes."""846 when used with 2 leading slashes."""
825847
=== modified file 'doc/en/release-notes/bzr-2.5.txt'
--- doc/en/release-notes/bzr-2.5.txt 2012-09-06 12:05:27 +0000
+++ doc/en/release-notes/bzr-2.5.txt 2012-09-11 07:29:18 +0000
@@ -39,6 +39,13 @@
39 extracted texts from the repository. (Just an ordering constraint on how39 extracted texts from the repository. (Just an ordering constraint on how
40 they consumed the stream.) (John Arbash Meinel, #1046284)40 they consumed the stream.) (John Arbash Meinel, #1046284)
4141
42* ``osutils.send_all`` now detects if we get a series of zero bytes sent,
43 and fails with a ECONNRESET. It seems if paramiko gets disconnected, it
44 will get into a state where it returns 0 bytes sent, but doesn't raise
45 an error. This change allows us to get a couple hiccups of no content
46 sent, but if it is consistent, we will consider it to be a failure.
47 (John Arbash Meinel, #1047309)
48
42* Revert use of --no-tty when gpg signing commits. (Jelmer Vernooij, #1014570)49* Revert use of --no-tty when gpg signing commits. (Jelmer Vernooij, #1014570)
4350
44* Some small bug fixes wrt lightweight checkouts and remote repositories.51* Some small bug fixes wrt lightweight checkouts and remote repositories.

Subscribers

People subscribed via source and target branches