Merge lp:~spiv/bzr/no_until_no_eintr into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~spiv/bzr/no_until_no_eintr
Merge into: lp:bzr
Diff against target: 424 lines (+126/-66)
5 files modified
NEWS (+10/-3)
bzrlib/osutils.py (+77/-18)
bzrlib/smart/medium.py (+30/-44)
bzrlib/smart/protocol.py (+7/-1)
bzrlib/smart/server.py (+2/-0)
To merge this branch: bzr merge lp:~spiv/bzr/no_until_no_eintr
Reviewer Review Type Date Requested Status
Martin Packman (community) Needs Information
Martin Pool Approve
Review via email: mp+21699@code.launchpad.net

Commit message

(andrew) Either correctly handle EINTR or don't handle it at all. (#496813)

Description of the change

This change removes the unsafe uses of until_no_eintr, based on the bug reports and patches from the very patient Martin <gz>. Unlike Martin <gz>'s original patch it doesn't remove the until_no_eintr helper entirely, because it is safe (and useful) to use for reading from a pipe.

(In other changes we've already reduced the number of signal handlers we install that will trigger EINTR, but it can still occur, so, when practical, we may as well handle it correctly rather than fail.)

The most alarming diff hunk is:

- return '\x01'.join(args) + '\n'
+ joined = '\x01'.join(args) + '\n'
+ if type(joined) is unicode:
+ # XXX: We should fix things so this never happens! -AJB, 04032010.
+ mutter('response args contain unicode, should be only bytes: %r',
+ joined)
+ joined = joined.encode('ascii')
+ return joined

This is a latent bug in the HPSS protocol v1 (and maybe v2) code now exposed by the change to use buffer() in send_all. This change explicitly does the convert-to-ASCII that was being done anyway. v3 is unaffected.

The diff applies cleanly to 2.1 (aside from the NEWS entry, of course), and I'll submit a backport for review once this proposal is accepted.

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

nice

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

If Martin <gz> gets the chance to review it that would be good, otherwise I suggest you merge.

Revision history for this message
Martin Packman (gz) wrote :

Interesting on the encode tuple unicode issue, I didn't see any issues there in testing. Nit, you've got the ISO 8601 basic format wrong, it's big endian.

Having thought about this a bit more, I'm not mad about a bzrlib.smart.medium focused change. The module is already wrappers around wrappers around wrappers, and it's too hard to work out what's going on. Case in point - as far as I can tell, SmartSSHClientMedium still uses get_filelike_channels which uses an unsafe wrapper for sockets pre-2.7 in _ParamikoSSHConnection and non-EINTR-protected pipes in SSHSubprocess. This means we're not covering codepaths like the one bug 341535 highlighted.

I've been waiting for magic bullet inspiration on a clean fix here, but nothing's struck yet. Perhaps a different approach, like unifying on one lowest-common-denominator wrapper for both sockets and pipe fds and using those throughout would be clearer and mean less copy-code to get subtly wrong in certain versions.

review: Needs Information
Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin [gz] wrote:
[...]
> I've been waiting for magic bullet inspiration on a clean fix here, but
> nothing's struck yet. Perhaps a different approach, like unifying on one
> lowest-common-denominator wrapper for both sockets and pipe fds and using
> those throughout would be clearer and mean less copy-code to get subtly wrong
> in certain versions.

I've been wondering about that approach too. It might simplify transport
activity reporting too.

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin [gz] wrote:
> Interesting on the encode tuple unicode issue, I didn't see any issues there
> in testing. Nit, you've got the ISO 8601 basic format wrong, it's big endian.

Heh, fixed :)

> Having thought about this a bit more, I'm not mad about a bzrlib.smart.medium
> focused change. The module is already wrappers around wrappers around
> wrappers, and it's too hard to work out what's going on. Case in point - as
> far as I can tell, SmartSSHClientMedium still uses get_filelike_channels which
> uses an unsafe wrapper for sockets pre-2.7 in _ParamikoSSHConnection and
> non-EINTR-protected pipes in SSHSubprocess. This means we're not covering
> codepaths like the one bug 341535 highlighted.

That's true. However I've just sent this patch to PQM anyway because I think
it's still a clear improvement. Here's my reasoning:

  - I think regressing 341535 is better than risking silent corruption of HPSS data
    (and certainly better than a confusing error caused by corrupted HPSS data),
  - Our SIGWINCH handler uses siginterrupt to avoid triggering EINTR, and
  - bzrlib's only other signal handler is SIGQUIT, which is a developer tool not
    a production feature.

I'll reopen 341535, of course.

Thanks for another thoughtful review!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-03-25 12:33:15 +0000
3+++ NEWS 2010-03-26 04:50:44 +0000
4@@ -86,6 +86,9 @@
5 * Added docstring for ``Tree.iter_changes``
6 (John Arbash Meinel, #304182)
7
8+* Allow additional arguments to
9+ ``RemoteRepository.add_inventory_by_delta()``. (Jelmer Vernooij, #532631)
10+
11 * Allow exporting a single file using ``bzr export``.
12 (Michal Junák, #511987)
13
14@@ -144,6 +147,13 @@
15 * Fix stub sftp test server to call os.getcwdu().
16 (Vincent Ladeuil, #526211, #526353)
17
18+* Many IO operations that returned ``EINTR`` were retried even if it
19+ wasn't safe to do so via careless use of ``until_no_eintr``. Bazaar now
20+ only retries operations that are safe to retry, and in some cases has
21+ switched to operations that can be retried (e.g. ``sock.send`` rather than
22+ ``sock.sendall``).
23+ (Andrew Bennetts, Martin <gzlist@googlemail.com>, #496813)
24+
25 * Path conflicts now support --take-this and --take-other even when a
26 deletion is involved.
27 (Vincent Ladeuil, #531967)
28@@ -163,9 +173,6 @@
29 the debugger won't kill the session.
30 (Martin <gzlist@googlemail.com>, #162502)
31
32-* Allow additional arguments to
33- ``RemoteRepository.add_inventory_by_delta()``. (Jelmer Vernooij, #532631)
34-
35 * Tolerate patches with leading noise in ``bzr-handle-patch``.
36 (Toshio Kuratomi, Martin Pool, #502076)
37
38
39=== modified file 'bzrlib/osutils.py'
40--- bzrlib/osutils.py 2010-03-25 17:04:08 +0000
41+++ bzrlib/osutils.py 2010-03-26 04:50:44 +0000
42@@ -40,6 +40,7 @@
43 rmtree,
44 )
45 import signal
46+import socket
47 import subprocess
48 import tempfile
49 from tempfile import (
50@@ -50,12 +51,16 @@
51 from bzrlib import (
52 cache_utf8,
53 errors,
54+ trace,
55 win32utils,
56- trace,
57 )
58-
59 """)
60
61+from bzrlib.symbol_versioning import (
62+ deprecated_function,
63+ deprecated_in,
64+ )
65+
66 # sha and md5 modules are deprecated in python2.6 but hashlib is available as
67 # of 2.5
68 if sys.version_info < (2, 5):
69@@ -1957,40 +1962,82 @@
70 return socket.gethostname().decode(get_user_encoding())
71
72
73-def recv_all(socket, bytes):
74+# We must not read/write any more than 64k at a time from/to a socket so we
75+# don't risk "no buffer space available" errors on some platforms. Windows in
76+# particular is likely to throw WSAECONNABORTED or WSAENOBUFS if given too much
77+# data at once.
78+MAX_SOCKET_CHUNK = 64 * 1024
79+
80+def read_bytes_from_socket(sock, report_activity=None,
81+ max_read_size=MAX_SOCKET_CHUNK):
82+ """Read up to max_read_size of bytes from sock and notify of progress.
83+
84+ Translates "Connection reset by peer" into file-like EOF (return an
85+ empty string rather than raise an error), and repeats the recv if
86+ interrupted by a signal.
87+ """
88+ while 1:
89+ try:
90+ bytes = sock.recv(max_read_size)
91+ except socket.error, e:
92+ eno = e.args[0]
93+ if eno == getattr(errno, "WSAECONNRESET", errno.ECONNRESET):
94+ # The connection was closed by the other side. Callers expect
95+ # an empty string to signal end-of-stream.
96+ return ""
97+ elif eno == errno.EINTR:
98+ # Retry the interrupted recv.
99+ continue
100+ raise
101+ else:
102+ if report_activity is not None:
103+ report_activity(len(bytes), 'read')
104+ return bytes
105+
106+
107+def recv_all(socket, count):
108 """Receive an exact number of bytes.
109
110 Regular Socket.recv() may return less than the requested number of bytes,
111- dependning on what's in the OS buffer. MSG_WAITALL is not available
112+ depending on what's in the OS buffer. MSG_WAITALL is not available
113 on all platforms, but this should work everywhere. This will return
114 less than the requested amount if the remote end closes.
115
116 This isn't optimized and is intended mostly for use in testing.
117 """
118 b = ''
119- while len(b) < bytes:
120- new = until_no_eintr(socket.recv, bytes - len(b))
121+ while len(b) < count:
122+ new = read_bytes_from_socket(socket, None, count - len(b))
123 if new == '':
124 break # eof
125 b += new
126 return b
127
128
129-def send_all(socket, bytes, report_activity=None):
130+def send_all(sock, bytes, report_activity=None):
131 """Send all bytes on a socket.
132-
133- Regular socket.sendall() can give socket error 10053 on Windows. This
134- implementation sends no more than 64k at a time, which avoids this problem.
135-
136+
137+ Breaks large blocks in smaller chunks to avoid buffering limitations on
138+ some platforms, and catches EINTR which may be thrown if the send is
139+ interrupted by a signal.
140+
141+ This is preferred to socket.sendall(), because it avoids portability bugs
142+ and provides activity reporting.
143+
144 :param report_activity: Call this as bytes are read, see
145 Transport._report_activity
146 """
147- chunk_size = 2**16
148- for pos in xrange(0, len(bytes), chunk_size):
149- block = bytes[pos:pos+chunk_size]
150- if report_activity is not None:
151- report_activity(len(block), 'write')
152- until_no_eintr(socket.sendall, block)
153+ sent_total = 0
154+ byte_count = len(bytes)
155+ while sent_total < byte_count:
156+ try:
157+ sent = sock.send(buffer(bytes, sent_total, MAX_SOCKET_CHUNK))
158+ except socket.error, e:
159+ if e.args[0] != errno.EINTR:
160+ raise
161+ else:
162+ sent_total += sent
163+ report_activity(sent, 'write')
164
165
166 def dereference_path(path):
167@@ -2067,7 +2114,18 @@
168
169
170 def until_no_eintr(f, *a, **kw):
171- """Run f(*a, **kw), retrying if an EINTR error occurs."""
172+ """Run f(*a, **kw), retrying if an EINTR error occurs.
173+
174+ WARNING: you must be certain that it is safe to retry the call repeatedly
175+ if EINTR does occur. This is typically only true for low-level operations
176+ like os.read. If in any doubt, don't use this.
177+
178+ Keep in mind that this is not a complete solution to EINTR. There is
179+ probably code in the Python standard library and other dependencies that
180+ may encounter EINTR if a signal arrives (and there is signal handler for
181+ that signal). So this function can reduce the impact for IO that bzrlib
182+ directly controls, but it is not a complete solution.
183+ """
184 # Borrowed from Twisted's twisted.python.util.untilConcludes function.
185 while True:
186 try:
187@@ -2077,6 +2135,7 @@
188 continue
189 raise
190
191+
192 def re_compile_checked(re_string, flags=0, where=""):
193 """Return a compiled re, or raise a sensible error.
194
195
196=== modified file 'bzrlib/smart/medium.py'
197--- bzrlib/smart/medium.py 2010-02-17 17:11:16 +0000
198+++ bzrlib/smart/medium.py 2010-03-26 04:50:44 +0000
199@@ -24,15 +24,14 @@
200 bzrlib/transport/smart/__init__.py.
201 """
202
203-import errno
204 import os
205-import socket
206 import sys
207 import urllib
208
209 from bzrlib.lazy_import import lazy_import
210 lazy_import(globals(), """
211 import atexit
212+import socket
213 import thread
214 import weakref
215
216@@ -47,14 +46,13 @@
217 from bzrlib.smart import client, protocol, request, vfs
218 from bzrlib.transport import ssh
219 """)
220-#usually already imported, and getting IllegalScoperReplacer on it here.
221 from bzrlib import osutils
222
223-# We must not read any more than 64k at a time so we don't risk "no buffer
224-# space available" errors on some platforms. Windows in particular is likely
225-# to give error 10053 or 10055 if we read more than 64k from a socket.
226-_MAX_READ_SIZE = 64 * 1024
227-
228+# Throughout this module buffer size parameters are either limited to be at
229+# most _MAX_READ_SIZE, or are ignored and _MAX_READ_SIZE is used instead.
230+# For this module's purposes, MAX_SOCKET_CHUNK is a reasonable size for reads
231+# from non-sockets as well.
232+_MAX_READ_SIZE = osutils.MAX_SOCKET_CHUNK
233
234 def _get_protocol_factory_for_bytes(bytes):
235 """Determine the right protocol factory for 'bytes'.
236@@ -276,9 +274,9 @@
237 def _serve_one_request_unguarded(self, protocol):
238 while protocol.next_read_size():
239 # We can safely try to read large chunks. If there is less data
240- # than _MAX_READ_SIZE ready, the socket wil just return a short
241- # read immediately rather than block.
242- bytes = self.read_bytes(_MAX_READ_SIZE)
243+ # than MAX_SOCKET_CHUNK ready, the socket will just return a
244+ # short read immediately rather than block.
245+ bytes = self.read_bytes(osutils.MAX_SOCKET_CHUNK)
246 if bytes == '':
247 self.finished = True
248 return
249@@ -287,13 +285,13 @@
250 self._push_back(protocol.unused_data)
251
252 def _read_bytes(self, desired_count):
253- return _read_bytes_from_socket(
254- self.socket.recv, desired_count, self._report_activity)
255+ return osutils.read_bytes_from_socket(
256+ self.socket, self._report_activity)
257
258 def terminate_due_to_error(self):
259 # TODO: This should log to a server log file, but no such thing
260 # exists yet. Andrew Bennetts 2006-09-29.
261- osutils.until_no_eintr(self.socket.close)
262+ self.socket.close()
263 self.finished = True
264
265 def _write_out(self, bytes):
266@@ -334,27 +332,27 @@
267 bytes_to_read = protocol.next_read_size()
268 if bytes_to_read == 0:
269 # Finished serving this request.
270- osutils.until_no_eintr(self._out.flush)
271+ self._out.flush()
272 return
273 bytes = self.read_bytes(bytes_to_read)
274 if bytes == '':
275 # Connection has been closed.
276 self.finished = True
277- osutils.until_no_eintr(self._out.flush)
278+ self._out.flush()
279 return
280 protocol.accept_bytes(bytes)
281
282 def _read_bytes(self, desired_count):
283- return osutils.until_no_eintr(self._in.read, desired_count)
284+ return self._in.read(desired_count)
285
286 def terminate_due_to_error(self):
287 # TODO: This should log to a server log file, but no such thing
288 # exists yet. Andrew Bennetts 2006-09-29.
289- osutils.until_no_eintr(self._out.close)
290+ self._out.close()
291 self.finished = True
292
293 def _write_out(self, bytes):
294- osutils.until_no_eintr(self._out.write, bytes)
295+ self._out.write(bytes)
296
297
298 class SmartClientMediumRequest(object):
299@@ -711,6 +709,10 @@
300 """A client medium using simple pipes.
301
302 This client does not manage the pipes: it assumes they will always be open.
303+
304+ Note that if readable_pipe.read might raise IOError or OSError with errno
305+ of EINTR, it must be safe to retry the read. Plain CPython fileobjects
306+ (such as used for sys.stdin) are safe.
307 """
308
309 def __init__(self, readable_pipe, writeable_pipe, base):
310@@ -720,12 +722,12 @@
311
312 def _accept_bytes(self, bytes):
313 """See SmartClientStreamMedium.accept_bytes."""
314- osutils.until_no_eintr(self._writeable_pipe.write, bytes)
315+ self._writeable_pipe.write(bytes)
316 self._report_activity(len(bytes), 'write')
317
318 def _flush(self):
319 """See SmartClientStreamMedium._flush()."""
320- osutils.until_no_eintr(self._writeable_pipe.flush)
321+ self._writeable_pipe.flush()
322
323 def _read_bytes(self, count):
324 """See SmartClientStreamMedium._read_bytes."""
325@@ -777,15 +779,15 @@
326 def _accept_bytes(self, bytes):
327 """See SmartClientStreamMedium.accept_bytes."""
328 self._ensure_connection()
329- osutils.until_no_eintr(self._write_to.write, bytes)
330+ self._write_to.write(bytes)
331 self._report_activity(len(bytes), 'write')
332
333 def disconnect(self):
334 """See SmartClientMedium.disconnect()."""
335 if not self._connected:
336 return
337- osutils.until_no_eintr(self._read_from.close)
338- osutils.until_no_eintr(self._write_to.close)
339+ self._read_from.close()
340+ self._write_to.close()
341 self._ssh_connection.close()
342 self._connected = False
343
344@@ -814,7 +816,7 @@
345 if not self._connected:
346 raise errors.MediumNotConnected(self)
347 bytes_to_read = min(count, _MAX_READ_SIZE)
348- bytes = osutils.until_no_eintr(self._read_from.read, bytes_to_read)
349+ bytes = self._read_from.read(bytes_to_read)
350 self._report_activity(len(bytes), 'read')
351 return bytes
352
353@@ -844,7 +846,7 @@
354 """See SmartClientMedium.disconnect()."""
355 if not self._connected:
356 return
357- osutils.until_no_eintr(self._socket.close)
358+ self._socket.close()
359 self._socket = None
360 self._connected = False
361
362@@ -898,8 +900,8 @@
363 """See SmartClientMedium.read_bytes."""
364 if not self._connected:
365 raise errors.MediumNotConnected(self)
366- return _read_bytes_from_socket(
367- self._socket.recv, count, self._report_activity)
368+ return osutils.read_bytes_from_socket(
369+ self._socket, self._report_activity)
370
371
372 class SmartClientStreamMediumRequest(SmartClientMediumRequest):
373@@ -942,19 +944,3 @@
374 self._medium._flush()
375
376
377-def _read_bytes_from_socket(sock, desired_count, report_activity):
378- # We ignore the desired_count because on sockets it's more efficient to
379- # read large chunks (of _MAX_READ_SIZE bytes) at a time.
380- try:
381- bytes = osutils.until_no_eintr(sock, _MAX_READ_SIZE)
382- except socket.error, e:
383- if len(e.args) and e.args[0] in (errno.ECONNRESET, 10054):
384- # The connection was closed by the other side. Callers expect an
385- # empty string to signal end-of-stream.
386- bytes = ''
387- else:
388- raise
389- else:
390- report_activity(len(bytes), 'read')
391- return bytes
392-
393
394=== modified file 'bzrlib/smart/protocol.py'
395--- bzrlib/smart/protocol.py 2010-02-17 17:11:16 +0000
396+++ bzrlib/smart/protocol.py 2010-03-26 04:50:44 +0000
397@@ -62,7 +62,13 @@
398
399 def _encode_tuple(args):
400 """Encode the tuple args to a bytestream."""
401- return '\x01'.join(args) + '\n'
402+ joined = '\x01'.join(args) + '\n'
403+ if type(joined) is unicode:
404+ # XXX: We should fix things so this never happens! -AJB, 20100304
405+ mutter('response args contain unicode, should be only bytes: %r',
406+ joined)
407+ joined = joined.encode('ascii')
408+ return joined
409
410
411 class Requester(object):
412
413=== modified file 'bzrlib/smart/server.py'
414--- bzrlib/smart/server.py 2010-02-23 07:43:11 +0000
415+++ bzrlib/smart/server.py 2010-03-26 04:50:44 +0000
416@@ -138,6 +138,8 @@
417 if e.args[0] != errno.EBADF:
418 trace.warning("listening socket error: %s", e)
419 else:
420+ if self._should_terminate:
421+ break
422 self.serve_conn(conn, thread_name_suffix)
423 except KeyboardInterrupt:
424 # dont log when CTRL-C'd.