Comment 20 for bug 819604

Martin Pool (mbp) wrote : Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors; should reconnect instead

On 10 October 2011 15:58, Andrew Bennetts
<email address hidden> wrote:
> So I'd be tempted to blacklist all non-readonly VFS ops out
> of paranoia.  We shouldn't be issuing many VFS calls anyway ;)

+1

It might be good to have a debug flag controlling what can be
retried, so that we can easily experiment on the real system. I
think it would definitely be worth having a config option to turn
retrying off altogether, in case it does turn out to bite some people.

I do wonder whether we will really hit a disconnect in the middle
of a write request in realistic cases. In the typical case where
Launchpad wants to restart the connection, it should reply to one
request and then close the socket, so we will see the connection
close at the end of the previous request. Hence my suggestion of
doing a nonblocking select-for-read immediately before starting to
write. (In fact you could select for both read and write before
writing, which, when the write side is full, would let us wait a
little longer for a disconnection.)
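
To make the select-for-read idea concrete, here is a rough sketch in
plain modern Python. It is not bzrlib code: peer_has_closed is a
hypothetical helper name, and it assumes the medium is an ordinary
socket on a platform that supports MSG_PEEK (the ssh transport may
really be talking over subprocess pipes, where the check would have
to look different). An idle connection that selects as readable and
then yields zero bytes has been closed by the other end.

```python
import select
import socket


def peer_has_closed(sock):
    """Guess whether the peer closed an idle connection (hypothetical helper).

    Intended to be called immediately before writing a new request.
    """
    # Timeout of 0 makes this a nonblocking poll.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        # Nothing pending: the connection looks alive as far as we can tell.
        return False
    try:
        # Peek so that any unexpected pending bytes are not consumed.
        data = sock.recv(1, socket.MSG_PEEK)
    except OSError:
        # A reset or similar error also means the connection is unusable.
        return True
    # Readable with zero bytes available means end-of-file from the peer.
    return data == b""
```

A caller would check peer_has_closed(sock) before sending and, if it
returns True, reconnect first. Nothing has been written at that
point, so reconnecting there is safe whatever the request is.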

Obviously if the connection drops because of a network error we might
be more likely to be in the middle of writing a request or getting a
response.

So, overall, I would not bother about retrying any write operations
that are not certain to be safe, until we see whether basic
reconnection will be ok.
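
Roughly what I have in mind for the conservative policy, as a sketch
only. All the names here are made up for illustration
(READ_ONLY_OPS, call_with_reconnect, retry_enabled), the operation
names are placeholders rather than the real bzrlib verbs, and the
retry_enabled argument stands in for whatever config option we end
up with.

```python
# Placeholder operation names; assumed to be safe to repeat.
READ_ONLY_OPS = frozenset({"get", "has", "stat", "list_dir"})


def call_with_reconnect(op_name, send, reconnect,
                        retry_enabled=True, retryable=READ_ONLY_OPS):
    """Run send(); on a dropped connection retry once, if the op is allowed.

    send and reconnect are caller-supplied callables (hypothetical
    interface). Anything not listed in retryable is never retried.
    """
    try:
        return send()
    except ConnectionError:
        if not retry_enabled or op_name not in retryable:
            # Write-like or unknown operation, or retries are switched
            # off: surface the error instead of guessing.
            raise
        reconnect()
        # Retry exactly once; a second failure propagates to the caller.
        return send()
```

Passing a different retryable set would play the part of the debug
flag for experiments, and retry_enabled=False is the off switch.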