Comment 18 for bug 819604

Andrew Bennetts (spiv) wrote : Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors; should reconnect instead

On Sat, Oct 08, 2011 at 11:27:13AM -0000, John A Meinel wrote:
[...]
> 'delete', 'bzrlib.smart.vfs', 'DeleteRequest')
> safe, will fail if first succeeded

I worry a little that repeating this (and other destructive VFS ops like move)
might be bad in a case like the “simultaneous identical pull” case that the
Twisted buildbot encountered. e.g.:

 * connection A tries to delete (or move to obsolete) some .pack or .?ix
   file, it succeeds but the network connection was lost so the success
   was not reported to the client
 * connection B re-adds that file (I think this is possible under
   the right conditions)
 * connection A reconnects and retries the delete, and deletes B's
   file from under it.
 * now pack-names written by B references a missing file
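
To make the sequence above concrete, here is a toy simulation. It is only a
sketch: the dict stands in for the remote directory, and `files`,
`pack_names`, `delete` and the file name are hypothetical, not bzrlib code.

```python
# Hypothetical in-memory stand-in for the remote directory; not bzrlib code.
files = {"abc.pack": "written by A"}
pack_names = []  # stand-in for the names recorded in pack-names


def delete(name):
    """Delete name; raises KeyError if it is absent, like a failing delete."""
    del files[name]


# 1. A's delete succeeds on the server, but the reply never reaches A.
delete("abc.pack")

# 2. B re-adds a file with the same name and records it in pack-names.
files["abc.pack"] = "written by B"
pack_names.append("abc.pack")

# 3. A reconnects and blindly retries. Because B re-added the file, the
#    retry succeeds instead of failing as the "safe" annotation assumes.
delete("abc.pack")

# 4. pack-names now refers to a file that no longer exists.
dangling = [name for name in pack_names if name not in files]
```

Without step 2 the retried delete would raise, which is the behaviour the
"will fail if first succeeded" note relies on.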

I expect this case is rare, and perhaps is a risk we already run on some
network filesystems with unusual consistency guarantees, but I think
it's a real risk. After all, I'm not sure any common network filesystems
have this particular kind of behaviour. I'm not sure if it's worth
avoiding retries for, but I think it is worth thinking carefully about
just in case.

Basically, we assume some kinds of atomicity & consistency from our
filesystem operations, and these retries possibly erode those
properties. So I'd be tempted to blacklist all non-readonly VFS ops out
of paranoia. We shouldn't be issuing many VFS calls anyway ;)