Merge lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108 into lp:bzr/2.4

Proposed by John A Meinel
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 6075
Proposed branch: lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108
Merge into: lp:bzr/2.4
Prerequisite: lp:~jameinel/bzr/2.4-overridAttr-non-existant
Diff against target: 120 lines (+66/-1)
3 files modified
bzrlib/osutils.py (+14/-1)
bzrlib/tests/test_osutils.py (+44/-0)
doc/en/release-notes/bzr-2.4.txt (+8/-0)
To merge this branch: bzr merge lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108
Reviewer Review Type Date Requested Status
Vincent Ladeuil Needs Fixing
Martin Packman (community) Approve
Review via email: mp+132877@code.launchpad.net

Commit message

Bug #1075108, handle when fdatasync returns EOPNOTSUPP, which should be considered a non-fatal error.

Description of the change

This branch is a fix for calling fdatasync() in cases where it would otherwise fail.

We introduced fdatasync() as a way to reduce the window for power failure crashes triggering data loss. However, it appears that if you are on an SSH mounted filesystem, we will get an ENOTSUP error.

This change makes it so that any IOError while calling fdatasync() is just logged quietly rather than failing. Similar to what we do if 'chmod()' fails.

We could be more strict in the type of errors that we suppress, but ENOTSUP doesn't exist on Windows, and it isn't like we would get useful errors. Instead we try fdatasync, but don't worry if it fails, because we used to not call it at all.

This depends on my overrideAttr change, because I wanted to have the tests run even on Windows, where fdatasync doesn't exist. And even just doing lots of 'if getattr()... is None: raise TestNotApplicable' makes the tests hard to read.

I can split it out if we decide not to land the overrideAttr change.

To post a comment you must log in.
Revision history for this message
Martin Packman (gz) wrote :

Seems like the best option. It's unfortunate that this will mask persistent errors such as the one in the bug, ideally we'd want to inform the user that their filesystem doesn't provide data safety rather than just logging repeatedly, but that's hard from a command line tool that is run many times.

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

I have a bad feeling about ignoring errors there, the bug you're fixing is the first occurrence of fdatasync() failing in a very specific context. I'd rather trap only that specific case and keep failing for the others rather than silently ignoring them (the probability that someone notices the mutter() messages in the log is pretty much zero).

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/6/2012 12:43 PM, Vincent Ladeuil wrote:
> I have a bad feeling about ignoring errors there, the bug you're
> fixing is the first occurrence of fdatasync() failing in a very
> specific context. I'd rather trap only that specific case and keep
> failing for the others rather than silently ignoring them (the
> probability that someone notices the mutter() messages in the log
> is pretty much zero).
>

We can trap whatever we feel like, but note that we've done roughly
the same thing for chmod. If it fails, it just gets in a user's way,
rather than actually doing something useful for them.

I'm willing to be swayed, and I was hesitant as well. But the failure
mode here is that Dimiter can't use bzr *at all* because of ENOTSUP.
If we fail to fdatasync, we still have written the data.

I'd like to turn it around. Can you come up with a specific errno that
clearly indicates we have a problem that we should stop on? (That
wouldn't have triggered during the write, etc).

I like being cautious, but I also like not preventing someone from
actually getting their work done until we manage to add one more error
code into an exception clause.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlCY0JMACgkQJdeBCYSNAAPdEgCfQzNNtCTVW4vGNVRNE6Sl4A6t
c/IAmQEfqz9NDi+HSojfoOKV1x1r70ec
=4eMi
-----END PGP SIGNATURE-----

Revision history for this message
Vincent Ladeuil (vila) wrote :

> I like being cautious, but I also like not preventing someone from
> actually getting their work done until we manage to add one more error
> code into an exception clause.

My point was more about a valid fdatasync() error that would reveal a data loss and will be ignored with your patch.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/6/2012 1:59 PM, Vincent Ladeuil wrote:
>> I like being cautious, but I also like not preventing someone
>> from actually getting their work done until we manage to add one
>> more error code into an exception clause.
>
> My point was more about a valid fdatasync() error that would reveal
> a data loss and will be ignored with your patch.
>
>

Sure, if you can come up with an fdatasync() error code that is
clearly a data loss then I'm happy to exclude that one, and even work
harder on whitelisting. But if we get EACCESS, or EAGAIN, or EINTR, or
ENOTSUP, or ... we can just ignore the request as we weren't doing it
in the past either.
We log it in the case we need to do a retrospective. I certainly agree
that people won't see it from the beginning.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlCY4nkACgkQJdeBCYSNAAOEogCgzzY5hQE/jcFKfrbyWUtl0SbS
ynEAn0Fgn3aL0MObohHo+yjl6Jrn4+bq
=yCpc
-----END PGP SIGNATURE-----

Revision history for this message
Vincent Ladeuil (vila) wrote :

> Sure, if you can come up with an fdatasync() error code that is
> clearly a data loss

Better safe than sorry. Neither of us want to inspect fdatasync() implementation nor guess what could come out from it. What I'm saying is that your patch removes a safe guard around a mechanism added to *add* reliability. If you don't care about it, you may as well remove fdatasync() altogether rather than ignoring what it may report.

fdatasync() has been added in 2.4 and we got a single report that it failed and that's with an ssh mounted fs (a rather edge case since having ssh access our users are better served by bzr+ssh...).

It's a bit thin to suddenly switch to a mode where we ignore *all* errors instead of catching the unusual one.

Revision history for this message
Martin Packman (gz) wrote :

The point is if we raise an exception here, the user *is* sorry, not safe. It's similar to the exception on close problem but worse, the best we can really do is log. It would be nice to sensibly report in the UI if fdatasyn is unhappy, but continuing regardless in this case really is the safest option.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> The point is if we raise an exception here, the user *is* sorry, not safe.

My point is that if an exception is raised the user will be sorry to not be informed that its data could not be saved on disk. From 'man fdatasync':

ERRORS
 <...>

       EIO An error occurred during synchronization.
<...>

If data is silently corrupted, the bug reports we'll get won't make sense, making it harder to help the user.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Revision history for this message
John A Meinel (jameinel) wrote :

Per Vincent's request I went with a whitelist of errnos that can be considered non-fatal.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2011-11-29 20:20:02 +0000
+++ bzrlib/osutils.py 2013-05-23 08:33:27 +0000
@@ -2522,6 +2522,10 @@
2522else:2522else:
2523 is_local_pid_dead = _posix_is_local_pid_dead2523 is_local_pid_dead = _posix_is_local_pid_dead
25242524
2525_maybe_ignored = ['EAGAIN', 'EINTR', 'ENOTSUP', 'EOPNOTSUPP', 'EACCES']
2526_fdatasync_ignored = [getattr(errno, name) for name in _maybe_ignored
2527 if getattr(errno, name, None) is not None]
2528
25252529
2526def fdatasync(fileno):2530def fdatasync(fileno):
2527 """Flush file contents to disk if possible.2531 """Flush file contents to disk if possible.
@@ -2531,7 +2535,16 @@
2531 """2535 """
2532 fn = getattr(os, 'fdatasync', getattr(os, 'fsync', None))2536 fn = getattr(os, 'fdatasync', getattr(os, 'fsync', None))
2533 if fn is not None:2537 if fn is not None:
2534 fn(fileno)2538 try:
2539 fn(fileno)
2540 except IOError, e:
2541 # See bug #1075108, on some platforms fdatasync exists, but can
2542 # raise ENOTSUP. However, we are calling fdatasync to be helpful
2543 # and reduce the chance of corruption-on-powerloss situations. It
2544 # is not a mandatory call, so it is ok to suppress failures.
2545 trace.mutter("ignoring error calling fdatasync: %s" % (e,))
2546 if getattr(e, 'errno', None) not in _fdatasync_ignored:
2547 raise
25352548
25362549
2537def ensure_empty_directory_exists(path, exception_class):2550def ensure_empty_directory_exists(path, exception_class):
25382551
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py 2011-10-04 18:43:55 +0000
+++ bzrlib/tests/test_osutils.py 2013-05-23 08:33:27 +0000
@@ -22,6 +22,7 @@
22import re22import re
23import socket23import socket
24import sys24import sys
25import tempfile
25import time26import time
2627
27from bzrlib import (28from bzrlib import (
@@ -426,6 +427,49 @@
426 self.assertTrue(-eighteen_hours < offset < eighteen_hours)427 self.assertTrue(-eighteen_hours < offset < eighteen_hours)
427428
428429
430class TestFdatasync(tests.TestCaseInTempDir):
431
432 def do_fdatasync(self):
433 f = tempfile.NamedTemporaryFile()
434 osutils.fdatasync(f.fileno())
435 f.close()
436
437 @staticmethod
438 def raise_eopnotsupp(*args, **kwargs):
439 raise IOError(errno.EOPNOTSUPP, os.strerror(errno.EOPNOTSUPP))
440
441 @staticmethod
442 def raise_enotsup(*args, **kwargs):
443 raise IOError(errno.ENOTSUP, os.strerror(errno.ENOTSUP))
444
445 def test_fdatasync_handles_system_function(self):
446 self.overrideAttr(os, "fdatasync")
447 self.do_fdatasync()
448
449 def test_fdatasync_handles_no_fdatasync_no_fsync(self):
450 self.overrideAttr(os, "fdatasync")
451 self.overrideAttr(os, "fsync")
452 self.do_fdatasync()
453
454 def test_fdatasync_handles_no_EOPNOTSUPP(self):
455 self.overrideAttr(errno, "EOPNOTSUPP")
456 self.do_fdatasync()
457
458 def test_fdatasync_catches_ENOTSUP(self):
459 enotsup = getattr(errno, "ENOTSUP", None)
460 if enotsup is None:
461 raise tests.TestNotApplicable("No ENOTSUP on this platform")
462 self.overrideAttr(os, "fdatasync", self.raise_enotsup)
463 self.do_fdatasync()
464
465 def test_fdatasync_catches_EOPNOTSUPP(self):
466 enotsup = getattr(errno, "EOPNOTSUPP", None)
467 if enotsup is None:
468 raise tests.TestNotApplicable("No EOPNOTSUPP on this platform")
469 self.overrideAttr(os, "fdatasync", self.raise_eopnotsupp)
470 self.do_fdatasync()
471
472
429class TestLinks(tests.TestCaseInTempDir):473class TestLinks(tests.TestCaseInTempDir):
430474
431 def test_dereference_path(self):475 def test_dereference_path(self):
432476
=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- doc/en/release-notes/bzr-2.4.txt 2012-06-08 06:59:50 +0000
+++ doc/en/release-notes/bzr-2.4.txt 2013-05-23 08:33:27 +0000
@@ -35,6 +35,10 @@
35* Cope with Unix filesystems, such as smbfs, where chmod gives 'permission35* Cope with Unix filesystems, such as smbfs, where chmod gives 'permission
36 denied'. (Martin Pool, #606537)36 denied'. (Martin Pool, #606537)
3737
38* Fix a traceback when trying to checkout a tree that also has an entry
39 with file-id `TREE_ROOT` somewhere other than at the root directory.
40 (John Arbash Meinel, #830947)
41
38* When the ``limbo`` or ``pending-deletion`` directories exist, typically42* When the ``limbo`` or ``pending-deletion`` directories exist, typically
39 because of an interrupted tree update, but are empty, bzr no longer43 because of an interrupted tree update, but are empty, bzr no longer
40 errors out, because there is nothing for the user to clean up. Also,44 errors out, because there is nothing for the user to clean up. Also,
@@ -55,6 +59,10 @@
55* Prevent a traceback being printed to stderr when logging has problems and59* Prevent a traceback being printed to stderr when logging has problems and
56 accept utf-8 byte string without breaking. (Martin Packman, #714449)60 accept utf-8 byte string without breaking. (Martin Packman, #714449)
5761
62* Some filesystems give ``EOPNOTSUPP`` when trying to call ``fdatasync``.
63 This shouldn't be treated as a fatal error.
64 (John Arbash Meinel, #1075108)
65
58* Use ``encoding_type='exact'`` for ``bzr testament`` so that on Windows66* Use ``encoding_type='exact'`` for ``bzr testament`` so that on Windows
59 the sha hash of the long testament matches the sha hash in the short67 the sha hash of the long testament matches the sha hash in the short
60 form. (John Arbash Meinel, #1010339)68 form. (John Arbash Meinel, #1010339)

Subscribers

People subscribed via source and target branches