Bazaar

Merge lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108 into lp:bzr/2.4

2.4-fdatasync-ENOTSUP-1075108
Merge into 2.4

Proposed by John A Meinel on 2012-11-05

Status:

Merged

Approved by:

John A Meinel on 2013-05-23

Approved revision:

no longer in the source branch.

Merged at revision:

6075

Proposed branch:

lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108

Merge into:

lp:bzr/2.4

Prerequisite:

lp:~jameinel/bzr/2.4-overridAttr-non-existant

Diff against target:

120 lines (+66/-1)

3 files modified

bzrlib/osutils.py (+14/-1)
bzrlib/tests/test_osutils.py (+44/-0)
doc/en/release-notes/bzr-2.4.txt (+8/-0)

To merge this branch:

bzr merge lp:~jameinel/bzr/2.4-fdatasync-ENOTSUP-1075108

High

Fix Released

Reviewer	Date Requested	Status
Vincent Ladeuil		Needs Fixing on 2012-11-08
Martin Packman (community)	2012-11-05	Approve on 2012-11-06
Review via email: mp+132877@code.launchpad.net

Commit message

Bug #1075108: handle fdatasync failing with EOPNOTSUPP, which should be treated as a non-fatal error.

Description of the change

This branch is a fix for calling fdatasync() in cases where it would otherwise fail.

We introduced fdatasync() as a way to reduce the window for power failure crashes triggering data loss. However, it appears that if you are on an SSH mounted filesystem, we will get an ENOTSUP error.

This change makes any IOError raised while calling fdatasync() get logged quietly rather than causing a failure, similar to what we do if 'chmod()' fails.

We could be more strict in the type of errors that we suppress, but ENOTSUP doesn't exist on Windows, and it isn't like we would get useful errors. Instead we try fdatasync, but don't worry if it fails, because we used to not call it at all.
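
As a rough illustration of the "try it, but don't worry if it fails" behaviour described above, here is a minimal Python 3 sketch. It is not the code from this branch: the function name `fdatasync_best_effort`, the boolean return value, and the `logging` logger (standing in for bzr's `trace.mutter`) are all assumptions made for the example. In Python 3, `IOError` is an alias of `OSError`, so the sketch catches `OSError`.

```python
import logging
import os

# Hypothetical logger; stands in for bzr's trace.mutter in this sketch.
log = logging.getLogger("fdatasync-sketch")


def fdatasync_best_effort(fileno):
    """Try to flush file contents to disk; log and carry on if that fails.

    Returns True if a sync call succeeded, False otherwise (the return
    value is an addition for this sketch, to make the outcome observable).
    """
    # Prefer fdatasync, fall back to fsync, and cope with neither existing.
    fn = getattr(os, "fdatasync", getattr(os, "fsync", None))
    if fn is None:
        return False
    try:
        fn(fileno)
    except OSError as e:
        # Any failure is logged quietly and otherwise ignored.
        log.debug("ignoring error calling fdatasync: %s", e)
        return False
    return True
```

The trade-off is the one debated in the review: the data has already been written, so the caller can keep working, but a genuine synchronization failure is only visible in the log.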

This depends on my overrideAttr change, because I wanted to have the tests run even on Windows, where fdatasync doesn't exist. And even just doing lots of 'if getattr()... is None: raise TestNotApplicable' makes the tests hard to read.

I can split it out if we decide not to land the overrideAttr change.
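
For readers without bzrlib at hand, the testing idea (temporarily replacing `os.fdatasync`, even on platforms where it does not exist) can be approximated with the standard library. The sketch below uses `unittest.mock.patch.object(..., create=True)` as a stand-in for `overrideAttr`; the two are not the same API, and the `fdatasync` function here is a simplified stand-in for the one in `bzrlib/osutils.py`, written for Python 3.

```python
import errno
import os
import unittest
from unittest import mock


def fdatasync(fileno):
    """Simplified stand-in for the function under test (best-effort variant)."""
    fn = getattr(os, "fdatasync", getattr(os, "fsync", None))
    if fn is not None:
        try:
            fn(fileno)
        except OSError:
            pass  # mirrors the description: a sync failure is not fatal


def raise_eopnotsupp(*args, **kwargs):
    raise OSError(errno.EOPNOTSUPP, os.strerror(errno.EOPNOTSUPP))


class TestFdatasyncSketch(unittest.TestCase):

    def test_replacement_is_called(self):
        calls = []
        # create=True lets the patch work even where os.fdatasync is missing.
        with mock.patch.object(os, "fdatasync", calls.append, create=True):
            fdatasync(42)
        self.assertEqual(calls, [42])

    def test_simulated_eopnotsupp_is_swallowed(self):
        with mock.patch.object(os, "fdatasync", raise_eopnotsupp, create=True):
            fdatasync(42)  # must not raise
```

Saved as a module, this runs with `python -m unittest`. Because the system function is replaced in every test, no `TestNotApplicable`-style guards are needed for platforms that lack `fdatasync`.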

Martin Packman (gz) wrote on 2012-11-06:

Seems like the best option. It's unfortunate that this will mask persistent errors such as the one in the bug. Ideally we'd want to inform the user that their filesystem doesn't provide data safety rather than just logging repeatedly, but that's hard from a command line tool that is run many times.

review: Approve

Vincent Ladeuil (vila) wrote on 2012-11-06:

I have a bad feeling about ignoring errors there: the bug you're fixing is the first occurrence of fdatasync() failing, and in a very specific context. I'd rather trap only that specific case and keep failing for the others rather than silently ignoring them (the probability that someone notices the mutter() messages in the log is pretty much zero).

John A Meinel (jameinel) wrote on 2012-11-06:

On 11/6/2012 12:43 PM, Vincent Ladeuil wrote:
> I have a bad feeling about ignoring errors there, the bug you're
> fixing is the first occurrence of fdatasync() failing in a very
> specific context. I'd rather trap only that specific case and keep
> failing for the others rather than silently ignoring them (the
> probability that someone notices the mutter() messages in the log
> is pretty much zero).
>

We can trap whatever we feel like, but note that we've done roughly
the same thing for chmod. If it fails, it just gets in a user's way,
rather than actually doing something useful for them.

I'm willing to be swayed, and I was hesitant as well. But the failure
mode here is that Dimiter can't use bzr *at all* because of ENOTSUP.
If we fail to fdatasync, we still have written the data.

I'd like to turn it around. Can you come up with a specific errno that
clearly indicates we have a problem that we should stop on? (That
wouldn't have triggered during the write, etc).

I like being cautious, but I also like not preventing someone from
actually getting their work done until we manage to add one more error
code into an exception clause.

John
=:->

Vincent Ladeuil (vila) wrote on 2012-11-06:

> I like being cautious, but I also like not preventing someone from
> actually getting their work done until we manage to add one more error
> code into an exception clause.

My point was more about a valid fdatasync() error that would reveal a data loss and will be ignored with your patch.

John A Meinel (jameinel) wrote on 2012-11-06:

On 11/6/2012 1:59 PM, Vincent Ladeuil wrote:
>> I like being cautious, but I also like not preventing someone
>> from actually getting their work done until we manage to add one
>> more error code into an exception clause.
>
> My point was more about a valid fdatasync() error that would reveal
> a data loss and will be ignored with your patch.
>
>

Sure, if you can come up with an fdatasync() error code that is
clearly a data loss then I'm happy to exclude that one, and even work
harder on whitelisting. But if we get EACCES, or EAGAIN, or EINTR, or
ENOTSUP, or ... we can just ignore the request as we weren't doing it
in the past either.
We log it in case we need to do a retrospective. I certainly agree
that people won't see it from the beginning.

John
=:->

Vincent Ladeuil (vila) wrote on 2012-11-06:

> Sure, if you can come up with an fdatasync() error code that is
> clearly a data loss

Better safe than sorry. Neither of us wants to inspect the fdatasync() implementation or guess what could come out of it. What I'm saying is that your patch removes a safeguard around a mechanism added to *add* reliability. If you don't care about it, you may as well remove fdatasync() altogether rather than ignoring what it may report.

fdatasync() was added in 2.4 and we have had a single report of it failing, and that was with an ssh-mounted fs (rather an edge case: with ssh access, our users are better served by bzr+ssh...).

It's a bit thin to suddenly switch to a mode where we ignore *all* errors instead of catching the unusual one.

Martin Packman (gz) wrote on 2012-11-06:

The point is if we raise an exception here, the user *is* sorry, not safe. It's similar to the exception-on-close problem but worse; the best we can really do is log. It would be nice to sensibly report in the UI if fdatasync is unhappy, but continuing regardless in this case really is the safest option.

Vincent Ladeuil (vila) wrote on 2012-11-08:

> The point is if we raise an exception here, the user *is* sorry, not safe.

My point is that if an exception is raised and we ignore it, the user will be sorry not to have been informed that their data could not be saved to disk. From 'man fdatasync':

ERRORS
<...>

EIO An error occurred during synchronization.
<...>

If data is silently corrupted, the bug reports we'll get won't make sense, making it harder to help the user.

review: Needs Fixing

John A Meinel (jameinel) wrote on 2013-05-23:

sent to pqm by email

John A Meinel (jameinel) wrote on 2013-05-23:

Per Vincent's request I went with a whitelist of errnos that can be considered non-fatal.
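
The whitelist idea can be sketched in Python 3 as follows. This is an illustration under assumptions rather than the merged code: the `sync` parameter is a hypothetical hook added so the behaviour can be exercised without a special filesystem, the `logging` logger stands in for `trace.mutter`, and the sketch logs only the errors it suppresses. The errno names are the ones listed in the diff.

```python
import errno
import logging
import os

# Hypothetical logger; stands in for bzr's trace.mutter in this sketch.
log = logging.getLogger("fdatasync-sketch")

# Names come from the diff; not every platform defines every one of them,
# so they are looked up by name and missing ones are skipped.
_MAYBE_IGNORED = ("EAGAIN", "EINTR", "ENOTSUP", "EOPNOTSUPP", "EACCES")
IGNORED_ERRNOS = frozenset(
    getattr(errno, name) for name in _MAYBE_IGNORED if hasattr(errno, name)
)


def fdatasync(fileno, sync=None):
    """Flush file contents to disk if possible.

    Errors on the whitelist are logged and suppressed; anything else
    (for example EIO) is re-raised so the caller hears about it.
    """
    if sync is None:  # 'sync' is a hypothetical injection point for testing
        sync = getattr(os, "fdatasync", getattr(os, "fsync", None))
    if sync is None:
        return
    try:
        sync(fileno)
    except OSError as e:
        if e.errno not in IGNORED_ERRNOS:
            raise
        log.debug("ignoring error calling fdatasync: %s", e)
```

With this shape, an ENOTSUP from an ssh-mounted filesystem no longer blocks the user, while the EIO case quoted from the man page still surfaces as an error.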

Preview Diff

=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py	2011-11-29 20:20:02 +0000
+++ bzrlib/osutils.py	2013-05-23 08:33:27 +0000
@@ -2522,6 +2522,10 @@
 else:
     is_local_pid_dead = _posix_is_local_pid_dead
+_maybe_ignored = ['EAGAIN', 'EINTR', 'ENOTSUP', 'EOPNOTSUPP', 'EACCES']
+_fdatasync_ignored = [getattr(errno, name) for name in _maybe_ignored
+                      if getattr(errno, name, None) is not None]
+
 def fdatasync(fileno):
     """Flush file contents to disk if possible.
@@ -2531,7 +2535,16 @@
     """
     fn = getattr(os, 'fdatasync', getattr(os, 'fsync', None))
     if fn is not None:
-        fn(fileno)
+        try:
+            fn(fileno)
+        except IOError, e:
+            # See bug #1075108, on some platforms fdatasync exists, but can
+            # raise ENOTSUP. However, we are calling fdatasync to be helpful
+            # and reduce the chance of corruption-on-powerloss situations. It
+            # is not a mandatory call, so it is ok to suppress failures.
+            trace.mutter("ignoring error calling fdatasync: %s" % (e,))
+            if getattr(e, 'errno', None) not in _fdatasync_ignored:
+                raise
 def ensure_empty_directory_exists(path, exception_class):
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py	2011-10-04 18:43:55 +0000
+++ bzrlib/tests/test_osutils.py	2013-05-23 08:33:27 +0000
@@ -22,6 +22,7 @@
 import re
 import socket
 import sys
+import tempfile
 import time
 from bzrlib import (
@@ -426,6 +427,49 @@
         self.assertTrue(-eighteen_hours < offset < eighteen_hours)
+class TestFdatasync(tests.TestCaseInTempDir):
+
+    def do_fdatasync(self):
+        f = tempfile.NamedTemporaryFile()
+        osutils.fdatasync(f.fileno())
+        f.close()
+
+    @staticmethod
+    def raise_eopnotsupp(*args, **kwargs):
+        raise IOError(errno.EOPNOTSUPP, os.strerror(errno.EOPNOTSUPP))
+
+    @staticmethod
+    def raise_enotsup(*args, **kwargs):
+        raise IOError(errno.ENOTSUP, os.strerror(errno.ENOTSUP))
+
+    def test_fdatasync_handles_system_function(self):
+        self.overrideAttr(os, "fdatasync")
+        self.do_fdatasync()
+
+    def test_fdatasync_handles_no_fdatasync_no_fsync(self):
+        self.overrideAttr(os, "fdatasync")
+        self.overrideAttr(os, "fsync")
+        self.do_fdatasync()
+
+    def test_fdatasync_handles_no_EOPNOTSUPP(self):
+        self.overrideAttr(errno, "EOPNOTSUPP")
+        self.do_fdatasync()
+
+    def test_fdatasync_catches_ENOTSUP(self):
+        enotsup = getattr(errno, "ENOTSUP", None)
+        if enotsup is None:
+            raise tests.TestNotApplicable("No ENOTSUP on this platform")
+        self.overrideAttr(os, "fdatasync", self.raise_enotsup)
+        self.do_fdatasync()
+
+    def test_fdatasync_catches_EOPNOTSUPP(self):
+        enotsup = getattr(errno, "EOPNOTSUPP", None)
+        if enotsup is None:
+            raise tests.TestNotApplicable("No EOPNOTSUPP on this platform")
+        self.overrideAttr(os, "fdatasync", self.raise_eopnotsupp)
+        self.do_fdatasync()
+
+
 class TestLinks(tests.TestCaseInTempDir):
     def test_dereference_path(self):
=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- doc/en/release-notes/bzr-2.4.txt	2012-06-08 06:59:50 +0000
+++ doc/en/release-notes/bzr-2.4.txt	2013-05-23 08:33:27 +0000
@@ -35,6 +35,10 @@
 * Cope with Unix filesystems, such as smbfs, where chmod gives 'permission
   denied'.  (Martin Pool, #606537)
+* Fix a traceback when trying to checkout a tree that also has an entry
+  with file-id `TREE_ROOT` somewhere other than at the root directory.
+  (John Arbash Meinel, #830947)
+
 * When the ``limbo`` or ``pending-deletion`` directories exist, typically
   because of an interrupted tree update, but are empty, bzr no longer
   errors out, because there is nothing for the user to clean up.  Also,
@@ -55,6 +59,10 @@
 * Prevent a traceback being printed to stderr when logging has problems and
   accept utf-8 byte string without breaking. (Martin Packman, #714449)
+* Some filesystems give ``EOPNOTSUPP`` when trying to call ``fdatasync``.
+  This shouldn't be treated as a fatal error.
+  (John Arbash Meinel, #1075108)
+
 * Use ``encoding_type='exact'`` for ``bzr testament`` so that on Windows
   the sha hash of the long testament matches the sha hash in the short
   form. (John Arbash Meinel, #1010339)