Merge lp:~vlad-lesin/percona-server/5.5-bug801989 into lp:percona-server/5.5

Proposed by Vlad Lesin
Status: Work in progress
Proposed branch: lp:~vlad-lesin/percona-server/5.5-bug801989
Merge into: lp:percona-server/5.5
Diff against target: 64 lines (+13/-5)
2 files modified
Percona-Server/storage/innobase/include/os0file.h (+3/-2)
Percona-Server/storage/innobase/os/os0file.c (+10/-3)
To merge this branch: bzr merge lp:~vlad-lesin/percona-server/5.5-bug801989
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Needs Fixing
Review via email: mp+123403@code.launchpad.net

Description of the change

The https://bugs.launchpad.net/percona-server/+bug/801989 bugfix.

The main idea of this fix is to return "true" on success and "false" otherwise from os_file_set_nocache. That allows to process O_DIRECT flag setting failures. Except that all calls of os_file_set_nocache are wrapped with code which abort execution on failure.

I haven't found the way to test this bug with MTR. The only idea is to use LD_PRELOAD and the library which just returns "false" on "fcntl" and "open" with O_DIRECT. But I'm not sure if such test should be MTR test. It's supposed MTR tests are platform-independed but we need at least to build library and use platform-specific environment for testing.

Jenkins tests: http://jenkins.percona.com/view/PS%205.5/job/percona-server-5.5-param/489/

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Vlad, some comments on the patch:

  - I don't think we should crash the server after failing to set
    O_DIRECT. This way you are essentially resurrecting
    http://bugs.mysql.com/bug.php?id=26662

    The warning is relatively harmless, the InnoDB will just do buffered
    I/O on file systems that do not support O_DIRECT such as tmpfs.

    It only becomes a problem in XtraDB in case of ALL_O_DIRECT, because
    in this case XtraDB does not flush the log files, even if O_DIRECT
    fails (the correctness of not doing fsync() when O_DIRECT succeeds
    is a separate issue).

    So what we should do is to disable ALL_O_DIRECT (i.e. O_DIRECT for
    log files) and switch to O_DIRECT (i.e. O_DIRECT on data files only)
    if setting O_DIRECT on a log file fails. That is, essentially use
    the fix suggested by Yasufumi, but perhaps a cleaner variant,
    i.e. in os_file_create_func() check the value returned by
    os_file_set_nocache() introduced by your patch, and if it failed, do
    srv_unix_file_flush_method = SRV_UNIX_O_DIRECT). This way we will at
    least do fsync() on log files when setting O_DIRECT failed.

  - the bug also affects 5.1. I have updated the bug accordingly

  - after fixing the above, please put description to revision comments
    in addition to MP. Note that it's ok to reference bug numbers by
    their number (e.g. "bug #...") rather than by the full URL. It's
    unambiguous, but shorter

  - to answer your concerns about the test case, I don't think we should
    have a regression test for this fix.

review: Needs Fixing
Revision history for this message
Stewart Smith (stewart) wrote :

Alexey Kopytov <email address hidden> writes:
> - after fixing the above, please put description to revision comments
> in addition to MP. Note that it's ok to reference bug numbers by
> their number (e.g. "bug #...") rather than by the full URL. It's
> unambiguous, but shorter

If the notation lp:1234 is used, then Launchpad will automatically link
to the correct bug number (lp and bzr use bug-tracker:number format)
--
Stewart Smith

Revision history for this message
Alexey Kopytov (akopytov) wrote :

On Mon, 10 Sep 2012 15:56:05 +1000, Stewart Smith wrote:
> Alexey Kopytov <email address hidden> writes:
>> - after fixing the above, please put description to revision
>> comments in addition to MP. Note that it's ok to reference bug
>> numbers by their number (e.g. "bug #...") rather than by the full
>> URL. It's unambiguous, but shorter
>
> If the notation lp:1234 is used, then Launchpad will automatically
> link to the correct bug number (lp and bzr use bug-tracker:number
> format)
>

Same goes for "bug #1234", "bug 1234" and "bug1234".

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> So what we should do is to disable ALL_O_DIRECT (i.e. O_DIRECT for
> log files) and switch to O_DIRECT (i.e. O_DIRECT on data files only)
> if setting O_DIRECT on a log file fails. That is, essentially use
> the fix suggested by Yasufumi, but perhaps a cleaner variant,
> i.e. in os_file_create_func() check the value returned by
> os_file_set_nocache() introduced by your patch, and if it failed, do
> srv_unix_file_flush_method = SRV_UNIX_O_DIRECT). This way we will at
> least do fsync() on log files when setting O_DIRECT failed.
Done.
> - the bug also affects 5.1. I have updated the bug accordingly
It will be separate MP.

> - after fixing the above, please put description to revision comments
> in addition to MP. Note that it's ok to reference bug numbers by
> their number (e.g. "bug #...") rather than by the full URL. It's
> unambiguous, but shorter
Done.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

On 12.09.12 1:30, Vladislav Lesin wrote:
>> - the bug also affects 5.1. I have updated the bug accordingly
> It will be separate MP.
>

Well, in that case this MP must be a merge revision from a 5.1-based
tree. This is the way we implement fixes in both branches: the revision
is first committed to the lowest version branch and then merged upwards.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Also the change around the "type != OS_LOG_FILE && srv_unix_file_flush_method == SRV_UNIX_O_DIRECT" case doesn't really change anything in behavior. Do we really need to touch the code there just to add a comment and an unnecessary typecast?

review: Needs Fixing
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> Well, in that case this MP must be a merge revision from a 5.1-based
> tree. This is the way we implement fixes in both branches: the revision
> is first committed to the lowest version branch and then merged upwards.
The MP for 5.1 is done.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> Also the change around the "type != OS_LOG_FILE && srv_unix_file_flush_method
> == SRV_UNIX_O_DIRECT" case doesn't really change anything in behavior. Do we
> really need to touch the code there just to add a comment and an unnecessary
> typecast?
The comment and typecast are removed.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Vlad,

On 13.09.12 11:45, Vladislav Lesin wrote:
>> Well, in that case this MP must be a merge revision from a 5.1-based
>> tree. This is the way we implement fixes in both branches: the revision
>> is first committed to the lowest version branch and then merged upwards.
> The MP for 5.1 is done.
>

You misunderstood me. You should not commit a separate revision for 5.5,
you should merge your revision from the 5.1 branch. Since our trees are
currently not merged, you can use a GCA 5.1 clone to commit your 5.1
revision, then you can merge it to 5.5.

Unmerged revisions

297. By Vlad Lesin

The bug #801989 fix.

The main idea of this fix is to return "true" on success and "false" otherwise
from os_file_set_nocache(). That allows to process O_DIRECT flag setting failures.

If setting O_DIRECT failes on log files the srv_unix_file_flush_method is set
to SRV_UNIX_O_DIRECT value to do fsync() on log files. If O_DIRECT fails on
other files the error is ignored to do buffered I/O on filesystems that do not
support direct I/O(for example tmpfs).

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'Percona-Server/storage/innobase/include/os0file.h'
--- Percona-Server/storage/innobase/include/os0file.h 2012-08-07 06:10:00 +0000
+++ Percona-Server/storage/innobase/include/os0file.h 2012-09-13 07:44:07 +0000
@@ -503,9 +503,10 @@
503 used by a backup program reading the file */503 used by a backup program reading the file */
504 ibool* success);/*!< out: TRUE if succeed, FALSE if error */504 ibool* success);/*!< out: TRUE if succeed, FALSE if error */
505/****************************************************************//**505/****************************************************************//**
506Tries to disable OS caching on an opened file descriptor. */506Tries to disable OS caching on an opened file descriptor.
507@return TRUE if operation is success and FALSE otherwise */
507UNIV_INTERN508UNIV_INTERN
508void509ibool
509os_file_set_nocache(510os_file_set_nocache(
510/*================*/511/*================*/
511 int fd, /*!< in: file descriptor to alter */512 int fd, /*!< in: file descriptor to alter */
512513
=== modified file 'Percona-Server/storage/innobase/os/os0file.c'
--- Percona-Server/storage/innobase/os/os0file.c 2012-08-07 06:10:00 +0000
+++ Percona-Server/storage/innobase/os/os0file.c 2012-09-13 07:44:07 +0000
@@ -1321,9 +1321,10 @@
1321}1321}
13221322
1323/****************************************************************//**1323/****************************************************************//**
1324Tries to disable OS caching on an opened file descriptor. */1324Tries to disable OS caching on an opened file descriptor.
1325@return TRUE if operation is success and FALSE otherwise */
1325UNIV_INTERN1326UNIV_INTERN
1326void1327ibool
1327os_file_set_nocache(1328os_file_set_nocache(
1328/*================*/1329/*================*/
1329 int fd /*!< in: file descriptor to alter */1330 int fd /*!< in: file descriptor to alter */
@@ -1344,6 +1345,7 @@
1344 " InnoDB: Failed to set DIRECTIO_ON "1345 " InnoDB: Failed to set DIRECTIO_ON "
1345 "on file %s: %s: %s, continuing anyway\n",1346 "on file %s: %s: %s, continuing anyway\n",
1346 file_name, operation_name, strerror(errno_save));1347 file_name, operation_name, strerror(errno_save));
1348 return FALSE;
1347 }1349 }
1348#elif defined(O_DIRECT)1350#elif defined(O_DIRECT)
1349 if (fcntl(fd, F_SETFL, O_DIRECT) == -1) {1351 if (fcntl(fd, F_SETFL, O_DIRECT) == -1) {
@@ -1361,8 +1363,10 @@
1361 "'Invalid argument' on Linux on tmpfs, "1363 "'Invalid argument' on Linux on tmpfs, "
1362 "see MySQL Bug#26662\n");1364 "see MySQL Bug#26662\n");
1363 }1365 }
1366 return FALSE;
1364 }1367 }
1365#endif1368#endif
1369 return TRUE;
1366}1370}
13671371
1368/****************************************************************//**1372/****************************************************************//**
@@ -1593,7 +1597,10 @@
15931597
1594 /* ALL_O_DIRECT: O_DIRECT also for transaction log file */1598 /* ALL_O_DIRECT: O_DIRECT also for transaction log file */
1595 if (srv_unix_file_flush_method == SRV_UNIX_ALL_O_DIRECT) {1599 if (srv_unix_file_flush_method == SRV_UNIX_ALL_O_DIRECT) {
1596 os_file_set_nocache(file, name, mode_str);1600 /* Do fsync() on log files when setting O_DIRECT fails.
1601 See log_io_complete() */
1602 if (!os_file_set_nocache(file, name, mode_str))
1603 srv_unix_file_flush_method = SRV_UNIX_O_DIRECT;
1597 }1604 }
15981605
1599#ifdef USE_FILE_LOCK1606#ifdef USE_FILE_LOCK

Subscribers

People subscribed via source and target branches