Merge lp:~laurynas-biveinis/percona-server/atomic-fio-5.5 into lp:percona-server/5.5

Proposed by Laurynas Biveinis
Status: Merged
Approved by: Stewart Smith
Approved revision: no longer in the source branch.
Merged at revision: 518
Proposed branch: lp:~laurynas-biveinis/percona-server/atomic-fio-5.5
Merge into: lp:percona-server/5.5
Diff against target: 333 lines (+199/-0)
9 files modified
Percona-Server/mysql-test/r/percona_server_variables_debug.result (+1/-0)
Percona-Server/mysql-test/r/percona_server_variables_release.result (+1/-0)
Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result (+30/-0)
Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test (+29/-0)
Percona-Server/storage/innobase/fil/fil0fil.c (+24/-0)
Percona-Server/storage/innobase/handler/ha_innodb.cc (+44/-0)
Percona-Server/storage/innobase/include/srv0srv.h (+5/-0)
Percona-Server/storage/innobase/os/os0file.c (+60/-0)
Percona-Server/storage/innobase/srv/srv0srv.c (+5/-0)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/atomic-fio-5.5
Reviewers:
- Stewart Smith (community): Approve
- Alexey Kopytov (community): Approve
Review via email: mp+165202@code.launchpad.net

This proposal supersedes a proposal from 2013-05-21.

Description of the change

2nd MP:
- all the review comments addressed;
- the complete_io label in fil_extend_space_to_desired_size() was moved to skip fil_node_complete_io(). That call is required only when the node is extended by writing to it, not when it is extended by a posix_fallocate() call.
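A minimal standalone sketch of the two ways a file can be extended, assuming a POSIX system that provides posix_fallocate(). The helpers extend_file() and file_size() are hypothetical and this is not the InnoDB code. Only the write-zeroes branch issues writes, which is why, per the note above, only that path needs fil_node_complete_io().

```c
#define _XOPEN_SOURCE 600	/* posix_fallocate(), pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size in bytes of the file behind fd, or -1. */
static off_t
file_size(int fd)
{
	struct stat	st;

	return(fstat(fd, &st) == 0 ? st.st_size : (off_t) -1);
}

/* Hypothetical helper: grow the file behind fd from cur_size to
new_size bytes. Returns 0 on success or an errno value on failure. */
static int
extend_file(int fd, off_t cur_size, off_t new_size, int use_fallocate)
{
	char	buf[64 * 1024];
	off_t	off = cur_size;

	assert(fd >= 0);

	if (new_size <= cur_size) {
		return(0);
	}

	if (use_fallocate) {
		/* Reserve the blocks without writing any data;
		posix_fallocate() returns the error number directly
		instead of setting errno. */
		return(posix_fallocate(fd, cur_size, new_size - cur_size));
	}

	/* Write path: append zero-filled chunks with pwrite(). */
	memset(buf, 0, sizeof(buf));

	while (off < new_size) {
		size_t	n = sizeof(buf);
		ssize_t	ret;

		if ((off_t) n > new_size - off) {
			n = (size_t) (new_size - off);
		}

		ret = pwrite(fd, buf, n, off);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;
			}
			return(errno);
		}

		off += ret;
	}

	return(0);
}
```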

Jenkins: http://jenkins.percona.com/job/percona-server-5.5-param/742/. It is still running, but enough slaves have completed to be confident (this is a low-impact MP for the common code path).
Note that while the submitted branch is a GCA branch, the Jenkins-tested one is based on trunk. This saves a staging test run.

Please review this without a 5.6 MP. That one is ready too, but Jenkins testing is postponed to let the more urgent 5.5 Jenkins jobs proceed.

Implement directFS Fusion I/O atomic writes for 5.5.
https://blueprints.launchpad.net/percona-server/+spec/atomic-writes-beta-5.5

http://jenkins.percona.com/job/percona-server-5.5-param/738/

    Implement atomic write support for Fusion I/O storage with directFS
    file system, implementing blueprint
    https://blueprints.launchpad.net/percona-server/+spec/atomic-writes-beta-5.5

    This implementation is based on MariaDB implementation at
    https://mariadb.atlassian.net/browse/MDEV-4338.

    - Add new InnoDB global, read-only option innodb_use_atomic_writes.
    - If this option is enabled, then at InnoDB initialization disable the
      doublewrite buffer if it's enabled, and set the file flush method to
      O_DIRECT if it's not already O_DIRECT or ALL_O_DIRECT.
    - Add new function os_file_set_atomic_writes() that either enables
      atomic writes on a specified file descriptor, if a Fusion
      I/O-specific syscall is available, or fails.
    - Call os_file_set_atomic_writes() from os_file_create() on data files
      if atomic writes are enabled.
    - If atomic writes are enabled and posix_fallocate() is available,
      then work around a directFS bug where atomic writes beyond the
      current EOF fail, by:
      - calling os_file_set_size() from fil_extend_space_to_desired_size();
      - calling posix_fallocate() in os_file_set_size().
    - New variable test sys_vars.innodb_use_atomic_writes_basic, re-record
      percona_server_variables_debug and percona_server_variables_release
      tests.
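The open-time check can be sketched outside InnoDB as follows. This is a hypothetical standalone example, not the Percona code: set_atomic_writes() and open_data_file() are illustrative names, and the request code is the fallback value from the diff, which real directFS headers may define differently. On an ordinary filesystem the ioctl() is expected to be rejected, so asking for atomic writes makes the open fail.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request code, copied from the fallback definition in the
diff; not taken from a real directFS header. */
#ifndef DFS_IOCTL_ATOMIC_WRITE_SET
# define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, unsigned int)
#endif

/* Try to enable atomic writes on fd. Returns 1 on success and 0 on
failure, e.g. on a filesystem that does not recognize the request. */
static int
set_atomic_writes(const char *name, int fd)
{
	int	atomic_option = 1;

	assert(name != NULL);

	if (ioctl(fd, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option) != 0) {
		fprintf(stderr, "cannot enable atomic writes on '%s': %s\n",
			name, strerror(errno));
		return(0);
	}

	return(1);
}

/* Open (or create) a data file. If atomic writes were requested but
cannot be enabled, close the file and return -1, so the caller sees an
open failure instead of silently running without them. */
static int
open_data_file(const char *name, int use_atomic_writes)
{
	int	fd = open(name, O_RDWR | O_CREAT, 0600);

	if (fd >= 0 && use_atomic_writes && !set_atomic_writes(name, fd)) {
		close(fd);
		fd = -1;
	}

	return(fd);
}
```

Failing the open, rather than continuing without atomic writes, matters here because the doublewrite buffer has already been switched off by the time data files are opened.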

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Laurynas,

  - fil_extend_space_to_desired_size() will not work correctly for a
    multi-node ibdata1. The existing code extends the last node, but
    takes the size of the other nodes into account by calculating
    offsets as (start_page_no - file_start_page_no).

    But the new srv_use_posix_fallocate code path just extends the last
    node to the total desired size regardless of the other node sizes,
    i.e. it overallocates space by always assuming a single-node tablespace.
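To make the multi-node point concrete, here is the offset arithmetic as a standalone sketch. The variable names mirror the diff, but last_node_target_size() and struct node_size are hypothetical, and the page counts in the test are made-up illustrative values.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GB	(4ULL * 1024 * 1024 * 1024)

/* Target byte size of the last node, split into the low/high 32-bit
halves that os_file_set_size() takes. Hypothetical illustration. */
struct node_size {
	uint32_t	low;
	uint32_t	high;
};

static struct node_size
last_node_target_size(
	uint64_t	space_size,		/* pages in the whole tablespace */
	uint64_t	last_node_size,		/* pages in the last node */
	uint64_t	size_after_extend,	/* desired tablespace size, pages */
	uint64_t	page_size)		/* bytes per page */
{
	/* First tablespace page number stored in the last node. */
	uint64_t		file_start_page_no = space_size - last_node_size;
	/* Only pages that belong to the last node count; pages held by
	earlier nodes must not be added to its size again. */
	uint64_t		bytes = (size_after_extend - file_start_page_no)
		* page_size;
	struct node_size	s;

	assert(size_after_extend >= space_size);

	s.low = (uint32_t) (bytes % FOUR_GB);
	s.high = (uint32_t) (bytes / FOUR_GB);

	return(s);
}
```

With these numbers, a 5 GiB tablespace of 16 KiB pages whose last node is 1 GiB is extended to 6 GiB by growing that node to 2 GiB. Treating the same space as single-node would ask the last node for the full 6 GiB, which is the overallocation described above.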

Minor things (since it needs recommitting anyway):

   - spurious comment change on lines 324-247
   - incorrect function call formatting ("func (...)" instead of "func(...)"):

+ if (ioctl (file, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option)) {
+ os_file_handle_error_no_exit (name, "ioctl");
+ os_file_handle_error_no_exit (name, "posix_fallocate");

   - the following doesn't really have any effect, but the convention
     there is to assign variables to their default values, which is
     FALSE for innobase_use_atomic_writes:

+static my_bool innobase_use_atomic_writes = TRUE;

     which is also inconsistent with:

+UNIV_INTERN ibool srv_use_atomic_writes = FALSE;

   - and wrong comment formatting:

+ /* Due to a bug in directFS, using atomics needs
+ * posix_fallocate to extend the file
+ * pwrite() past end of the file won't work
+ */

review: Needs Fixing
Alexey Kopytov (akopytov) :
review: Approve
Stewart Smith (stewart) :
review: Approve

Preview Diff

=== modified file 'Percona-Server/mysql-test/r/percona_server_variables_debug.result'
--- Percona-Server/mysql-test/r/percona_server_variables_debug.result 2013-05-15 05:05:18 +0000
+++ Percona-Server/mysql-test/r/percona_server_variables_debug.result 2013-05-22 17:19:29 +0000
@@ -178,6 +178,7 @@
 INNODB_TRACK_CHANGED_PAGES
 INNODB_TRX_PURGE_VIEW_UPDATE_ONLY_DEBUG
 INNODB_TRX_RSEG_N_SLOTS_DEBUG
+INNODB_USE_ATOMIC_WRITES
 INNODB_USE_GLOBAL_FLUSH_LOG_AT_TRX_COMMIT
 INNODB_USE_NATIVE_AIO
 INNODB_USE_SYS_MALLOC
=== modified file 'Percona-Server/mysql-test/r/percona_server_variables_release.result'
--- Percona-Server/mysql-test/r/percona_server_variables_release.result 2013-04-04 10:52:07 +0000
+++ Percona-Server/mysql-test/r/percona_server_variables_release.result 2013-05-22 17:19:29 +0000
@@ -170,6 +170,7 @@
 INNODB_THREAD_CONCURRENCY_TIMER_BASED
 INNODB_THREAD_SLEEP_DELAY
 INNODB_TRACK_CHANGED_PAGES
+INNODB_USE_ATOMIC_WRITES
 INNODB_USE_GLOBAL_FLUSH_LOG_AT_TRX_COMMIT
 INNODB_USE_NATIVE_AIO
 INNODB_USE_SYS_MALLOC
=== added file 'Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result'
--- Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result 1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result 2013-05-22 17:19:29 +0000
@@ -0,0 +1,30 @@
+SELECT @@global.innodb_use_atomic_writes;
+@@global.innodb_use_atomic_writes
+0
+SELECT @@innodb_use_atomic_writes;
+@@innodb_use_atomic_writes
+0
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.SESSION_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SET @@global.innodb_use_atomic_writes=1;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a read only variable
+SELECT IF(@@global.innodb_use_atomic_writes, "ON", "OFF") = VARIABLE_VALUE AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SELECT @@innodb_use_atomic_writes = @@global.innodb_use_atomic_writes AS should_be_1;
+should_be_1
+1
+SELECT @@local.innodb_use_atomic_writes;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a GLOBAL variable
+SELECT @@session.innodb_use_atomic_writes;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a GLOBAL variable
=== added file 'Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test'
--- Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test 1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test 2013-05-22 17:19:29 +0000
@@ -0,0 +1,29 @@
+# Test that innodb_use_atomic_writes is a global read-only variable
+--source include/have_innodb.inc
+
+SELECT @@global.innodb_use_atomic_writes;
+
+SELECT @@innodb_use_atomic_writes;
+
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.SESSION_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SET @@global.innodb_use_atomic_writes=1;
+
+SELECT IF(@@global.innodb_use_atomic_writes, "ON", "OFF") = VARIABLE_VALUE AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+SELECT @@innodb_use_atomic_writes = @@global.innodb_use_atomic_writes AS should_be_1;
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SELECT @@local.innodb_use_atomic_writes;
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SELECT @@session.innodb_use_atomic_writes;
=== modified file 'Percona-Server/storage/innobase/fil/fil0fil.c'
--- Percona-Server/storage/innobase/fil/fil0fil.c 2013-04-04 10:50:07 +0000
+++ Percona-Server/storage/innobase/fil/fil0fil.c 2013-05-22 17:19:29 +0000
@@ -4861,6 +4861,26 @@
 	start_page_no = space->size;
 	file_start_page_no = space->size - node->size;
 
+#ifdef HAVE_POSIX_FALLOCATE
+	if (srv_use_posix_fallocate) {
+		offset_high = (size_after_extend - file_start_page_no)
+			* page_size / (4ULL * 1024 * 1024 * 1024);
+		offset_low = (size_after_extend - file_start_page_no)
+			* page_size % (4ULL * 1024 * 1024 * 1024);
+
+		mutex_exit(&fil_system->mutex);
+		success = os_file_set_size(node->name, node->handle,
+					   offset_low, offset_high);
+		mutex_enter(&fil_system->mutex);
+		if (success) {
+			node->size += (size_after_extend - start_page_no);
+			space->size += (size_after_extend - start_page_no);
+			os_has_said_disk_full = FALSE;
+		}
+		goto complete_io;
+	}
+#endif
+
 	/* Extend at most 64 pages at a time */
 	buf_size = ut_min(64, size_after_extend - start_page_no) * page_size;
 	buf2 = mem_alloc(buf_size + page_size);
@@ -4919,6 +4939,10 @@
 
 	fil_node_complete_io(node, fil_system, OS_FILE_WRITE);
 
+#ifdef HAVE_POSIX_FALLOCATE
+complete_io:
+#endif
+
 	*actual_size = space->size;
 
 #ifndef UNIV_HOTBACKUP
=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-05-21 16:37:02 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-05-22 17:19:29 +0000
@@ -186,6 +186,7 @@
 static my_bool innobase_log_archive = FALSE;
 static char* innobase_log_arch_dir = NULL;
 #endif /* UNIV_LOG_ARCHIVE */
+static my_bool innobase_use_atomic_writes = FALSE;
 static my_bool innobase_use_doublewrite = TRUE;
 static my_bool innobase_use_checksums = TRUE;
 static my_bool innobase_fast_checksum = FALSE;
@@ -3118,6 +3119,39 @@
 	srv_kill_idle_transaction = 0;
 #endif
 
+	srv_use_atomic_writes = (ibool) innobase_use_atomic_writes;
+	if (innobase_use_atomic_writes) {
+		fprintf(stderr, "InnoDB: using atomic writes.\n");
+
+		/* Force doublewrite buffer off, atomic writes replace it. */
+		if (srv_use_doublewrite_buf) {
+			fprintf(stderr,
+				"InnoDB: Switching off doublewrite buffer "
+				"because of atomic writes.\n");
+			innobase_use_doublewrite = FALSE;
+			srv_use_doublewrite_buf = FALSE;
+		}
+
+		/* Force O_DIRECT on Unixes (on Windows writes are always
+		unbuffered) */
+#ifndef _WIN32
+		if (!innobase_file_flush_method ||
+			!strstr(innobase_file_flush_method, "O_DIRECT")) {
+			innobase_file_flush_method =
+				srv_file_flush_method_str = (char*)"O_DIRECT";
+			fprintf(stderr,
+				"InnoDB: using O_DIRECT due to atomic "
+				"writes.\n");
+		}
+#endif
+#ifdef HAVE_POSIX_FALLOCATE
+		/* Due to a bug in directFS, using atomics needs
+		posix_fallocate() to extend the file, because pwrite() past the
+		end of the file won't work */
+		srv_use_posix_fallocate = TRUE;
+#endif
+	}
+
 #ifdef HAVE_PSI_INTERFACE
 	/* Register keys with MySQL performance schema */
 	if (PSI_server) {
@@ -12526,6 +12560,15 @@
   "Disable with --skip-innodb-doublewrite.",
   NULL, NULL, TRUE);
 
+static MYSQL_SYSVAR_BOOL(use_atomic_writes, innobase_use_atomic_writes,
+  PLUGIN_VAR_NOCMDARG | PLUGIN_VAR_READONLY,
+  "Prevent partial page writes, via atomic writes (beta). "
+  "The option is used to prevent partial writes in case of a crash/poweroff, "
+  "as a faster alternative to the doublewrite buffer. "
+  "Currently this option works only "
+  "on Linux with a Fusion I/O device and the directFS filesystem.",
+  NULL, NULL, FALSE);
+
 static MYSQL_SYSVAR_ULONG(io_capacity, srv_io_capacity,
   PLUGIN_VAR_RQCMDARG,
   "Number of IOPs the server can do. Tunes the background IO rate",
@@ -13180,6 +13223,7 @@
   MYSQL_SYSVAR(doublewrite_file),
   MYSQL_SYSVAR(data_home_dir),
   MYSQL_SYSVAR(doublewrite),
+  MYSQL_SYSVAR(use_atomic_writes),
   MYSQL_SYSVAR(recovery_stats),
   MYSQL_SYSVAR(fast_shutdown),
   MYSQL_SYSVAR(file_io_threads),
=== modified file 'Percona-Server/storage/innobase/include/srv0srv.h'
--- Percona-Server/storage/innobase/include/srv0srv.h 2013-04-27 10:04:14 +0000
+++ Percona-Server/storage/innobase/include/srv0srv.h 2013-05-22 17:19:29 +0000
@@ -246,6 +246,11 @@
 #endif
 
 extern ibool srv_use_doublewrite_buf;
+extern ibool srv_use_atomic_writes;
+#ifdef HAVE_POSIX_FALLOCATE
+extern ibool srv_use_posix_fallocate;
+#endif
+
 extern ibool srv_use_checksums;
 extern ibool srv_fast_checksum;
 
=== modified file 'Percona-Server/storage/innobase/os/os0file.c'
--- Percona-Server/storage/innobase/os/os0file.c 2013-04-11 21:03:27 +0000
+++ Percona-Server/storage/innobase/os/os0file.c 2013-05-22 17:19:29 +0000
@@ -62,6 +62,13 @@
 #include <libaio.h>
 #endif
 
+#if defined(UNIV_LINUX) && defined(HAVE_SYS_IOCTL_H)
+# include <sys/ioctl.h>
+# ifndef DFS_IOCTL_ATOMIC_WRITE_SET
+#  define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, uint)
+# endif
+#endif
+
 /* This specifies the file permissions InnoDB uses when it creates files in
 Unix; the value of os_innodb_umask is initialized in ha_innodb.cc to
 my_umask */
@@ -1367,6 +1374,35 @@
 }
 
 /****************************************************************//**
+Tries to enable the atomic write feature, if available, for the specified file
+handle.
+@return TRUE if success */
+static __attribute__((warn_unused_result))
+ibool
+os_file_set_atomic_writes(
+/*======================*/
+	const char*	name,	/*!< in: name of the file */
+	os_file_t	file)	/*!< in: handle to the file */
+{
+#ifdef DFS_IOCTL_ATOMIC_WRITE_SET
+	int	atomic_option = 1;
+
+	if (ioctl(file, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option)) {
+
+		os_file_handle_error_no_exit(name, "ioctl");
+		return(FALSE);
+	}
+
+	return(TRUE);
+#else
+	fprintf(stderr, "InnoDB: Error: trying to enable atomic writes on "
+		"non-supported platform! Please restart with "
+		"innodb_use_atomic_writes disabled.\n");
+	return(FALSE);
+#endif
+}
+
+/****************************************************************//**
 NOTE! Use the corresponding macro os_file_create(), not directly
 this function!
 Opens an existing file or creates a new.
@@ -1637,6 +1673,14 @@
 	}
 #endif /* USE_FILE_LOCK */
 
+	/* os_file_set_atomic_writes() returns TRUE on success: fail the
+	open only if atomic writes could not be enabled. */
+	if (srv_use_atomic_writes && type == OS_DATA_FILE
+	    && !os_file_set_atomic_writes(name, file)) {
+
+		*success = FALSE;
+		close(file);
+		file = -1;
+	}
+
 	return(file);
 #endif /* __WIN__ */
 }
@@ -1980,6 +2024,22 @@
 	current_size = 0;
 	desired_size = (ib_int64_t)size + (((ib_int64_t)size_high) << 32);
 
+#ifdef HAVE_POSIX_FALLOCATE
+	if (srv_use_posix_fallocate) {
+		/* posix_fallocate() returns an error number on failure;
+		it does not return -1 or set errno. */
+		int	err = posix_fallocate(file, current_size,
+					      desired_size);
+
+		if (err != 0) {
+
+			errno = err;
+			fprintf(stderr, "InnoDB: Error: preallocating file "
+				"space for file \'%s\' failed. Current size "
+				"%lld, desired size %lld\n",
+				name, current_size, desired_size);
+			os_file_handle_error_no_exit(name, "posix_fallocate");
+			return(FALSE);
+		}
+		return(TRUE);
+	}
+#endif
+
 	/* Write up to 1 megabyte at a time. */
 	buf_size = ut_min(64, (ulint) (desired_size / UNIV_PAGE_SIZE))
 		* UNIV_PAGE_SIZE;
=== modified file 'Percona-Server/storage/innobase/srv/srv0srv.c'
--- Percona-Server/storage/innobase/srv/srv0srv.c 2013-05-21 16:37:02 +0000
+++ Percona-Server/storage/innobase/srv/srv0srv.c 2013-05-22 17:19:29 +0000
@@ -409,6 +409,11 @@
 #endif
 
 UNIV_INTERN ibool srv_use_doublewrite_buf = TRUE;
+UNIV_INTERN ibool srv_use_atomic_writes = FALSE;
+#ifdef HAVE_POSIX_FALLOCATE
+UNIV_INTERN ibool srv_use_posix_fallocate = FALSE;
+#endif
+
 UNIV_INTERN ibool srv_use_checksums = TRUE;
 UNIV_INTERN ibool srv_fast_checksum = FALSE;
 
