Merge lp:~laurynas-biveinis/percona-server/atomic-fio-5.5 into lp:percona-server/5.5

Proposed by Laurynas Biveinis
Status: Merged
Approved by: Stewart Smith
Approved revision: no longer in the source branch.
Merged at revision: 518
Proposed branch: lp:~laurynas-biveinis/percona-server/atomic-fio-5.5
Merge into: lp:percona-server/5.5
Diff against target: 333 lines (+199/-0)
9 files modified
Percona-Server/mysql-test/r/percona_server_variables_debug.result (+1/-0)
Percona-Server/mysql-test/r/percona_server_variables_release.result (+1/-0)
Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result (+30/-0)
Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test (+29/-0)
Percona-Server/storage/innobase/fil/fil0fil.c (+24/-0)
Percona-Server/storage/innobase/handler/ha_innodb.cc (+44/-0)
Percona-Server/storage/innobase/include/srv0srv.h (+5/-0)
Percona-Server/storage/innobase/os/os0file.c (+60/-0)
Percona-Server/storage/innobase/srv/srv0srv.c (+5/-0)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/atomic-fio-5.5
Reviewer                      Review Type   Date Requested   Status
Stewart Smith (community)                                    Approve
Alexey Kopytov (community)                                   Approve
Review via email: mp+165202@code.launchpad.net

This proposal supersedes a proposal from 2013-05-21.

Description of the change

2nd MP:
- all the review comments addressed;
- complete_io label in fil_extend_space_to_desired_size() moved to skip fil_node_complete_io(). That call is required only when the node is extended by writing to it, not by a posix_fallocate() call.

Jenkins at http://jenkins.percona.com/job/percona-server-5.5-param/742/. Still running, but enough slaves have completed for confidence. (This is a low-impact MP for a common code path.)
Note that while the submitted branch is based on the GCA, the Jenkins-tested one is based on trunk. This is to save a staging test run.

Please review this without a 5.6 MP. That one is ready too, but Jenkins testing is postponed to let the more urgent 5.5 Jenkins jobs proceed.

Implement directFS Fusion I/O atomic writes for 5.5.
https://blueprints.launchpad.net/percona-server/+spec/atomic-writes-beta-5.5

http://jenkins.percona.com/job/percona-server-5.5-param/738/

    Implement atomic write support for Fusion I/O storage with directFS
    file system, implementing blueprint
    https://blueprints.launchpad.net/percona-server/+spec/atomic-writes-beta-5.5

    This implementation is based on MariaDB implementation at
    https://mariadb.atlassian.net/browse/MDEV-4338.

    - Add new InnoDB global, read-only option innodb_use_atomic_writes.
    - If this option is enabled, then at InnoDB initialization disable the
      doublewrite buffer if it's enabled and set file flush method to
      O_DIRECT if it's not O_DIRECT or ALL_O_DIRECT.
    - Add new function os_file_set_atomic_writes() that either enables
      atomic writes on a specified file descriptor if a Fusion
      I/O-specific syscall is available, or fails.
    - Call os_file_set_atomic_writes() from os_file_create() on data files
      if atomic writes are enabled.
    - If atomic writes are enabled and posix_fallocate() is available,
      then work around a directFS bug where atomic writes fail beyond
      the current EOF by:
      - calling os_file_set_size() from fil_extend_space_to_desired_size();
      - calling posix_fallocate() in os_file_set_size().
    - New variable test sys_vars.innodb_use_atomic_writes_basic, re-record
      percona_server_variables_debug and percona_server_variables_release
      tests.
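
The posix_fallocate() workaround from the list above can be sketched in isolation as follows. This is a minimal illustration, not the code from the branch: the helper names extend_file_with_fallocate() and demo_extend_scratch_file() and the scratch file path are hypothetical. Note that posix_fallocate() reports failure by returning an error number directly and leaves errno untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the branch): make sure the file is at
least desired_size bytes long without writing past the current EOF.
Returns 1 on success, 0 on failure. */
int
extend_file_with_fallocate(int fd, off_t desired_size)
{
	/* posix_fallocate() returns 0 on success and an error number on
	failure; it does not set errno. */
	int	err = posix_fallocate(fd, 0, desired_size);

	if (err != 0) {
		fprintf(stderr, "posix_fallocate failed: %s\n",
			strerror(err));
		return(0);
	}

	return(1);
}

/* Hypothetical demo: create a scratch file, extend it and report the
resulting size in bytes, or -1 on any error. */
long long
demo_extend_scratch_file(off_t desired_size)
{
	char		path[] = "/tmp/fallocate_demo_XXXXXX";
	struct stat	st;
	long long	result = -1;
	int		fd = mkstemp(path);

	if (fd == -1) {
		return(-1);
	}

	if (extend_file_with_fallocate(fd, desired_size)
	    && fstat(fd, &st) == 0) {
		result = (long long) st.st_size;
	}

	close(fd);
	unlink(path);

	return(result);
}
```

For example, demo_extend_scratch_file(1048576) should report a 1 MiB file on a file system with enough free space.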

Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Laurynas,

  - fil_extend_space_to_desired_size() will not work correctly for a
    multi-node ibdata1. The existing code extends the last node, but
    takes the size of other nodes into account by calculating offsets
    via (start_page_no - file_start_page_no).

    But the new srv_use_posix_fallocate code path just extends the last
    node to the total desired size regardless of other node sizes,
    i.e. it overallocates space by always assuming a single-node tablespace.
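
The arithmetic behind this point can be sketched as below. It is only an illustration under assumed names: last_node_target_pages(), offset_low() and offset_high() are hypothetical helpers, and the page counts in the example are made up. The last node has to grow to the desired tablespace size minus the pages held by the preceding nodes, and the resulting byte size is then split into 32-bit halves for os_file_set_size().

```c
#include <stdint.h>

/* Hypothetical helper: size in pages that the LAST node must reach so that
the whole tablespace ends up with size_after_extend pages. */
uint64_t
last_node_target_pages(
	uint64_t	space_pages,		/* current pages in all nodes */
	uint64_t	last_node_pages,	/* current pages in last node */
	uint64_t	size_after_extend)	/* desired pages in all nodes */
{
	/* Pages that live in the nodes preceding the last one */
	uint64_t	file_start_page_no = space_pages - last_node_pages;

	return(size_after_extend - file_start_page_no);
}

/* Least significant 32 bits of a byte size */
uint32_t
offset_low(uint64_t bytes)
{
	return((uint32_t) (bytes % (4ULL * 1024 * 1024 * 1024)));
}

/* Most significant 32 bits of a byte size */
uint32_t
offset_high(uint64_t bytes)
{
	return((uint32_t) (bytes / (4ULL * 1024 * 1024 * 1024)));
}
```

With two nodes of 1000 and 500 pages and a desired total of 1600 pages, the last node should grow to 600 pages; extending it to 1600 pages would overallocate 1000 pages.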

Minor things (since this needs recommitting anyway):

   - spurious comment change on lines 324-247
   - incorrect function call formatting ("func (...)"):

+ if (ioctl (file, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option)) {
+ os_file_handle_error_no_exit (name, "ioctl");
+ os_file_handle_error_no_exit (name, "posix_fallocate");

   - the following doesn't really have any effect, but the convention
     there is to assign variables to their default values, which is
     FALSE for innobase_use_atomic_writes:

+static my_bool innobase_use_atomic_writes = TRUE;

     which is also inconsistent with:

+UNIV_INTERN ibool srv_use_atomic_writes = FALSE;

   - and wrong comment formatting:

+ /* Due to a bug in directFS, using atomics needs
+ * posix_fallocate to extend the file
+ * pwrite() past end of the file won't work
+ */

review: Needs Fixing
Alexey Kopytov (akopytov) :
review: Approve
Stewart Smith (stewart) :
review: Approve

Preview Diff

=== modified file 'Percona-Server/mysql-test/r/percona_server_variables_debug.result'
--- Percona-Server/mysql-test/r/percona_server_variables_debug.result 2013-05-15 05:05:18 +0000
+++ Percona-Server/mysql-test/r/percona_server_variables_debug.result 2013-05-22 17:19:29 +0000
@@ -178,6 +178,7 @@
 INNODB_TRACK_CHANGED_PAGES
 INNODB_TRX_PURGE_VIEW_UPDATE_ONLY_DEBUG
 INNODB_TRX_RSEG_N_SLOTS_DEBUG
+INNODB_USE_ATOMIC_WRITES
 INNODB_USE_GLOBAL_FLUSH_LOG_AT_TRX_COMMIT
 INNODB_USE_NATIVE_AIO
 INNODB_USE_SYS_MALLOC

=== modified file 'Percona-Server/mysql-test/r/percona_server_variables_release.result'
--- Percona-Server/mysql-test/r/percona_server_variables_release.result 2013-04-04 10:52:07 +0000
+++ Percona-Server/mysql-test/r/percona_server_variables_release.result 2013-05-22 17:19:29 +0000
@@ -170,6 +170,7 @@
 INNODB_THREAD_CONCURRENCY_TIMER_BASED
 INNODB_THREAD_SLEEP_DELAY
 INNODB_TRACK_CHANGED_PAGES
+INNODB_USE_ATOMIC_WRITES
 INNODB_USE_GLOBAL_FLUSH_LOG_AT_TRX_COMMIT
 INNODB_USE_NATIVE_AIO
 INNODB_USE_SYS_MALLOC

=== added file 'Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result'
--- Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result 1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/sys_vars/r/innodb_use_atomic_writes_basic.result 2013-05-22 17:19:29 +0000
@@ -0,0 +1,30 @@
+SELECT @@global.innodb_use_atomic_writes;
+@@global.innodb_use_atomic_writes
+0
+SELECT @@innodb_use_atomic_writes;
+@@innodb_use_atomic_writes
+0
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.SESSION_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SET @@global.innodb_use_atomic_writes=1;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a read only variable
+SELECT IF(@@global.innodb_use_atomic_writes, "ON", "OFF") = VARIABLE_VALUE AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+should_be_1
+1
+SELECT @@innodb_use_atomic_writes = @@global.innodb_use_atomic_writes AS should_be_1;
+should_be_1
+1
+SELECT @@local.innodb_use_atomic_writes;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a GLOBAL variable
+SELECT @@session.innodb_use_atomic_writes;
+ERROR HY000: Variable 'innodb_use_atomic_writes' is a GLOBAL variable

=== added file 'Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test'
--- Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test 1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/sys_vars/t/innodb_use_atomic_writes_basic.test 2013-05-22 17:19:29 +0000
@@ -0,0 +1,29 @@
+# Test that innodb_use_atomic_writes is a global read-only variable
+--source include/have_innodb.inc
+
+SELECT @@global.innodb_use_atomic_writes;
+
+SELECT @@innodb_use_atomic_writes;
+
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+SELECT COUNT(VARIABLE_VALUE) AS should_be_1
+FROM INFORMATION_SCHEMA.SESSION_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SET @@global.innodb_use_atomic_writes=1;
+
+SELECT IF(@@global.innodb_use_atomic_writes, "ON", "OFF") = VARIABLE_VALUE AS should_be_1
+FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
+WHERE VARIABLE_NAME='innodb_use_atomic_writes';
+
+SELECT @@innodb_use_atomic_writes = @@global.innodb_use_atomic_writes AS should_be_1;
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SELECT @@local.innodb_use_atomic_writes;
+
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+SELECT @@session.innodb_use_atomic_writes;

=== modified file 'Percona-Server/storage/innobase/fil/fil0fil.c'
--- Percona-Server/storage/innobase/fil/fil0fil.c 2013-04-04 10:50:07 +0000
+++ Percona-Server/storage/innobase/fil/fil0fil.c 2013-05-22 17:19:29 +0000
@@ -4861,6 +4861,26 @@
 	start_page_no = space->size;
 	file_start_page_no = space->size - node->size;

+#ifdef HAVE_POSIX_FALLOCATE
+	if (srv_use_posix_fallocate) {
+		offset_high = (size_after_extend - file_start_page_no)
+			* page_size / (4ULL * 1024 * 1024 * 1024);
+		offset_low = (size_after_extend - file_start_page_no)
+			* page_size % (4ULL * 1024 * 1024 * 1024);
+
+		mutex_exit(&fil_system->mutex);
+		success = os_file_set_size(node->name, node->handle,
+					   offset_low, offset_high);
+		mutex_enter(&fil_system->mutex);
+		if (success) {
+			node->size += (size_after_extend - start_page_no);
+			space->size += (size_after_extend - start_page_no);
+			os_has_said_disk_full = FALSE;
+		}
+		goto complete_io;
+	}
+#endif
+
 	/* Extend at most 64 pages at a time */
 	buf_size = ut_min(64, size_after_extend - start_page_no) * page_size;
 	buf2 = mem_alloc(buf_size + page_size);
@@ -4919,6 +4939,10 @@

 	fil_node_complete_io(node, fil_system, OS_FILE_WRITE);

+#ifdef HAVE_POSIX_FALLOCATE
+complete_io:
+#endif
+
 	*actual_size = space->size;

 #ifndef UNIV_HOTBACKUP

=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-05-21 16:37:02 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-05-22 17:19:29 +0000
@@ -186,6 +186,7 @@
 static my_bool innobase_log_archive = FALSE;
 static char* innobase_log_arch_dir = NULL;
 #endif /* UNIV_LOG_ARCHIVE */
+static my_bool innobase_use_atomic_writes = FALSE;
 static my_bool innobase_use_doublewrite = TRUE;
 static my_bool innobase_use_checksums = TRUE;
 static my_bool innobase_fast_checksum = FALSE;
@@ -3118,6 +3119,39 @@
 	srv_kill_idle_transaction = 0;
 #endif

+	srv_use_atomic_writes = (ibool) innobase_use_atomic_writes;
+	if (innobase_use_atomic_writes) {
+		fprintf(stderr, "InnoDB: using atomic writes.\n");
+
+		/* Force doublewrite buffer off, atomic writes replace it. */
+		if (srv_use_doublewrite_buf) {
+			fprintf(stderr,
+				"InnoDB: Switching off doublewrite buffer "
+				"because of atomic writes.\n");
+			innobase_use_doublewrite = FALSE;
+			srv_use_doublewrite_buf = FALSE;
+		}
+
+		/* Force O_DIRECT on Unixes (on Windows writes are always
+		unbuffered)*/
+#ifndef _WIN32
+		if(!innobase_file_flush_method ||
+		   !strstr(innobase_file_flush_method, "O_DIRECT")) {
+			innobase_file_flush_method =
+				srv_file_flush_method_str = (char*)"O_DIRECT";
+			fprintf(stderr,
+				"InnoDB: using O_DIRECT due to atomic "
+				"writes.\n");
+		}
+#endif
+#ifdef HAVE_POSIX_FALLOCATE
+		/* Due to a bug in directFS, using atomics needs
+		posix_fallocate() to extend the file, because pwrite() past the
+		end of the file won't work */
+		srv_use_posix_fallocate = TRUE;
+#endif
+	}
+
 #ifdef HAVE_PSI_INTERFACE
 	/* Register keys with MySQL performance schema */
 	if (PSI_server) {
@@ -12526,6 +12560,15 @@
   "Disable with --skip-innodb-doublewrite.",
   NULL, NULL, TRUE);

+static MYSQL_SYSVAR_BOOL(use_atomic_writes, innobase_use_atomic_writes,
+  PLUGIN_VAR_NOCMDARG | PLUGIN_VAR_READONLY,
+  "Prevent partial page writes, via atomic writes (beta). "
+  "The option is used to prevent partial writes in case of a crash/poweroff, "
+  "as faster alternative to doublewrite buffer. "
+  "Currently this option works only "
+  "on Linux only with FusionIO device, and directFS filesystem.",
+  NULL, NULL, FALSE);
+
 static MYSQL_SYSVAR_ULONG(io_capacity, srv_io_capacity,
   PLUGIN_VAR_RQCMDARG,
   "Number of IOPs the server can do. Tunes the background IO rate",
@@ -13180,6 +13223,7 @@
   MYSQL_SYSVAR(doublewrite_file),
   MYSQL_SYSVAR(data_home_dir),
   MYSQL_SYSVAR(doublewrite),
+  MYSQL_SYSVAR(use_atomic_writes),
   MYSQL_SYSVAR(recovery_stats),
   MYSQL_SYSVAR(fast_shutdown),
   MYSQL_SYSVAR(file_io_threads),

=== modified file 'Percona-Server/storage/innobase/include/srv0srv.h'
--- Percona-Server/storage/innobase/include/srv0srv.h 2013-04-27 10:04:14 +0000
+++ Percona-Server/storage/innobase/include/srv0srv.h 2013-05-22 17:19:29 +0000
@@ -246,6 +246,11 @@
 #endif

 extern ibool srv_use_doublewrite_buf;
+extern ibool srv_use_atomic_writes;
+#ifdef HAVE_POSIX_FALLOCATE
+extern ibool srv_use_posix_fallocate;
+#endif
+
 extern ibool srv_use_checksums;
 extern ibool srv_fast_checksum;


=== modified file 'Percona-Server/storage/innobase/os/os0file.c'
--- Percona-Server/storage/innobase/os/os0file.c 2013-04-11 21:03:27 +0000
+++ Percona-Server/storage/innobase/os/os0file.c 2013-05-22 17:19:29 +0000
@@ -62,6 +62,13 @@
 #include <libaio.h>
 #endif

+#if defined(UNIV_LINUX) && defined(HAVE_SYS_IOCTL_H)
+# include <sys/ioctl.h>
+# ifndef DFS_IOCTL_ATOMIC_WRITE_SET
+# define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, uint)
+# endif
+#endif
+
 /* This specifies the file permissions InnoDB uses when it creates files in
 Unix; the value of os_innodb_umask is initialized in ha_innodb.cc to
 my_umask */
@@ -1367,6 +1374,35 @@
 }

 /****************************************************************//**
+Tries to enable the atomic write feature, if available, for the specified file
+handle.
+@return TRUE if success */
+static __attribute__((warn_unused_result))
+ibool
+os_file_set_atomic_writes(
+/*======================*/
+	const char* name, /*!< in: name of the file */
+	os_file_t file) /*!< in: handle to the file */
+{
+#ifdef DFS_IOCTL_ATOMIC_WRITE_SET
+	int atomic_option = 1;
+
+	if (ioctl(file, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option)) {
+
+		os_file_handle_error_no_exit(name, "ioctl");
+		return(FALSE);
+	}
+
+	return(TRUE);
+#else
+	fprintf(stderr, "InnoDB: Error: trying to enable atomic writes on "
+		"non-supported platform! Please restart with "
+		"innodb_use_atomic_writes disabled.\n");
+	return(FALSE);
+#endif
+}
+
+/****************************************************************//**
 NOTE! Use the corresponding macro os_file_create(), not directly
 this function!
 Opens an existing file or creates a new.
@@ -1637,6 +1673,14 @@
 	}
 #endif /* USE_FILE_LOCK */

+	if (srv_use_atomic_writes && type == OS_DATA_FILE
+	    && !os_file_set_atomic_writes(name, file)) {
+
+		*success = FALSE;
+		close(file);
+		file = -1;
+	}
+
 	return(file);
 #endif /* __WIN__ */
 }
@@ -1980,6 +2024,22 @@
 	current_size = 0;
 	desired_size = (ib_int64_t)size + (((ib_int64_t)size_high) << 32);

+#ifdef HAVE_POSIX_FALLOCATE
+	if (srv_use_posix_fallocate) {
+
+		if (posix_fallocate(file, current_size, desired_size) == -1) {
+
+			fprintf(stderr, "InnoDB: Error: preallocating file "
+				"space for file \'%s\' failed. Current size "
+				"%lld, desired size %lld\n",
+				name, current_size, desired_size);
+			os_file_handle_error_no_exit(name, "posix_fallocate");
+			return(FALSE);
+		}
+		return(TRUE);
+	}
+#endif
+
 	/* Write up to 1 megabyte at a time. */
 	buf_size = ut_min(64, (ulint) (desired_size / UNIV_PAGE_SIZE))
 		* UNIV_PAGE_SIZE;

=== modified file 'Percona-Server/storage/innobase/srv/srv0srv.c'
--- Percona-Server/storage/innobase/srv/srv0srv.c 2013-05-21 16:37:02 +0000
+++ Percona-Server/storage/innobase/srv/srv0srv.c 2013-05-22 17:19:29 +0000
@@ -409,6 +409,11 @@
 #endif

 UNIV_INTERN ibool srv_use_doublewrite_buf = TRUE;
+UNIV_INTERN ibool srv_use_atomic_writes = FALSE;
+#ifdef HAVE_POSIX_FALLOCATE
+UNIV_INTERN ibool srv_use_posix_fallocate = FALSE;
+#endif
+
 UNIV_INTERN ibool srv_use_checksums = TRUE;
 UNIV_INTERN ibool srv_fast_checksum = FALSE;
