Merge lp:~gl-az/percona-xtrabackup/2.0-io-block-size into lp:percona-xtrabackup/2.0

Proposed by George Ormond Lorch III
Status: Work in progress
Proposed branch: lp:~gl-az/percona-xtrabackup/2.0-io-block-size
Merge into: lp:percona-xtrabackup/2.0
Diff against target: 234 lines (+99/-34)
3 files modified
innobackupex (+12/-1)
src/xtrabackup.cc (+52/-33)
test/t/xb_io_block_size.sh (+35/-0)
To merge this branch: bzr merge lp:~gl-az/percona-xtrabackup/2.0-io-block-size
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Needs Information
Review via email: mp+161862@code.launchpad.net

Description of the change

This implements the io-block-size option for XtraBackup and innobackupex (passthrough only), which specifies the number of pages to read/write per I/O cycle.
Changes made:
- Increased the default from 64 pages to 256 pages (4M).
- Changed the ds_buffer size used when chained behind compression to (UNIV_PAGE_SIZE * (xtrabackup_iosize + 1)); the one extra page accommodates the additional formatting overhead normally added by compression and xbstream.
- Corrected the retry logic in xtrabackup_copy_datafile so that an individual page is re-read up to a pre-determined number of times, and only that page is re-read on each validation failure (see the sketch after this list). The previous logic re-read the entire block and kept counting retry failures across pages, even after a previously failed page was eventually read successfully.
- Created a very simple test case that sets io-block-size to a larger value.
- Corrupt page reads are covered by bug766033.sh, which creates a corrupted dataset and checks for the expected error message.
- Not covered by any test is the case of a page that fails validation on the first read and then passes on a re-read.
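
For clarity, here is a minimal sketch of the corrected per-page retry behaviour, condensed from the diff below. page_is_corrupted(), read_page() and sleep_us() are hypothetical stand-ins for xb_buf_page_is_corrupted(), xb_os_file_read() and os_thread_sleep(), and the doublewrite-buffer special case and error messages are omitted:

/* Sketch only: the retry counter is reset for every page, and only the
   page that failed validation is re-read, never the whole chunk. */
static int
copy_chunk_with_retry(unsigned char *page, size_t chunk, size_t page_size)
{
        const unsigned max_retries = 10;

        for (size_t chunk_offset = 0; chunk_offset < chunk;
             chunk_offset += page_size) {
                unsigned retry_count = 0;

                while (page_is_corrupted(page + chunk_offset)) {
                        if (++retry_count > max_retries) {
                                return -1;      /* file looks corrupted */
                        }
                        sleep_us(100000);
                        /* re-read only the page that failed validation */
                        if (read_page(page + chunk_offset, chunk_offset,
                                      page_size) != 0) {
                                return -1;      /* read error */
                        }
                }
        }
        return 0;
}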

Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Jenkins: http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/427

Other notes:
- In recent testing on two systems, 256 pages appears to be the sweet spot, with slowly increasing benefit up to about 512 pages. At 256 pages, a consistent 20-25% reduction in backup time is observed, with only a very minor hit to concurrent load throughput.

Revision history for this message
Stewart Smith (stewart) wrote :

George Ormond Lorch III <email address hidden> writes:
> - Not tested as part of any test is a page that has been read and
> failed validation, then re-read and passed.

This would be a really awesome test to have however... I'm not sure if
Alexey has any ideas of how we might do this with debug sync and the
like... but I think it could be quite worthwhile to have a test.

--
Stewart Smith

Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Completely agree; I even looked into it a bit. I'm pretty sure you're
right, though: it would need some sync in order to make it deterministic.
I certainly see this situation 'passing' in my local brute-force tests,
but it would be _really_ nice to have more of this sort of test in the
suite.

--
George O. Lorch III
Software Engineer, Percona
+1-888-401-3401 x542 US/Arizona (GMT -7)
skype: george.ormond.lorch.iii

Revision history for this message
Alexey Kopytov (akopytov) wrote :

We also have bug #1093385 and https://bugs.launchpad.net/percona-xtrabackup/+bug/1095249/comments/1. Basically, in XB 2.0+ we are discarding read-ahead on each read, so it would be interesting to test the effect of this change with that bug fixed. I suspect the effect will be much lower (if measurable at all).

George, would you be willing to fix that bug and rerun the tests with different block sizes? The fix would be to use FADV_DONTNEED to discard only the already-consumed data from the cache, rather than the entire file.
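
For reference, a minimal sketch of what that fix might look like, assuming a plain POSIX file descriptor and hypothetical consumed_offset/consumed_len bookkeeping in the copy loop; the real change would live in XtraBackup's data file read path:

#include <fcntl.h>

/* Sketch only: drop just the already-copied byte range from the OS page
   cache instead of the whole file, so kernel read-ahead for the not yet
   copied portion is preserved. */
static void
discard_consumed_range(int fd, off_t consumed_offset, off_t consumed_len)
{
        (void) posix_fadvise(fd, consumed_offset, consumed_len,
                             POSIX_FADV_DONTNEED);
}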

Regarding the test case, we used to have sporadic page validation failures in Jenkins before bug #1081882 was fixed, so it is covered in a way. But I agree: creating a dedicated test that reliably triggers a partially written page doesn't look feasible.

review: Needs Information
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Certainly can do that, Alexey. Will post back when I have some numbers...

--
George O. Lorch III
Software Engineer, Percona
+1-888-401-3401 x542 US/Arizona (GMT -7)
skype: george.ormond.lorch.iii

Unmerged revisions

552. By George Ormond Lorch III

This implements the io-block-size option for XtraBackup and innobackupex (passthrough only), which specifies the number of pages to read/write per I/O cycle.
Changes made:
- Increased the default from 64 pages to 256 pages (4M).
- Corrected the retry logic in xtrabackup_copy_datafile so that an individual page is re-read up to a pre-determined number of times, and only that page is re-read on each validation failure. The previous logic re-read the entire block and kept counting retry failures across pages, even after a previously failed page was eventually read successfully.
- Created a very simple test case that sets io-block-size to a larger value.
- Corrupt page reads are covered by bug766033.sh, which creates a corrupted dataset and checks for the expected error message.
- Not covered by any test is the case of a page that fails validation on the first read and then passes on a re-read.

Preview Diff

1=== modified file 'innobackupex'
2--- innobackupex 2013-04-28 18:34:11 +0000
3+++ innobackupex 2013-05-01 16:01:24 +0000
4@@ -81,6 +81,7 @@
5 my $option_tables_file = '';
6 my $option_throttle = '';
7 my $option_sleep = '';
8+my $option_io_block_size = '';
9 my $option_compress = 999;
10 my $option_compress_threads = 1;
11 my $option_uncompress = '';
12@@ -1158,6 +1159,11 @@
13 if ($option_throttle) {
14 $options = $options . " --throttle=$option_throttle";
15 }
16+
17+ if ($option_io_block_size) {
18+ $options = $options . " --io-block-size=$option_io_block_size";
19+ }
20+
21 if ($option_log_copy_interval) {
22 $options = $options . " --log-copy-interval=$option_log_copy_interval";
23 }
24@@ -1998,7 +2004,8 @@
25 }
26
27 # read command line options
28- $rcode = GetOptions('compress' => \$option_compress,
29+ $rcode = GetOptions('io-block-size=s' => \$option_io_block_size,
30+ 'compress' => \$option_compress,
31 'compress-threads=i' => \$option_compress_threads,
32 'help' => \$option_help,
33 'version' => \$option_version,
34@@ -3142,6 +3149,10 @@
35
36 This option specifies the log sequence number (LSN) to use for the incremental backup. The option accepts a string argument. It is used with the --incremental option. It is used instead of specifying --incremental-basedir. For databases created by MySQL and Percona Server 5.0-series versions, specify the LSN as two 32-bit integers in high:low format. For databases created in 5.1 and later, specify the LSN as a single 64-bit integer.
37
38+=item --io-block-size
39+
40+This option specifies the number of InnoDB pages that will be read per read+write I/O cycle. It is passed directly to xtrabackup's --io-block-size option. See xtrabackup documentation for details.
41+
42 =item --move-back
43
44 Move all the files in a previously made backup from the backup directory to the actual datadir location. Use with caution, as it removes backup files.
45
46=== modified file 'src/xtrabackup.cc'
47--- src/xtrabackup.cc 2013-04-29 05:45:52 +0000
48+++ src/xtrabackup.cc 2013-05-01 16:01:24 +0000
49@@ -1413,6 +1413,8 @@
50 xb_stream_fmt_t xtrabackup_stream_fmt;
51 my_bool xtrabackup_stream = FALSE;
52
53+ulonglong xtrabackup_iosize = 64;
54+
55 static const char *xtrabackup_compress_alg = NULL;
56 ibool xtrabackup_compress = FALSE;
57 uint xtrabackup_compress_threads;
58@@ -1645,6 +1647,7 @@
59 OPT_XTRA_CREATE_IB_LOGFILE,
60 OPT_XTRA_PARALLEL,
61 OPT_XTRA_STREAM,
62+ OPT_XTRA_IOSIZE,
63 OPT_XTRA_COMPRESS,
64 OPT_XTRA_COMPRESS_THREADS,
65 OPT_INNODB,
66@@ -1817,6 +1820,11 @@
67 (G_PTR*) &xtrabackup_stream_str, (G_PTR*) &xtrabackup_stream_str, 0, GET_STR,
68 REQUIRED_ARG, 0, 0, 0, 0, 0, 0},
69
70+ {"io-block-size", OPT_XTRA_IOSIZE,
71+ "Size of working buffer(s) in pages. The default value is 64.",
72+ (G_PTR*) &xtrabackup_iosize, (G_PTR*) &xtrabackup_iosize,
73+ 0, GET_ULL, REQUIRED_ARG, 64, 1, ULONGLONG_MAX, 0, 0, 0},
74+
75 {"compress", OPT_XTRA_COMPRESS, "Compress individual backup files using the "
76 "specified compression algorithm. Currently the only supported algorithm "
77 "is 'quicklz'. It is also the default algorithm, i.e. the one used when "
78@@ -4032,7 +4040,7 @@
79 }
80
81 /* TODO: We may tune the behavior (e.g. by fil_aio)*/
82-#define COPY_CHUNK 64
83+#define COPY_CHUNK xtrabackup_iosize
84
85 static
86 my_bool
87@@ -4204,6 +4212,8 @@
88 ulint chunk;
89 ulint chunk_offset;
90 ulint retry_count = 10;
91+ const ulint max_retries = 10;
92+ const ulint retry_wait = 100000;
93
94 if (file_size - offset > (IB_INT64) (COPY_CHUNK * page_size)) {
95 chunk = COPY_CHUNK * page_size;
96@@ -4211,7 +4221,6 @@
97 chunk = (ulint)(file_size - offset);
98 }
99
100-read_retry:
101 xtrabackup_io_throttling();
102
103 success = xb_os_file_read(src_file, page, offset, chunk);
104@@ -4225,15 +4234,21 @@
105
106 /* check corruption and retry */
107 for (chunk_offset = 0; chunk_offset < chunk; chunk_offset += page_size) {
108- if (xb_buf_page_is_corrupted(page + chunk_offset,
109+ retry_count = 0;
110+ while (xb_buf_page_is_corrupted(page + chunk_offset,
111 info.zip_size))
112 {
113+ ulint page_no = ((offset + (IB_INT64)chunk_offset) >> page_size_shift);
114+ IB_INT64 page_offset = offset + chunk_offset;
115+ ulint retry_read_size = page_size;
116+ if (page_size > chunk - chunk_offset) {
117+ retry_read_size = chunk - chunk_offset;
118+ }
119+
120 if (
121 trx_sys_sys_space(node->space->id)
122- && ((offset + (IB_INT64)chunk_offset) >> page_size_shift)
123- >= (IB_INT64) FSP_EXTENT_SIZE
124- && ((offset + (IB_INT64)chunk_offset) >> page_size_shift)
125- < (IB_INT64) FSP_EXTENT_SIZE * 3) {
126+ && page_no >= (IB_INT64) FSP_EXTENT_SIZE
127+ && page_no < (IB_INT64) FSP_EXTENT_SIZE * 3) {
128 /* double write buffer may have old data in the end
129 or it may contain the other format page like COMPRESSED.
130 So, we can pass the check of double write buffer.*/
131@@ -4241,33 +4256,37 @@
132 msg("[%02u] xtrabackup: "
133 "Page %lu seems double write "
134 "buffer. passing the check.\n",
135- thread_n,
136- (ulint)((offset +
137- (IB_INT64)chunk_offset) >>
138- page_size_shift));
139- } else {
140- retry_count--;
141- if (retry_count == 0) {
142- msg("[%02u] xtrabackup: "
143- "Error: 10 retries "
144- "resulted in fail. File "
145- "%s seems to be "
146- "corrupted.\n",
147- thread_n, node_path);
148- goto error;
149- }
150+ thread_n, page_no);
151+ break;
152+ }
153+
154+ msg("[%02u] xtrabackup: "
155+ "Database page corruption detected at "
156+ "page %lu. retrying %lu of %lu...\n",
157+ thread_n, page_no, retry_count + 1,
158+ max_retries);
159+
160+ os_thread_sleep(retry_wait);
161+
162+ xtrabackup_io_throttling();
163+
164+ success = xb_os_file_read(src_file,
165+ page + chunk_offset,
166+ page_offset,
167+ retry_read_size);
168+
169+ if (!success) {
170+ goto error;
171+ }
172+
173+ retry_count++;
174+ if (retry_count >= max_retries) {
175 msg("[%02u] xtrabackup: "
176- "Database page corruption "
177- "detected at page %lu. "
178- "retrying...\n",
179- thread_n,
180- (ulint)((offset +
181- (IB_INT64)chunk_offset)
182- >> page_size_shift));
183-
184- os_thread_sleep(100000);
185-
186- goto read_retry;
187+ "Error: %ld retries resulted in "
188+ "failure. File %s seems to be "
189+ "corrupted.\n",
190+ thread_n, retry_count, node_path);
191+ goto error;
192 }
193 }
194 }
195
196=== added file 'test/t/xb_io_block_size.sh'
197--- test/t/xb_io_block_size.sh 1970-01-01 00:00:00 +0000
198+++ test/t/xb_io_block_size.sh 2013-05-01 16:01:24 +0000
199@@ -0,0 +1,35 @@
200+. inc/common.sh
201+
202+start_server --innodb_file_per_table
203+
204+load_dbase_schema sakila
205+load_dbase_data sakila
206+
207+# Take backup
208+vlog "Creating the backup directory: $topdir/backup"
209+mkdir -p $topdir/backup
210+innobackupex $topdir/backup --io-block-size=512
211+backup_dir=`grep "innobackupex: Backup created in directory" $OUTFILE | awk -F\' '{ print $2}'`
212+
213+#echo "Backup dir in $backup_dir"
214+
215+stop_server
216+# Remove datadir
217+rm -r $mysql_datadir
218+# Restore sakila
219+vlog "Applying log"
220+vlog "###########"
221+vlog "# PREPARE #"
222+vlog "###########"
223+innobackupex --apply-log $backup_dir
224+vlog "Restoring MySQL datadir"
225+mkdir -p $mysql_datadir
226+vlog "###########"
227+vlog "# RESTORE #"
228+vlog "###########"
229+innobackupex --copy-back $backup_dir
230+
231+start_server
232+# Check sakila
233+run_cmd ${MYSQL} ${MYSQL_ARGS} -e "SELECT count(*) from actor" sakila
234+
