Merge lp:~akopytov/percona-xtrabackup/BT27412-bug1055989-2.0 into lp:percona-xtrabackup/2.0

Proposed by Alexey Kopytov
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 478
Proposed branch: lp:~akopytov/percona-xtrabackup/BT27412-bug1055989-2.0
Merge into: lp:percona-xtrabackup/2.0
Diff against target: 202 lines (+99/-28)
2 files modified
innobackupex (+52/-6)
src/xtrabackup.c (+47/-22)
To merge this branch: bzr merge lp:~akopytov/percona-xtrabackup/BT27412-bug1055989-2.0
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Approve
Laurynas Biveinis (community) Needs Fixing
Review via email: mp+134413@code.launchpad.net

Description of the change

    Bug #1055989: innobackupex waits for streaming temporary log file to
                  finish before unlocking tables

    The problem was that in XtraBackup 2.0 streaming xtrabackup_logfile from
    a temporary file was performed in the xtrabackup binary and
    innobackupex waited for the xtrabackup process to exit before unlocking
    the tables. Streaming the log file might take a long time, while
    unnecessarily keeping the server locked with FTWRL. In fact, releasing
    the lock can be performed as soon as the xtrabackup binary stops
    duplicating the redo log.

    Fixed by:

    - splitting resume_ibbackup() into two functions. resume_ibbackup()
      itself now resumes the xtrabackup process and then just waits for the
      log copying to stop rather than process termination as previously. The
      new function wait_for_ibbackup_finish() now waits for process
      termination and obtains its exit status. This function is called after
      releasing the FTWRL lock.

    - implementing a new synchronization point between innobackupex and
      xtrabackup using the same mechanism (the xtrabackup_suspended file)
      which is used for synchronization before starting FTWRL: when
      innobackupex wants to wait until log copying finishes, it removes
      xtrabackup_suspended and waits until it is created again by
      xtrabackup. So xtrabackup creates the file to signal innobackupex when
      log copying is complete and it is now safe to release the FTWRL lock.

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Alexey -

The patch is OK with one point: the second use of xtrabackup_suspended file has nothing to do with xtrabackup being suspended when it's created. Also I have a feature in works that suspends/resumes XB at a different point. I think it's a good idea to have unique file names for all the "sync points": avoid "wires-crossed" sync bugs (it's hard to imagine how one could happen currently, but who what else we will develop before we merge IB and XB :), help with troubleshooting if XB crashed and there's a directory listing only, etc.

review: Needs Fixing
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Laurynas,

Thanks for the review. I agree there are currently no problems caused by reusing xtrabackup_suspended file, though the name looks a bit confusing. I could rename it to xtrabackup_lock or something, but I try to do small incremental changes, and not change something unless I absolutely have to.

It does the job for this specific fix. In case we need more fine-grained locking (before merging innobackupex and xtrabackup), we may consider adding more lock files in the future.

Setting to approved, as there are no comments other than those about the lock file name.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'innobackupex'
2--- innobackupex 2012-11-07 06:44:45 +0000
3+++ innobackupex 2012-11-15 09:01:29 +0000
4@@ -425,12 +425,14 @@
5 # (or finalize the backup by syncing changes if using rsync)
6 backup_files(0);
7
8- # resume ibbackup and wait till it has finished
9- my $ibbackup_exit_code = resume_ibbackup();
10+ # resume ibbackup and wait till log copying is finished
11+ resume_ibbackup();
12
13 # release read locks on all tables
14 mysql_unlockall() if !$option_no_lock;
15
16+ my $ibbackup_exit_code = wait_for_ibbackup_finish();
17+
18 if ( $option_safe_slave_backup && $sql_thread_started) {
19 print STDERR "$prefix: Starting slave SQL thread\n";
20 mysql_send('START SLAVE SQL_THREAD;');
21@@ -921,12 +923,56 @@
22
23
24 #
25-# resume_ibbackup subroutine signals ibbackup to complete its execution
26-# by deleting the 'ibbackup_suspended' file.
27+# resume_ibbackup subroutine signals ibbackup to finish log copying by deleting
28+# the 'xtrabackup_suspended' file and waits until log copying is stopped,
29+# i.e. until 'xtrabackup_suspended' is created again.
30+#
31+# If this functions detects that the xtrabackup process has terminated, it also
32+# sets ibbackup_exit_code and resets ibbackup_pid to avoid trying to reap a
33+# non-existing process later
34 #
35 sub resume_ibbackup {
36- print STDERR "$prefix Resuming ibbackup\n\n";
37- unlink $suspend_file || Die "Failed to delete '$suspend_file': $!";
38+ my $pid = -1;
39+
40+ $now = current_time();
41+ print STDERR "$now $prefix Waiting for log copying to finish\n\n";
42+ unlink $suspend_file || Die "Failed to delete '$suspend_file': $!";
43+
44+ for (;;) {
45+ # check if the xtrabackup process is still alive _before_ checking if
46+ # the file exists to avoid a race condition when the file is created and
47+ # the process terminates right after we do the file check
48+ $pid = waitpid($ibbackup_pid, &WNOHANG);
49+
50+ last if -e $suspend_file;
51+
52+ if ($ibbackup_pid == $pid) {
53+ # The file doesn't exist, but the process has terminated
54+ Die "ibbackup child process has died";
55+ }
56+
57+ sleep 1;
58+ }
59+
60+ if ($pid == $ibbackup_pid) {
61+ $ibbackup_exit_code = $CHILD_ERROR >> 8;
62+ $ibbackup_pid = '';
63+ }
64+
65+ unlink $suspend_file || Die "Failed to delete '$suspend_file': $!";
66+}
67+
68+#
69+# wait for ibbackup to finish and return its exit code
70+#
71+sub wait_for_ibbackup_finish {
72+ if (!$ibbackup_pid) {
73+ # The process has already been reaped.
74+ return $ibbackup_exit_code;
75+ }
76+
77+ $now = current_time();
78+ print STDERR "$now $prefix Waiting for ibbackup (pid=$ibbackup_pid) to finish\n";
79
80 # wait for ibbackup to finish
81 waitpid($ibbackup_pid, 0);
82
83=== modified file 'src/xtrabackup.c'
84--- src/xtrabackup.c 2012-11-08 23:42:03 +0000
85+++ src/xtrabackup.c 2012-11-15 09:01:29 +0000
86@@ -4195,6 +4195,32 @@
87 return(TRUE);
88 }
89
90+/***********************************************************************
91+Create an empty file with a given path and close it.
92+@return TRUE on succees, FALSE on error. */
93+static ibool
94+xb_create_suspend_file(const char *path)
95+{
96+ ibool success;
97+ os_file_t suspend_file = XB_FILE_UNDEFINED;
98+
99+ /* xb_file_create reads srv_unix_file_flush_method */
100+ suspend_file = xb_file_create(path, OS_FILE_CREATE,
101+ OS_FILE_NORMAL, OS_DATA_FILE,
102+ &success);
103+
104+ if (success && suspend_file != XB_FILE_UNDEFINED) {
105+
106+ os_file_close(suspend_file);
107+
108+ return(TRUE);
109+ }
110+
111+ msg("xtrabackup: Error: failed to create file '%s'\n", path);
112+
113+ return(FALSE);
114+}
115+
116 static void
117 xtrabackup_backup_func(void)
118 {
119@@ -4204,6 +4230,8 @@
120 datasink_t *ds;
121 ds_ctxt_t *ds_ctxt = NULL;
122 ds_ctxt_t *meta_ds_ctxt;
123+ char suspend_path[FN_REFLEN];
124+ ibool success;
125
126 #ifdef USE_POSIX_FADVISE
127 msg("xtrabackup: uses posix_fadvise().\n");
128@@ -4648,36 +4676,25 @@
129
130 /* suspend-at-end */
131 if (xtrabackup_suspend_at_end) {
132- os_file_t suspend_file = XB_FILE_UNDEFINED;
133- char suspend_path[FN_REFLEN];
134- ibool success, exists;
135+ ibool exists;
136 os_file_type_t type;
137
138 sprintf(suspend_path, "%s%s", xtrabackup_target_dir,
139 "/xtrabackup_suspended");
140-
141 srv_normalize_path_for_win(suspend_path);
142- /* xb_file_create reads srv_unix_file_flush_method */
143- suspend_file = xb_file_create(suspend_path, OS_FILE_OVERWRITE,
144- OS_FILE_NORMAL, OS_DATA_FILE,
145- &success);
146
147- if (!success) {
148- msg("xtrabackup: Error: failed to create file "
149- "'xtrabackup_suspended'\n");
150+ if (!xb_create_suspend_file(suspend_path)) {
151+ exit(EXIT_FAILURE);
152 }
153
154- if (suspend_file != XB_FILE_UNDEFINED)
155- os_file_close(suspend_file);
156-
157 exists = TRUE;
158 while (exists) {
159 os_thread_sleep(200000); /*0.2 sec*/
160 success = os_file_status(suspend_path, &exists, &type);
161- if (!success)
162- break;
163+ /* success == FALSE if file exists, but stat() failed.
164+ os_file_status() prints an error message in this case */
165+ ut_a(success);
166 }
167- xtrabackup_suspend_at_end = FALSE; /* suspend is 1 time */
168 }
169
170 /* Reinit the datasink */
171@@ -4724,6 +4741,19 @@
172 }
173 msg("\n");
174
175+ if (!log_copying_succeed) {
176+ msg("xtrabackup: Error: log_copying_thread failed.\n");
177+ exit(EXIT_FAILURE);
178+ }
179+
180+ /* Signal innobackupex that log copying has stopped and it may now
181+ unlock tables, so we can possibly stream xtrabackup_logfile later
182+ without holding the lock. */
183+ if (xtrabackup_suspend_at_end &&
184+ !xb_create_suspend_file(suspend_path)) {
185+ exit(EXIT_FAILURE);
186+ }
187+
188 if(!xtrabackup_incremental) {
189 strcpy(metadata_type, "full-backuped");
190 metadata_from_lsn = ut_dulint_zero;
191@@ -4753,11 +4783,6 @@
192
193 }
194
195- if (!log_copying_succeed) {
196- msg("xtrabackup: Error: log_copying_thread failed.\n");
197- exit(EXIT_FAILURE);
198- }
199-
200 /* Stream the transaction log from the temporary file */
201 if (!xtrabackup_log_only && xtrabackup_stream &&
202 xtrabackup_stream_temp_logfile(dst_log_fd, ds_ctxt)) {

Subscribers

People subscribed via source and target branches