Merge lp:~akopytov/percona-xtradb-cluster/bug1382797-5.6 into lp:percona-xtradb-cluster

Proposed by Alexey Kopytov on 2014-12-04
Status: Merged
Approved by: Raghavendra D Prabhu on 2014-12-09
Approved revision: 941
Merge reported by: Alexey Kopytov
Merged at revision: not available
Proposed branch: lp:~akopytov/percona-xtradb-cluster/bug1382797-5.6
Merge into: lp:percona-xtradb-cluster
Diff against target: 266 lines (+111/-100)
2 files modified
sql/wsrep_utils.cc (+97/-100)
storage/innobase/os/os0proc.cc (+14/-0)
To merge this branch: bzr merge lp:~akopytov/percona-xtradb-cluster/bug1382797-5.6
Reviewer Review Type Date Requested Status
Raghavendra D Prabhu (community) 2014-12-04 Needs Fixing on 2014-12-06
Alexey Kopytov (community) Needs Fixing on 2014-12-04
Review via email: mp+243698@code.launchpad.net

Description of the change

Bug #1382797: Modify SST code to use fork()/exec() to allow cleanup on
              fatal signals
Bug #1399175: Incorrect POSIX_SPAWN_SETSIGDEF usage in wsrep_utils.cc

Modified wsp::process() to use fork()/exec() instead of posix_spawn() to
have more control on the child process. More specifically:

- the child process now calls prctl() with appropriate arguments (if
  available) before calling exec() so that the SIGTERM signal is
  received by it in case the parent process (i.e. mysqld) is terminated
  either gracefully or ungracefully.

- the child process also call setsid() to create a new session and a
  process group. This is to simplify killing all spawned processes at
  once for child processes. Instead of keeping track of all processes
  started by an SST script and sending termination signals to the
  individually, an SST script can simply kill the entire process group.

Additionally, an optimization to speed up fork() has been implemented to
reduce page table copying overhead as compared to vfork() /
posix_spawn(). Since the InnoDB buffer pool is the major part of mysqld
RSS in most installations, use madvise(..., MADV_DONTFORK) to exclude
buffer pool from memory available to the child on fork().

This revision also fixes bug #1399175 as a side effect, since all code
related to the posix_spawn*() family of functions is replaced.

To post a comment you must log in.
Alexey Kopytov (akopytov) wrote :

Forgot to add signal signal-related code. Will repush.

review: Needs Fixing

Added comments.

review: Needs Fixing
941. By Alexey Kopytov on 2014-12-08

Bug #1382797: Modify SST code to use fork()/exec() to allow cleanup on
              fatal signals
Bug #1399175: Incorrect POSIX_SPAWN_SETSIGDEF usage in wsrep_utils.cc

Modified wsp::process() to use fork()/exec() instead of posix_spawn() to
have more control on the child process. More specifically:

- the child process now calls prctl() with appropriate arguments (if
  available) before calling exec() so that the SIGTERM signal is
  received by it in case the parent process (i.e. mysqld) is terminated
  either gracefully or ungracefully.

- the child process also call setsid() to create a new session and a
  process group. This is to simplify killing all spawned processes at
  once for child processes. Instead of keeping track of all processes
  started by an SST script and sending termination signals to the
  individually, an SST script can simply kill the entire process group.

Additionally, an optimization to speed up fork() has been implemented to
reduce page table copying overhead as compared to vfork() /
posix_spawn(). Since the InnoDB buffer pool is the major part of mysqld
RSS in most installations, use madvise(..., MADV_DONTFORK) to exclude
buffer pool from memory available to the child on fork().

This revision also fixes bug #1399175 as a side effect, since all code
related to the posix_spawn*() family of functions is replaced.

Alexey Kopytov (akopytov) wrote :

Passes Jenkins now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'sql/wsrep_utils.cc'
2--- sql/wsrep_utils.cc 2014-05-21 08:49:19 +0000
3+++ sql/wsrep_utils.cc 2014-12-08 15:03:30 +0000
4@@ -25,7 +25,6 @@
5
6 #include <sql_class.h>
7
8-#include <spawn.h> // posix_spawn()
9 #include <unistd.h> // pipe()
10 #include <errno.h> // errno
11 #include <string.h> // strerror()
12@@ -33,6 +32,11 @@
13 #include <sys/types.h>
14 #include <sys/socket.h>
15 #include <netdb.h> // getaddrinfo()
16+#include <signal.h>
17+
18+#ifdef HAVE_SYS_PRCTL_H
19+#include <sys/prctl.h>
20+#endif
21
22 extern char** environ; // environment variables
23
24@@ -89,13 +93,12 @@
25 #define STDIN_FD 0
26 #define STDOUT_FD 1
27
28-#ifndef POSIX_SPAWN_USEVFORK
29-# define POSIX_SPAWN_USEVFORK 0
30-#endif
31-
32 process::process (const char* cmd, const char* type)
33- : str_(cmd ? strdup(cmd) : strdup("")), io_(NULL), err_(EINVAL), pid_(0)
34+ : str_(cmd ? strdup(cmd) : strdup("")), io_(NULL), err_(0), pid_(0)
35 {
36+ int sig;
37+ struct sigaction sa;
38+
39 if (0 == str_)
40 {
41 WSREP_ERROR ("Can't allocate command line of size: %zu", strlen(cmd));
42@@ -128,105 +131,99 @@
43 int const child_end (parent_end == PIPE_READ ? PIPE_WRITE : PIPE_READ);
44 int const close_fd (parent_end == PIPE_READ ? STDOUT_FD : STDIN_FD);
45
46- char* const pargv[4] = { strdup("sh"), strdup("-c"), strdup(str_), NULL };
47- if (!(pargv[0] && pargv[1] && pargv[2]))
48- {
49- err_ = ENOMEM;
50- WSREP_ERROR ("Failed to allocate pargv[] array.");
51- goto cleanup_pipe;
52- }
53-
54- posix_spawnattr_t attr;
55- err_ = posix_spawnattr_init (&attr);
56- if (err_)
57- {
58- WSREP_ERROR ("posix_spawnattr_init() failed: %d (%s)",
59- err_, strerror(err_));
60- goto cleanup_pipe;
61- }
62-
63- err_ = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF |
64- POSIX_SPAWN_USEVFORK);
65- if (err_)
66- {
67- WSREP_ERROR ("posix_spawnattr_setflags() failed: %d (%s)",
68- err_, strerror(err_));
69- goto cleanup_attr;
70- }
71-
72- posix_spawn_file_actions_t fact;
73- err_ = posix_spawn_file_actions_init (&fact);
74- if (err_)
75- {
76- WSREP_ERROR ("posix_spawn_file_actions_init() failed: %d (%s)",
77- err_, strerror(err_));
78- goto cleanup_attr;
79- }
80-
81- // close child's stdout|stdin depending on what we returning
82- err_ = posix_spawn_file_actions_addclose (&fact, close_fd);
83- if (err_)
84- {
85- WSREP_ERROR ("posix_spawn_file_actions_addclose() failed: %d (%s)",
86- err_, strerror(err_));
87- goto cleanup_fact;
88- }
89-
90- // substitute our pipe descriptor in place of the closed one
91- err_ = posix_spawn_file_actions_adddup2 (&fact,
92- pipe_fds[child_end], close_fd);
93- if (err_)
94- {
95- WSREP_ERROR ("posix_spawn_file_actions_addup2() failed: %d (%s)",
96- err_, strerror(err_));
97- goto cleanup_fact;
98- }
99-
100- err_ = posix_spawnp (&pid_, pargv[0], &fact, &attr, pargv, environ);
101- if (err_)
102- {
103- WSREP_ERROR ("posix_spawnp(%s) failed: %d (%s)",
104- pargv[2], err_, strerror(err_));
105- pid_ = 0; // just to make sure it was not messed up in the call
106- goto cleanup_fact;
107- }
108-
109- io_ = fdopen (pipe_fds[parent_end], type);
110-
111- if (io_)
112- {
113+ pid_ = fork();
114+
115+ if (pid_ == -1)
116+ {
117+ err_= errno;
118+ WSREP_ERROR ("fork() failed: %d (%s)", err_, strerror(err_));
119+ pid_ = 0;
120+ goto cleanup;
121+ }
122+ else if (pid_ > 0)
123+ {
124+ /* Parent */
125+
126+ io_ = fdopen (pipe_fds[parent_end], type);
127+
128+ if (io_)
129+ {
130 pipe_fds[parent_end] = -1; // skip close on cleanup
131- }
132- else
133- {
134+ }
135+ else
136+ {
137 err_ = errno;
138 WSREP_ERROR ("fdopen() failed: %d (%s)", err_, strerror(err_));
139- }
140-
141-cleanup_fact:
142- int err; // to preserve err_ code
143- err = posix_spawn_file_actions_destroy (&fact);
144- if (err)
145- {
146- WSREP_ERROR ("posix_spawn_file_actions_destroy() failed: %d (%s)\n",
147- err, strerror(err));
148- }
149-
150-cleanup_attr:
151- err = posix_spawnattr_destroy (&attr);
152- if (err)
153- {
154- WSREP_ERROR ("posix_spawnattr_destroy() failed: %d (%s)",
155- err, strerror(err));
156- }
157-
158-cleanup_pipe:
159+ }
160+
161+ goto cleanup;
162+ }
163+
164+ /* Child */
165+
166+#ifdef PR_SET_PDEATHSIG
167+ /*
168+ Ensure this process gets SIGTERM when the server is terminated for
169+ whatever reasons.
170+ */
171+
172+ if (prctl(PR_SET_PDEATHSIG, SIGTERM))
173+ {
174+ sql_perror("prctl() failed");
175+ _exit(EXIT_FAILURE);
176+ }
177+#endif
178+
179+ /* Reset all ignored signals to SIG_DFL */
180+
181+ memset(&sa, 0, sizeof(sa));
182+ sa.sa_handler = SIG_DFL;
183+
184+ for (sig = 1; sig < NSIG; sig++)
185+ {
186+ /*
187+ For some signals sigaction() even when SIG_DFL handler is set, so don't
188+ check for return value here.
189+ */
190+ sigaction(sig, &sa, NULL);
191+ }
192+
193+ /*
194+ Make the child process a session/group leader. This is required to
195+ simplify cleanup procedure for SST process, so that all processes spawned
196+ by the SST script could be killed by killing the entire process group.
197+ */
198+
199+ if (setsid() < 0)
200+ {
201+ sql_perror("setsid() failed");
202+ _exit(EXIT_FAILURE);
203+ }
204+
205+ /* close child's stdout|stdin depending on what we returning */
206+
207+ if (close(close_fd) < 0)
208+ {
209+ sql_perror("close() failed");
210+ _exit(EXIT_FAILURE);
211+ }
212+
213+ /* substitute our pipe descriptor in place of the closed one */
214+
215+ if (dup2(pipe_fds[child_end], close_fd) < 0)
216+ {
217+ sql_perror("dup2() failed");
218+ _exit(EXIT_FAILURE);
219+ }
220+
221+ execlp("sh", "sh", "-c", str_, NULL);
222+
223+ sql_perror("execlp() failed");
224+ _exit(EXIT_FAILURE);
225+
226+cleanup:
227 if (pipe_fds[0] >= 0) close (pipe_fds[0]);
228 if (pipe_fds[1] >= 0) close (pipe_fds[1]);
229-
230- free (pargv[0]);
231- free (pargv[1]);
232- free (pargv[2]);
233 }
234
235 process::~process ()
236
237=== modified file 'storage/innobase/os/os0proc.cc'
238--- storage/innobase/os/os0proc.cc 2013-10-23 12:32:06 +0000
239+++ storage/innobase/os/os0proc.cc 2014-12-08 15:03:30 +0000
240@@ -38,6 +38,10 @@
241 #include <sys/utsname.h> /* uname() */
242 #endif
243
244+#if defined(WITH_WSREP) && defined(UNIV_LINUX)
245+#include <sys/mman.h> /* madvise() */
246+#endif
247+
248 /* FreeBSD for example has only MAP_ANON, Linux has MAP_ANONYMOUS and
249 MAP_ANON but MAP_ANON is marked as deprecated */
250 #if defined(MAP_ANONYMOUS)
251@@ -229,6 +233,16 @@
252 memset(ptr, '\0', size);
253 }
254
255+#if defined(WITH_WSREP) && defined(UNIV_LINUX)
256+ /* Do not make the pages from this block available to the child after a
257+ fork(). This is required to speed up process spawning for Galera SST. */
258+
259+ if (madvise(ptr, size, MADV_DONTFORK)) {
260+ fprintf(stderr, "InnoDB: Warning: madvise(MADV_DONTFORK) is "
261+ "not supported by the kernel. Spawning SST processes "
262+ "can be slow.\n");
263+ }
264+#endif
265 return(ptr);
266 }
267

Subscribers

People subscribed via source and target branches

to all changes: