Merge lp:~akopytov/percona-xtradb-cluster/bug1382797-5.5 into lp:percona-xtradb-cluster/5.5

Proposed by Alexey Kopytov
Status: Merged
Approved by: Raghavendra D Prabhu
Approved revision: no longer in the source branch.
Merged at revision: 835
Proposed branch: lp:~akopytov/percona-xtradb-cluster/bug1382797-5.5
Merge into: lp:percona-xtradb-cluster/5.5
Diff against target: 265 lines (+110/-100)
2 files modified
sql/wsrep_utils.cc (+96/-100)
storage/innobase/os/os0proc.c (+14/-0)
To merge this branch: bzr merge lp:~akopytov/percona-xtradb-cluster/bug1382797-5.5
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Needs Fixing
Raghavendra D Prabhu Pending
Review via email: mp+243696@code.launchpad.net

Description of the change

Bug #1382797: Modify SST code to use fork()/exec() to allow cleanup on
              fatal signals
Bug #1399175: Incorrect POSIX_SPAWN_SETSIGDEF usage in wsrep_utils.cc

Modified wsp::process() to use fork()/exec() instead of posix_spawn() to
have more control on the child process. More specifically:

- the child process now calls prctl() with appropriate arguments (if
  available) before calling exec() so that the SIGTERM signal is
  received by it in case the parent process (i.e. mysqld) is terminated
  either gracefully or ungracefully.

- the child process also call setsid() to create a new session and a
  process group. This is to simplify killing all spawned processes at
  once for child processes. Instead of keeping track of all processes
  started by an SST script and sending termination signals to the
  individually, an SST script can simply kill the entire process group.

Additionally, an optimization to speed up fork() has been implemented to
reduce page table copying overhead as compared to vfork() /
posix_spawn(). Since the InnoDB buffer pool is the major part of mysqld
RSS in most installations, use madvise(..., MADV_DONTFORK) to exclude
buffer pool from memory available to the child on fork().

This revision also fixes bug #1399175 as a side effect, since all code
related to the posix_spawn*() family of functions is replaced.

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Forgot to add signal signal-related code. Will repush.

review: Needs Fixing
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Passes Jenkins now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'sql/wsrep_utils.cc'
2--- sql/wsrep_utils.cc 2013-11-19 07:24:09 +0000
3+++ sql/wsrep_utils.cc 2014-12-08 16:17:50 +0000
4@@ -25,7 +25,6 @@
5
6 #include <sql_class.h>
7
8-#include <spawn.h> // posix_spawn()
9 #include <unistd.h> // pipe()
10 #include <errno.h> // errno
11 #include <string.h> // strerror()
12@@ -34,6 +33,10 @@
13 #include <sys/socket.h>
14 #include <netdb.h> // getaddrinfo()
15
16+#ifdef HAVE_SYS_PRCTL_H
17+#include <sys/prctl.h>
18+#endif
19+
20 extern char** environ; // environment variables
21
22 static wsp::string wsrep_PATH;
23@@ -89,13 +92,12 @@
24 #define STDIN_FD 0
25 #define STDOUT_FD 1
26
27-#ifndef POSIX_SPAWN_USEVFORK
28-# define POSIX_SPAWN_USEVFORK 0
29-#endif
30-
31 process::process (const char* cmd, const char* type)
32- : str_(cmd ? strdup(cmd) : strdup("")), io_(NULL), err_(EINVAL), pid_(0)
33+ : str_(cmd ? strdup(cmd) : strdup("")), io_(NULL), err_(0), pid_(0)
34 {
35+ int sig;
36+ struct sigaction sa;
37+
38 if (0 == str_)
39 {
40 WSREP_ERROR ("Can't allocate command line of size: %zu", strlen(cmd));
41@@ -128,105 +130,99 @@
42 int const child_end (parent_end == PIPE_READ ? PIPE_WRITE : PIPE_READ);
43 int const close_fd (parent_end == PIPE_READ ? STDOUT_FD : STDIN_FD);
44
45- char* const pargv[4] = { strdup("sh"), strdup("-c"), strdup(str_), NULL };
46- if (!(pargv[0] && pargv[1] && pargv[2]))
47- {
48- err_ = ENOMEM;
49- WSREP_ERROR ("Failed to allocate pargv[] array.");
50- goto cleanup_pipe;
51- }
52-
53- posix_spawnattr_t attr;
54- err_ = posix_spawnattr_init (&attr);
55- if (err_)
56- {
57- WSREP_ERROR ("posix_spawnattr_init() failed: %d (%s)",
58- err_, strerror(err_));
59- goto cleanup_pipe;
60- }
61-
62- err_ = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF |
63- POSIX_SPAWN_USEVFORK);
64- if (err_)
65- {
66- WSREP_ERROR ("posix_spawnattr_setflags() failed: %d (%s)",
67- err_, strerror(err_));
68- goto cleanup_attr;
69- }
70-
71- posix_spawn_file_actions_t fact;
72- err_ = posix_spawn_file_actions_init (&fact);
73- if (err_)
74- {
75- WSREP_ERROR ("posix_spawn_file_actions_init() failed: %d (%s)",
76- err_, strerror(err_));
77- goto cleanup_attr;
78- }
79-
80- // close child's stdout|stdin depending on what we returning
81- err_ = posix_spawn_file_actions_addclose (&fact, close_fd);
82- if (err_)
83- {
84- WSREP_ERROR ("posix_spawn_file_actions_addclose() failed: %d (%s)",
85- err_, strerror(err_));
86- goto cleanup_fact;
87- }
88-
89- // substitute our pipe descriptor in place of the closed one
90- err_ = posix_spawn_file_actions_adddup2 (&fact,
91- pipe_fds[child_end], close_fd);
92- if (err_)
93- {
94- WSREP_ERROR ("posix_spawn_file_actions_addup2() failed: %d (%s)",
95- err_, strerror(err_));
96- goto cleanup_fact;
97- }
98-
99- err_ = posix_spawnp (&pid_, pargv[0], &fact, &attr, pargv, environ);
100- if (err_)
101- {
102- WSREP_ERROR ("posix_spawnp(%s) failed: %d (%s)",
103- pargv[2], err_, strerror(err_));
104- pid_ = 0; // just to make sure it was not messed up in the call
105- goto cleanup_fact;
106- }
107-
108- io_ = fdopen (pipe_fds[parent_end], type);
109-
110- if (io_)
111- {
112+ pid_ = fork();
113+
114+ if (pid_ == -1)
115+ {
116+ err_= errno;
117+ WSREP_ERROR ("fork() failed: %d (%s)", err_, strerror(err_));
118+ pid_ = 0;
119+ goto cleanup;
120+ }
121+ else if (pid_ > 0)
122+ {
123+ /* Parent */
124+
125+ io_ = fdopen (pipe_fds[parent_end], type);
126+
127+ if (io_)
128+ {
129 pipe_fds[parent_end] = -1; // skip close on cleanup
130- }
131- else
132- {
133+ }
134+ else
135+ {
136 err_ = errno;
137 WSREP_ERROR ("fdopen() failed: %d (%s)", err_, strerror(err_));
138- }
139-
140-cleanup_fact:
141- int err; // to preserve err_ code
142- err = posix_spawn_file_actions_destroy (&fact);
143- if (err)
144- {
145- WSREP_ERROR ("posix_spawn_file_actions_destroy() failed: %d (%s)\n",
146- err, strerror(err));
147- }
148-
149-cleanup_attr:
150- err = posix_spawnattr_destroy (&attr);
151- if (err)
152- {
153- WSREP_ERROR ("posix_spawnattr_destroy() failed: %d (%s)",
154- err, strerror(err));
155- }
156-
157-cleanup_pipe:
158+ }
159+
160+ goto cleanup;
161+ }
162+
163+ /* Child */
164+
165+#ifdef PR_SET_PDEATHSIG
166+ /*
167+ Ensure this process gets SIGTERM when the server is terminated for
168+ whatever reasons.
169+ */
170+
171+ if (prctl(PR_SET_PDEATHSIG, SIGTERM))
172+ {
173+ sql_perror("prctl() failed");
174+ _exit(EXIT_FAILURE);
175+ }
176+#endif
177+
178+ /* Reset all ignored signals to SIG_DFL */
179+
180+ memset(&sa, 0, sizeof(sa));
181+ sa.sa_handler = SIG_DFL;
182+
183+ for (sig = 1; sig < NSIG; sig++)
184+ {
185+ /*
186+ For some signals sigaction() even when SIG_DFL handler is set, so don't
187+ check for return value here.
188+ */
189+ sigaction(sig, &sa, NULL);
190+ }
191+
192+ /*
193+ Make the child process a session/group leader. This is required to
194+ simplify cleanup procedure for SST process, so that all processes spawned
195+ by the SST script could be killed by killing the entire process group.
196+ */
197+
198+ if (setsid() < 0)
199+ {
200+ sql_perror("setsid() failed");
201+ _exit(EXIT_FAILURE);
202+ }
203+
204+ /* close child's stdout|stdin depending on what we returning */
205+
206+ if (close(close_fd) < 0)
207+ {
208+ sql_perror("close() failed");
209+ _exit(EXIT_FAILURE);
210+ }
211+
212+ /* substitute our pipe descriptor in place of the closed one */
213+
214+ if (dup2(pipe_fds[child_end], close_fd) < 0)
215+ {
216+ sql_perror("dup2() failed");
217+ _exit(EXIT_FAILURE);
218+ }
219+
220+ execlp("sh", "sh", "-c", str_, NULL);
221+
222+ sql_perror("execlp() failed");
223+ _exit(EXIT_FAILURE);
224+
225+cleanup:
226 if (pipe_fds[0] >= 0) close (pipe_fds[0]);
227 if (pipe_fds[1] >= 0) close (pipe_fds[1]);
228-
229- free (pargv[0]);
230- free (pargv[1]);
231- free (pargv[2]);
232 }
233
234 process::~process ()
235
236=== modified file 'storage/innobase/os/os0proc.c'
237--- storage/innobase/os/os0proc.c 2013-08-13 11:26:01 +0000
238+++ storage/innobase/os/os0proc.c 2014-12-08 16:17:50 +0000
239@@ -38,6 +38,10 @@
240 #include <sys/utsname.h> /* uname() */
241 #endif
242
243+#if defined(WITH_WSREP) && defined(UNIV_LINUX)
244+#include <sys/mman.h> /* madvise() */
245+#endif
246+
247 /* FreeBSD for example has only MAP_ANON, Linux has MAP_ANONYMOUS and
248 MAP_ANON but MAP_ANON is marked as deprecated */
249 #if defined(MAP_ANONYMOUS)
250@@ -229,6 +233,16 @@
251 memset(ptr, '\0', size);
252 }
253
254+#if defined(WITH_WSREP) && defined(UNIV_LINUX)
255+ /* Do not make the pages from this block available to the child after a
256+ fork(). This is required to speed up process spawning for Galera SST. */
257+
258+ if (madvise(ptr, size, MADV_DONTFORK)) {
259+ fprintf(stderr, "InnoDB: Warning: madvise(MADV_DONTFORK) is "
260+ "not supported by the kernel. Spawning SST processes "
261+ "can be slow.\n");
262+ }
263+#endif
264 return(ptr);
265 }
266

Subscribers

People subscribed via source and target branches