Merge lp:~laurynas-biveinis/percona-server/bug805805-1060136-1061118-5.1 into lp:percona-server/5.1

Proposed by Laurynas Biveinis
Status: Superseded
Proposed branch: lp:~laurynas-biveinis/percona-server/bug805805-1060136-1061118-5.1
Merge into: lp:percona-server/5.1
Diff against target: 134 lines (+39/-6)
6 files modified
Percona-Server/configure.in (+6/-0)
Percona-Server/mysql-test/lib/My/SafeProcess/safe_process.cc (+4/-1)
Percona-Server/mysql-test/r/percona_signal_handling.result (+3/-0)
Percona-Server/mysql-test/t/percona_signal_handling.test (+8/-0)
Percona-Server/sql/net_serv.cc (+6/-2)
Percona-Server/vio/viosocket.c (+12/-3)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/bug805805-1060136-1061118-5.1
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Needs Fixing
Review via email: mp+128054@code.launchpad.net

This proposal has been superseded by a proposal from 2012-10-09.

Description of the change

Fix:
- bug 1060136 (safe_process.cc/safe_process.pl should not kill
  mysqld on SIGSTOP/SIGCONT);
- bug 1061118 (Patch to remove excessive fcntl() calls was never
  ported correctly to 5.1);
- bug 805805 (attaching to percona-server with gdb disconnects
  clients).

These bugs are fixed together because fixing bug 1061118 exposes bug
805805
, and the testcase for the latter requires bug 1060136 to be
fixed.

For bug 1060136, the issue is that safe_process.cc is setup to kill
the child on SIGCHLD signal. But this signal may be raised when child
receives non-killing signals too, such as SIGSTOP/SIGCONT, resulting
in safe_process killing the child on such signals too. Fixed by
limiting SIGCHLD only to killing signals, by specifying the
SA_NOCLDSTOP option.

For bug 1061118, the issue is that the fcntl()-removing patch is
dependant on NO_ALARM C preprocessor define being defined. But
nothing in the build defined it. Fixed by adding it to configure.in.
If Percona Server suppported more platforms, a better fix would be to
backport SO_SNDTIMEO/SO_RCVTIMEO checks from 5.5, but at this point in
5.1 lifecycle an unconditional define is enough. In addition to
NO_DEBUG, we also define SIGNAL_WITH_VIO_CLOSE to follow 5.5. If the
latter is not defined, percona_innodb_kill_idle_trx* tests regress by
server returning query execution interrupted error instead of closing
the socket. To avoid regressing on upstream bug
http://bugs.mysql.com/bug.php?id=52633 (When building with -DNO_ALARM
on Linux, wait_timeout is effective 10x longer), backport its
vio_should_retry() too. This same backport is used by the Facebook
5.1 branch.

For bug 805805, the issue is that mysqld (with fcntl()s removed,
i.e. the above fixed) will close the the connections upon receiving
SIGSTOP/SIGCONT. This is caused by network read from socket being
cancelled with errno = EINTR on these signals. Normally this is
handled by my_real_read() retrying the read on EINTR, but the
fcntl()-removal patch has disabled this inadvertently in
my_real_read() and my_real_write(). Added a testcase
percona_signal_handling.

http://jenkins.percona.com/job/percona-server-5.1-param/432/

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Looks good except that

1) as I wrote in bug #1061118, I don't think we should bother enabling the feature in our 5.1 binaries and then backporting the fix for http://bugs.mysql.com/bug.php?id=52633 from 5.5

We can keep the fix (and the test case) though. The test case could reference the bug #.

review: Needs Fixing

Unmerged revisions

484. By Laurynas Biveinis

Fix:
- bug 1060136 (safe_process.cc/safe_process.pl should not kill
  mysqld on SIGSTOP/SIGCONT);
- bug 1061118 (Patch to remove excessive fcntl() calls was never
  ported correctly to 5.1);
- bug 805805 (attaching to percona-server with gdb disconnects
  clients).

These bugs are fixed together because fixing bug 1061118 exposes bug
805805
, and the testcase for the latter requires bug 1060136 to be
fixed.

For bug 1060136, the issue is that safe_process.cc is setup to kill
the child on SIGCHLD signal. But this signal may be raised when child
receives non-killing signals too, such as SIGSTOP/SIGCONT, resulting
in safe_process killing the child on such signals too. Fixed by
limiting SIGCHLD only to killing signals, by specifying the
SA_NOCLDSTOP option.

For bug 1061118, the issue is that the fcntl()-removing patch is
dependant on NO_ALARM C preprocessor define being defined. But
nothing in the build defined it. Fixed by adding it to configure.in.
If Percona Server suppported more platforms, a better fix would be to
backport SO_SNDTIMEO/SO_RCVTIMEO checks from 5.5, but at this point in
5.1 lifecycle an unconditional define is enough. In addition to
NO_DEBUG, we also define SIGNAL_WITH_VIO_CLOSE to follow 5.5. If the
latter is not defined, percona_innodb_kill_idle_trx* tests regress by
server returning query execution interrupted error instead of closing
the socket. To avoid regressing on upstream bug
http://bugs.mysql.com/bug.php?id=52633 (When building with -DNO_ALARM
on Linux, wait_timeout is effective 10x longer), backport its
vio_should_retry() too. This same backport is used by the Facebook
5.1 branch.

For bug 805805, the issue is that mysqld (with fcntl()s removed,
i.e. the above fixed) will close the the connections upon receiving
SIGSTOP/SIGCONT. This is caused by network read from socket being
cancelled with errno = EINTR on these signals. Normally this is
handled by my_real_read() retrying the read on EINTR, but the
fcntl()-removal patch has disabled this inadvertently in
my_real_read() and my_real_write(). Added a testcase
percona_signal_handling.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'Percona-Server/configure.in'
2--- Percona-Server/configure.in 2012-08-20 03:14:02 +0000
3+++ Percona-Server/configure.in 2012-10-04 16:16:24 +0000
4@@ -2894,6 +2894,12 @@
5 AC_SUBST(CC)
6 AC_SUBST(GXX)
7
8+# Fix Percona Server bug https://bugs.launchpad.net/percona-server/+bug/1061118
9+# A complete fix would be to backport SO_SNDTIMEO/SO_RCVTIMEO tests from 5.5, but
10+# this fix is good enough for us.
11+AC_DEFINE([NO_ALARM], [], [Disable alarm-based timeouts in network I/O])
12+AC_DEFINE([SIGNAL_WITH_VIO_CLOSE], [], [Close VIO of the thread being killed])
13+
14 # Set configuration options for make_binary_distribution
15 case $SYSTEM_TYPE in
16 *netware*)
17
18=== modified file 'Percona-Server/mysql-test/lib/My/SafeProcess/safe_process.cc'
19--- Percona-Server/mysql-test/lib/My/SafeProcess/safe_process.cc 2011-06-30 15:37:13 +0000
20+++ Percona-Server/mysql-test/lib/My/SafeProcess/safe_process.cc 2012-10-04 16:16:24 +0000
21@@ -153,11 +153,14 @@
22 pid_t own_pid= getpid();
23 pid_t parent_pid= getppid();
24 bool nocore = false;
25+ struct sigaction sigchld_action;
26
27+ sigchld_action.sa_handler= handle_signal;
28+ sigchld_action.sa_flags= SA_NOCLDSTOP;
29 /* Install signal handlers */
30 signal(SIGTERM, handle_signal);
31 signal(SIGINT, handle_signal);
32- signal(SIGCHLD, handle_signal);
33+ sigaction(SIGCHLD, &sigchld_action, NULL);
34 signal(SIGABRT, handle_abort);
35
36 sprintf(safe_process_name, "safe_process[%ld]", (long) own_pid);
37
38=== added file 'Percona-Server/mysql-test/r/percona_signal_handling.result'
39--- Percona-Server/mysql-test/r/percona_signal_handling.result 1970-01-01 00:00:00 +0000
40+++ Percona-Server/mysql-test/r/percona_signal_handling.result 2012-10-04 16:16:24 +0000
41@@ -0,0 +1,3 @@
42+SELECT 2+2;
43+2+2
44+4
45
46=== added file 'Percona-Server/mysql-test/t/percona_signal_handling.test'
47--- Percona-Server/mysql-test/t/percona_signal_handling.test 1970-01-01 00:00:00 +0000
48+++ Percona-Server/mysql-test/t/percona_signal_handling.test 2012-10-04 16:16:24 +0000
49@@ -0,0 +1,8 @@
50+--source include/not_windows.inc
51+
52+let $mysqld_pid_file=`SELECT @@GLOBAL.pid_file`;
53+
54+system kill -STOP `cat $mysqld_pid_file`;
55+system kill -CONT `cat $mysqld_pid_file`;
56+
57+SELECT 2+2;
58
59=== modified file 'Percona-Server/sql/net_serv.cc'
60--- Percona-Server/sql/net_serv.cc 2011-11-24 02:01:19 +0000
61+++ Percona-Server/sql/net_serv.cc 2012-10-04 16:16:24 +0000
62@@ -617,11 +617,12 @@
63 if ((long) (length= vio_write(net->vio,pos,(size_t) (end-pos))) <= 0)
64 {
65 my_bool interrupted = vio_should_retry(net->vio);
66-#if !defined(NO_ALARM) && !defined(__WIN__)
67+#if !defined(__WIN__)
68 if ((interrupted || length == 0) && !thr_alarm_in_use(&alarmed))
69 {
70 if (!thr_alarm(&alarmed, net->write_timeout, &alarm_buff))
71 { /* Always true for client */
72+#if !defined(NO_ALARM)
73 my_bool old_mode;
74 while (vio_blocking(net->vio, TRUE, &old_mode) < 0)
75 {
76@@ -639,6 +640,7 @@
77 #endif
78 goto end;
79 }
80+#endif /* !defined(NO_ALARM) */
81 retry_count=0;
82 continue;
83 }
84@@ -820,7 +822,7 @@
85
86 DBUG_PRINT("info",("vio_read returned %ld errno: %d",
87 (long) length, vio_errno(net->vio)));
88-#if !defined(NO_ALARM) && (!defined(__WIN__) || defined(MYSQL_SERVER))
89+#if !defined(__WIN__) || defined(MYSQL_SERVER)
90 /*
91 We got an error that there was no data on the socket. We now set up
92 an alarm to not 'read forever', change the socket to non blocking
93@@ -830,6 +832,7 @@
94 {
95 if (!thr_alarm(&alarmed,net->read_timeout,&alarm_buff)) /* Don't wait too long */
96 {
97+#if !defined(NO_ALARM)
98 my_bool old_mode;
99 while (vio_blocking(net->vio, TRUE, &old_mode) < 0)
100 {
101@@ -852,6 +855,7 @@
102 #endif
103 goto end;
104 }
105+#endif /* !defined(NO_ALARM) */
106 retry_count=0;
107 continue;
108 }
109
110=== modified file 'Percona-Server/vio/viosocket.c'
111--- Percona-Server/vio/viosocket.c 2012-02-17 19:02:17 +0000
112+++ Percona-Server/vio/viosocket.c 2012-10-04 16:16:24 +0000
113@@ -250,11 +250,20 @@
114
115
116 my_bool
117-vio_should_retry(Vio * vio __attribute__((unused)))
118+vio_should_retry(Vio * vio)
119 {
120 int en = socket_errno;
121- return (en == SOCKET_EAGAIN || en == SOCKET_EINTR ||
122- en == SOCKET_EWOULDBLOCK);
123+ /*
124+ man 2 read write
125+ EAGAIN or EWOULDBLOCK when a socket is a non-blocking mode means
126+ that the read/write would block.
127+ man 7 socket
128+ EAGAIN or EWOULDBLOCK when a socket is in a blocking mode means
129+ that the corresponding receiving or sending timeout was reached.
130+ */
131+ return en == SOCKET_EINTR ||
132+ (!vio_is_blocking(vio) &&
133+ (en == SOCKET_EAGAIN || en == SOCKET_EWOULDBLOCK));
134 }
135
136

Subscribers

People subscribed via source and target branches