Merge lp:~sergei.glushchenko/percona-server/5.6-ps-bug1194097 into lp:percona-server/5.6

Proposed by Sergei Glushchenko
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: no longer in the source branch.
Merged at revision: 443
Proposed branch: lp:~sergei.glushchenko/percona-server/5.6-ps-bug1194097
Merge into: lp:percona-server/5.6
Diff against target: 340 lines (+115/-42)
13 files modified
Percona-Server/include/violite.h (+7/-0)
Percona-Server/mysql-test/r/percona_bug1201681.result (+6/-0)
Percona-Server/mysql-test/t/percona_bug1201681-master.opt (+1/-0)
Percona-Server/mysql-test/t/percona_bug1201681.test (+27/-0)
Percona-Server/sql/mysqld.cc (+1/-1)
Percona-Server/sql/sql_class.cc (+2/-27)
Percona-Server/sql/threadpool_common.cc (+8/-2)
Percona-Server/sql/threadpool_unix.cc (+1/-1)
Percona-Server/vio/vio.c (+4/-0)
Percona-Server/vio/vio_priv.h (+2/-0)
Percona-Server/vio/viopipe.c (+14/-0)
Percona-Server/vio/vioshm.c (+16/-0)
Percona-Server/vio/viosocket.c (+26/-11)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-server/5.6-ps-bug1194097
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Alexey Kopytov (community) Needs Fixing
Vlad Lesin (community) g2 Needs Fixing
Review via email: mp+174525@code.launchpad.net

Description of the change

This patch fixes multiple issues.
- assertion failure (pfs->m_idle || (state == PSI_SOCKET_STATE_IDLE)) in pfs.cc
PFS updates of current state of socket in threadpool has been improved.
* as comment in threadpool_common.cc states, there is a possibility for
more than one iteration of loop in threadpool_process_request to be done,
which is handled now. Cases when loop wasn't executed at all are also
handled.
* change in mysqld.cc is aimed to prevent race condition (concurrent changes
in THD and it's PFS structures) which caused SIGSEGV's on server shutdown.
* adding vio_cancel for socket and other VIO implementations allows to
handle kill idle transaction correctly. Similar change present in 5.5
branch Percona Server rev. 442.1.4.

http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/196/

Build is run with parameters using threadpool and maximum pool size of 50
threads. Why default value of 500 threads cannot be used in our Jenkins
farm will be explained below.
Results are pretty good. Except for several sporadic failures of slow log
related tests there are two constantly failing tests which expect maximum
pool size isn't set explicitly:
- pool_of_threads
- thread_pool_max_threads_basic
Tests ssl_crl_crlpath and ssl_crl are also failing. I relate these failures
to the fact that I built with -DWITH_SSL=system.
We can see that slave (http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/196/BUILD_TYPE=debug,Host=ubuntu-lucid-64bit/)
ran out resources and was unable to create new threads and processes.
I can explain this by the fact that this is a VPS instance which means it
not fully virtualized and resources like processes and threads are shared
between all virtual environments. When 8 mysql tests run in parallel, each
testing replication (2 mysqld processes), each allocated pool of 50 threads,
we'll have about 800 only threadpool worker threads, we should also take
into account IO threads and many others. This also likely a reason of weird
failures seen by Alexey B.

To post a comment you must log in.
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

I added test case for bug #1201681, which is also fixed by this branch

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

1) Unnecessary changes in 303;
2) vio_ssl_cancel() is declared but not defined;

There are two places where vio_shutdown() is replaced by vio_cancel(). vio_shutdown() invokes mysql_socket_close() but vio_cancel() does not. Are there any other places where socket is closed for those two cases to avoid descriptors leak?

review: Needs Fixing (g2)
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The above was partial review.

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Hi Vlad,

Thank you for the review,

I agree with 1) and 2). Will puth the empty line back and remove vio_ssl_cancel() declaration.

Connection will eventually be closed via vio_shutdown called from close_connection from threadpool_remove_connection in case of threadpool or from do_handle_one_connection in case of thread per connection. The reason for this change is that when connection is killed we have to signal to signal to the epoll/kevent listener that it should handle this connection and shut it down. By closing socket we will not emit any event for epoll and connection will stuck in epoll queue.

Sergei

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

repushed with fixes of 1 and 2 of Vlad's comments

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Sergei,

Looks good in general, just a couple of minor comments:

  - is “--innodb-read-only” in the test case a copy paste from another
    test?
  - it doesn’t look good that there’s now some copy-pasted code between
    vio_shutdown*() and vio_cancel*(). Would be nice to avoid
    duplication by making vio_shutdown*() call vio_cancel*() to do most
    of the work, and then close/invalidate the socket.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Hi Alexey,

Yes, it was copied by mistake, fixed.
Code duplication was removed as well.

http://jenkins.percona.com/view/PS%205.6/job/-server-5.6-param/325/

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Diff lines 61--75 can be replaced by a single --source include/restart_mysqld.inc

Otherwise looks good.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Laurynas,

--source include/restart_mysqld.inc

will not work this testcase. This testcase is to check that server with threadpool enabled and two connections opened restarted normally. If bug is present, testcase will fail with timeout.

restart_mysqld.inc will force server restart with timeout 10 seconds, this mean server will restart anyway and we not test the bug.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'Percona-Server/include/violite.h'
2--- Percona-Server/include/violite.h 2013-06-25 13:13:06 +0000
3+++ Percona-Server/include/violite.h 2013-09-29 16:12:18 +0000
4@@ -73,6 +73,7 @@
5
6 void vio_delete(Vio* vio);
7 int vio_shutdown(Vio* vio, int how);
8+int vio_cancel(Vio* vio, int how);
9 my_bool vio_reset(Vio* vio, enum enum_vio_type type,
10 my_socket sd, void *ssl, uint flags);
11 size_t vio_read(Vio *vio, uchar * buf, size_t size);
12@@ -190,6 +191,7 @@
13 #define vio_should_retry(vio) (vio)->should_retry(vio)
14 #define vio_was_timeout(vio) (vio)->was_timeout(vio)
15 #define vio_shutdown(vio,how) ((vio)->vioshutdown)(vio,how)
16+#define vio_cancel(vio,how) ((vio)->viocancel)(vio,how)
17 #define vio_peer_addr(vio, buf, prt, buflen) (vio)->peer_addr(vio, buf, prt, buflen)
18 #define vio_io_wait(vio, event, timeout) (vio)->io_wait(vio, event, timeout)
19 #define vio_is_connected(vio) (vio)->is_connected(vio)
20@@ -266,6 +268,11 @@
21 descriptors, handles can remain valid after a shutdown.
22 */
23 int (*vioshutdown)(Vio*, int);
24+ /*
25+ Partial shutdown. All the actions performed which shutdown performs,
26+ but descriptor remains open and valid.
27+ */
28+ int (*viocancel)(Vio*, int);
29 my_bool (*is_connected)(Vio*);
30 my_bool (*has_data) (Vio*);
31 int (*io_wait)(Vio*, enum enum_vio_io_event, int);
32
33=== added file 'Percona-Server/mysql-test/r/percona_bug1201681.result'
34--- Percona-Server/mysql-test/r/percona_bug1201681.result 1970-01-01 00:00:00 +0000
35+++ Percona-Server/mysql-test/r/percona_bug1201681.result 2013-09-29 16:12:18 +0000
36@@ -0,0 +1,6 @@
37+# Request clean shutdown
38+# Wait for disconect
39+# Restart server.
40+SELECT 1;
41+1
42+1
43
44=== added file 'Percona-Server/mysql-test/t/percona_bug1201681-master.opt'
45--- Percona-Server/mysql-test/t/percona_bug1201681-master.opt 1970-01-01 00:00:00 +0000
46+++ Percona-Server/mysql-test/t/percona_bug1201681-master.opt 2013-09-29 16:12:18 +0000
47@@ -0,0 +1,1 @@
48+--thread-handling=pool-of-threads
49
50=== added file 'Percona-Server/mysql-test/t/percona_bug1201681.test'
51--- Percona-Server/mysql-test/t/percona_bug1201681.test 1970-01-01 00:00:00 +0000
52+++ Percona-Server/mysql-test/t/percona_bug1201681.test 2013-09-29 16:12:18 +0000
53@@ -0,0 +1,27 @@
54+# Bug 1201681: Server with threadpool hangs during shutdown when there are
55+# open connections
56+
57+-- source include/have_pool_of_threads.inc
58+
59+--connect (con1,localhost,root,,)
60+
61+# Write file to make mysql-test-run.pl wait for the server to stop
62+--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
63+
64+--echo # Request clean shutdown
65+--send_shutdown
66+
67+--echo # Wait for disconect
68+--source include/wait_until_disconnected.inc
69+
70+--echo # Restart server.
71+--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
72+
73+-- enable_reconnect
74+-- source include/wait_until_connected_again.inc
75+-- disable_reconnect
76+
77+--connection con1
78+SELECT 1;
79+
80+--disconnect con1
81
82=== modified file 'Percona-Server/sql/mysqld.cc'
83--- Percona-Server/sql/mysqld.cc 2013-09-09 15:24:45 +0000
84+++ Percona-Server/sql/mysqld.cc 2013-09-29 16:12:18 +0000
85@@ -1434,8 +1434,8 @@
86 continue;
87
88 tmp->killed= THD::KILL_CONNECTION;
89+ mysql_mutex_lock(&tmp->LOCK_thd_data);
90 MYSQL_CALLBACK(thread_scheduler, post_kill_notification, (tmp));
91- mysql_mutex_lock(&tmp->LOCK_thd_data);
92 if (tmp->mysys_var)
93 {
94 tmp->mysys_var->abort=1;
95
96=== modified file 'Percona-Server/sql/sql_class.cc'
97--- Percona-Server/sql/sql_class.cc 2013-09-21 18:26:20 +0000
98+++ Percona-Server/sql/sql_class.cc 2013-09-29 16:12:18 +0000
99@@ -1902,33 +1902,8 @@
100 #ifdef SIGNAL_WITH_VIO_SHUTDOWN
101 if (this != current_thd)
102 {
103- /*
104- Before sending a signal, let's close the socket of the thread
105- that is being killed ("this", which is not the current thread).
106- This is to make sure it does not block if the signal is lost.
107- This needs to be done only on platforms where signals are not
108- a reliable interruption mechanism.
109-
110- Note that the downside of this mechanism is that we could close
111- the connection while "this" target thread is in the middle of
112- sending a result to the application, thus violating the client-
113- server protocol.
114-
115- On the other hand, without closing the socket we have a race
116- condition. If "this" target thread passes the check of
117- thd->killed, and then the current thread runs through
118- THD::awake(), sets the 'killed' flag and completes the
119- signaling, and then the target thread runs into read(), it will
120- block on the socket. As a result of the discussions around
121- Bug#37780, it has been decided that we accept the race
122- condition. A second KILL awakes the target from read().
123-
124- If we are killing ourselves, we know that we are not blocked.
125- We also know that we will check thd->killed before we go for
126- reading the next statement.
127- */
128-
129- shutdown_active_vio();
130+ if (active_vio)
131+ vio_cancel(active_vio, SHUT_RDWR);
132 }
133 #endif
134
135
136=== modified file 'Percona-Server/sql/threadpool_common.cc'
137--- Percona-Server/sql/threadpool_common.cc 2013-05-27 03:01:29 +0000
138+++ Percona-Server/sql/threadpool_common.cc 2013-09-29 16:12:18 +0000
139@@ -226,6 +226,7 @@
140 thd->net.reading_or_writing= 1;
141 thd->skip_wait_timeout= true;
142 MYSQL_SOCKET_SET_STATE(thd->net.vio->mysql_socket, PSI_SOCKET_STATE_IDLE);
143+ thd->m_server_idle= true;
144 threadpool_init_net_server_extension(thd);
145 }
146 }
147@@ -326,13 +327,18 @@
148 thd->net.reading_or_writing= 1;
149 goto end;
150 }
151+ if (!thd->m_server_idle) {
152+ MYSQL_SOCKET_SET_STATE(thd->net.vio->mysql_socket, PSI_SOCKET_STATE_IDLE);
153+ MYSQL_START_IDLE_WAIT(thd->m_idle_psi, &thd->m_idle_state);
154+ thd->m_server_idle= true;
155+ }
156 }
157
158 end:
159- if (!thd->m_server_idle) {
160+ if (!retval && !thd->m_server_idle) {
161 MYSQL_SOCKET_SET_STATE(thd->net.vio->mysql_socket, PSI_SOCKET_STATE_IDLE);
162 MYSQL_START_IDLE_WAIT(thd->m_idle_psi, &thd->m_idle_state);
163- thd->m_server_idle = true;
164+ thd->m_server_idle= true;
165 }
166
167 worker_context.restore();
168
169=== modified file 'Percona-Server/sql/threadpool_unix.cc'
170--- Percona-Server/sql/threadpool_unix.cc 2013-05-27 03:01:29 +0000
171+++ Percona-Server/sql/threadpool_unix.cc 2013-09-29 16:12:18 +0000
172@@ -1295,7 +1295,7 @@
173 DBUG_VOID_RETURN;
174
175 if (thd->net.vio)
176- vio_shutdown(thd->net.vio, SHUT_RD);
177+ vio_cancel(thd->net.vio, SHUT_RD);
178 DBUG_VOID_RETURN;
179 }
180
181
182=== modified file 'Percona-Server/vio/vio.c'
183--- Percona-Server/vio/vio.c 2013-05-12 06:24:46 +0000
184+++ Percona-Server/vio/vio.c 2013-09-29 16:12:18 +0000
185@@ -92,6 +92,7 @@
186 vio->should_retry =vio_should_retry;
187 vio->was_timeout =vio_was_timeout;
188 vio->vioshutdown =vio_shutdown_pipe;
189+ vio->viocancel =vio_cancel_pipe;
190 vio->peer_addr =vio_peer_addr;
191 vio->io_wait =no_io_wait;
192 vio->is_connected =vio_is_connected_pipe;
193@@ -111,6 +112,7 @@
194 vio->should_retry =vio_should_retry;
195 vio->was_timeout =vio_was_timeout;
196 vio->vioshutdown =vio_shutdown_shared_memory;
197+ vio->viocancel =vio_cancel_shared_memory;
198 vio->peer_addr =vio_peer_addr;
199 vio->io_wait =no_io_wait;
200 vio->is_connected =vio_is_connected_shared_memory;
201@@ -130,6 +132,7 @@
202 vio->should_retry =vio_should_retry;
203 vio->was_timeout =vio_was_timeout;
204 vio->vioshutdown =vio_ssl_shutdown;
205+ vio->viocancel =vio_cancel;
206 vio->peer_addr =vio_peer_addr;
207 vio->io_wait =vio_io_wait;
208 vio->is_connected =vio_is_connected;
209@@ -147,6 +150,7 @@
210 vio->should_retry =vio_should_retry;
211 vio->was_timeout =vio_was_timeout;
212 vio->vioshutdown =vio_shutdown;
213+ vio->viocancel =vio_cancel;
214 vio->peer_addr =vio_peer_addr;
215 vio->io_wait =vio_io_wait;
216 vio->is_connected =vio_is_connected;
217
218=== modified file 'Percona-Server/vio/vio_priv.h'
219--- Percona-Server/vio/vio_priv.h 2013-05-12 06:24:46 +0000
220+++ Percona-Server/vio/vio_priv.h 2013-09-29 16:12:18 +0000
221@@ -30,6 +30,7 @@
222 size_t vio_write_pipe(Vio *vio, const uchar * buf, size_t size);
223 my_bool vio_is_connected_pipe(Vio *vio);
224 int vio_shutdown_pipe(Vio * vio, int how);
225+int vio_cancel_pipe(Vio * vio, int how);
226 #endif
227
228 #ifdef HAVE_SMEM
229@@ -37,6 +38,7 @@
230 size_t vio_write_shared_memory(Vio *vio, const uchar * buf, size_t size);
231 my_bool vio_is_connected_shared_memory(Vio *vio);
232 int vio_shutdown_shared_memory(Vio * vio, int how);
233+int vio_cancel_shared_memory(Vio * vio, int how);
234 void vio_delete_shared_memory(Vio *vio);
235 #endif
236
237
238=== modified file 'Percona-Server/vio/viopipe.c'
239--- Percona-Server/vio/viopipe.c 2013-05-12 06:24:46 +0000
240+++ Percona-Server/vio/viopipe.c 2013-09-29 16:12:18 +0000
241@@ -143,5 +143,19 @@
242 DBUG_RETURN(ret);
243 }
244
245+
246+int vio_cancel_pipe(Vio *vio, int how)
247+{
248+ DBUG_ENTER("vio_shutdown_pipe");
249+
250+ CancelIo(vio->hPipe);
251+ CloseHandle(vio->overlapped.hEvent);
252+ DisconnectNamedPipe(vio->hPipe);
253+
254+ vio->inactive= TRUE;
255+
256+ DBUG_RETURN(0);
257+}
258+
259 #endif
260
261
262=== modified file 'Percona-Server/vio/vioshm.c'
263--- Percona-Server/vio/vioshm.c 2013-05-12 06:24:46 +0000
264+++ Percona-Server/vio/vioshm.c 2013-09-29 16:12:18 +0000
265@@ -222,5 +222,21 @@
266 DBUG_RETURN(0);
267 }
268
269+int vio_cancel_shared_memory(Vio * vio, int how)
270+{
271+ DBUG_ENTER("vio_cancel_shred_memory");
272+ if (vio->inactive == FALSE)
273+ {
274+ /*
275+ Set event_conn_closed for notification of both client and server that
276+ connection is closed
277+ */
278+ SetEvent(vio->event_conn_closed);
279+ }
280+
281+ vio->inactive= TRUE;
282+ DBUG_RETURN(0);
283+}
284+
285 #endif /* #if defined(_WIN32) && defined(HAVE_SMEM) */
286
287
288=== modified file 'Percona-Server/vio/viosocket.c'
289--- Percona-Server/vio/viosocket.c 2013-06-25 13:13:06 +0000
290+++ Percona-Server/vio/viosocket.c 2013-09-29 16:12:18 +0000
291@@ -449,7 +449,30 @@
292 int r=0;
293 DBUG_ENTER("vio_shutdown");
294
295- if (vio->inactive == FALSE)
296+ r= vio_cancel(vio, how);
297+
298+ if (mysql_socket_close(vio->mysql_socket))
299+ r= -1;
300+
301+ if (r)
302+ {
303+ DBUG_PRINT("vio_error", ("close() failed, error: %d",socket_errno));
304+ /* FIXME: error handling (not critical for MySQL) */
305+ }
306+
307+ vio->inactive= TRUE;
308+ vio->mysql_socket= MYSQL_INVALID_SOCKET;
309+
310+ DBUG_RETURN(r);
311+}
312+
313+
314+int vio_cancel(Vio * vio, int how)
315+{
316+ int r= 0;
317+ DBUG_ENTER("vio_cancel");
318+
319+ if (vio->inactive == FALSE)
320 {
321 DBUG_ASSERT(vio->type == VIO_TYPE_TCPIP ||
322 vio->type == VIO_TYPE_SOCKET ||
323@@ -463,16 +486,8 @@
324 Windows). */
325 (void) cancel_io((HANDLE)vio->mysql_socket, vio->thread_id);
326 #endif
327- if (mysql_socket_close(vio->mysql_socket))
328- r= -1;
329- }
330- if (r)
331- {
332- DBUG_PRINT("vio_error", ("close() failed, error: %d",socket_errno));
333- /* FIXME: error handling (not critical for MySQL) */
334- }
335- vio->inactive= TRUE;
336- vio->mysql_socket= MYSQL_INVALID_SOCKET;
337+ }
338+
339 DBUG_RETURN(r);
340 }
341

Subscribers

People subscribed via source and target branches