server crashing, potentially thread pool related

Bug #1309026 reported by Aurimas Mikalauskas
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Sergei Glushchenko
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Fix Released
High
Sergei Glushchenko

Bug Description

Percona Server 5.6 crashing. Tested up to Percona-Server-5.6.16-64.2 but the core dump we have is generated by 5.6.14 due to missing dbg packages in 5.6.15 and up.

backtrace from mysql error log:

Mar 16 14:47:56 serverx mysqld: 13:47:56 UTC - mysqld got signal 11 ;
Mar 16 14:47:56 serverx mysqld: This could be because you hit a bug. It is also possible that this binary
Mar 16 14:47:56 serverx mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Mar 16 14:47:56 serverx mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Mar 16 14:47:56 serverx mysqld: We will try our best to scrape up some info that will hopefully help
Mar 16 14:47:56 serverx mysqld: diagnose the problem, but since we have already crashed,
Mar 16 14:47:56 serverx mysqld: something is definitely wrong and this may fail.
Mar 16 14:47:56 serverx mysqld: Please help us make Percona Server better by reporting any
Mar 16 14:47:56 serverx mysqld: bugs at http://bugs.percona.com/
Mar 16 14:47:56 serverx mysqld:
Mar 16 14:47:56 serverx mysqld: key_buffer_size=8388608
Mar 16 14:47:56 serverx mysqld: read_buffer_size=131072
Mar 16 14:47:56 serverx mysqld: max_used_connections=8753
Mar 16 14:47:56 serverx mysqld: max_threads=501
Mar 16 14:47:56 serverx mysqld: thread_count=8667
Mar 16 14:47:56 serverx mysqld: connection_count=8667
Mar 16 14:47:56 serverx mysqld: It is possible that mysqld could use up to
Mar 16 14:47:56 serverx mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 345277 K bytes o
f memory
Mar 16 14:47:56 serverx mysqld: Hope that's ok; if not, decrease some variables in the equation.
Mar 16 14:47:56 serverx mysqld:
Mar 16 14:47:56 serverx mysqld: Thread pointer: 0x7fa575d07040
Mar 16 14:47:56 serverx mysqld: Attempting backtrace. You can use the following information to find out
Mar 16 14:47:56 serverx mysqld: where mysqld died. If you see no messages after this, something went
Mar 16 14:47:56 serverx mysqld: terribly wrong...
Mar 16 14:47:56 serverx mysqld: stack_bottom = 7fa66ac86d68 thread_stack 0x40000
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8cc57e]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x69d6a1]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7fce36e94030]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x962252]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x87e7a5]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13finish_commitEP3THD+0x93)[0x884ec3]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x21b)[0x889aeb]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x1a5)[0x889d55]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z15ha_commit_transP3THDbb+0x237)[0x5ff177]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z12trans_commitP3THD+0x33)[0x789483]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1c52)[0x70e982]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5b8)[0x711e48]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xf52)[0x712e22]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xc7)[0x7908a7]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x7917c5]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(pfs_spawn_thread+0x143)[0x963d93]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7fce36e8bb50]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fce355ada7d]

Related branches

Changed in percona-server:
assignee: nobody → Laurynas Biveinis (laurynas-biveinis)
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

(gdb) print event_name_array
$39 = (PFS_single_stat *) 0x67
(gdb) print thread->m_instr_class_waits_stats
$40 = (PFS_single_stat *) 0x67

No idea yet why it's corrupted.

tags: added: i40387
Revision history for this message
Aurimas Mikalauskas (aurimas-mikalauskas) wrote :

For the record, same crash was reproduced with Performance Schema disabled and the problem on a reproducible case on the customer system was reduced to:

CREATE DATABASE ms ;
CREATE TABLE t1 (a INT PRIMARY KEY AUTO_INCREMENT, b INT, c VARCHAR(50)) ;

- Put the following in q.sql:
SELECT FLOOR(RAND() * 1000000), MD5(RAND()), FLOOR(RAND() * 10000) INTO @ri, @rs, @rid
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
UPDATE t1 SET c = 'UPDATED' WHERE a = @rid

- And run:
mysqlslap --create-schema=ms -c 50 -i1000000 -q q.sql

This crashes the server within seconds when I run it with the thread-pool enabled. It is yet to crash the server w/o thread-pool.

Changed in percona-server:
assignee: Laurynas Biveinis (laurynas-biveinis) → Sergei Glushchenko (sergei.glushchenko)
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

See the duplicate bug 1318609 for analysis.

Sergei, is this an upstream issue, does it need reporting there?

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :
Download full text (6.2 KiB)

There is a commit which intended to fix similar issue for debug builds.

        revno: 0.18338.68
        committer: Andrei Elkin <email address hidden>
        branch nick: 5.6-fixes
        timestamp: Wed 2012-11-21 09:45:32 +0200
        message:
          Bug#15867985 STRESS TESTS CAUSES DBUG SERVER TO EXIT WITH SIGNAL 6

          When run with --mysqld=--binlog-order-commits=1 (default) mysqld executes a
          piece of BGC logics that tames a group of threads in a way that they give up
          their commit execution path to a leader thread.
          This preemption appeared to have a specific to DBUG_ON build flaw in that
          the leader and a follower thread could modify a thread-specific area
          concurrently which led to a segfault.

          Fixed with refining the preemption logics to be thread-safe for the DBUG_ON build.

bzr diff -c0.18338.68 sergei/bzr/issue40387/483
=== modified file 'sql/binlog.cc'
--- sql/binlog.cc 2012-11-15 03:35:25 +0000
+++ sql/binlog.cc 2012-11-21 07:45:32 +0000
@@ -1334,6 +1334,16 @@
   if (!leader)
   {
     mysql_mutex_lock(&m_lock_done);
+#ifndef DBUG_OFF
+ /*
+ Leader can be awaiting all-clear to preempt follower's execution.
+ With setting the status the follower ensures it won't execute anything
+ including thread-specific code.
+ */
+ thd->transaction.flags.ready_preempt= 1;
+ if (leader_await_preempt_status)
+ mysql_cond_signal(&m_cond_preempt);
+#endif
     while (thd->transaction.flags.pending)
       mysql_cond_wait(&m_cond_done, &m_lock_done);
     mysql_mutex_unlock(&m_lock_done);
@@ -1361,6 +1371,22 @@
   DBUG_RETURN(result);
 }

+#ifndef DBUG_OFF
+void Stage_manager::clear_preempt_status(THD *head)
+{
+ DBUG_ASSERT(head);
+
+ mysql_mutex_lock(&m_lock_done);
+ while(!head->transaction.flags.ready_preempt)
+ {
+ leader_await_preempt_status= true;
+ mysql_cond_wait(&m_cond_preempt, &m_lock_done);
+ }
+ leader_await_preempt_status= false;
+ mysql_mutex_unlock(&m_lock_done);
+}
+#endif
+
 /**
   Write a rollback record of the transaction to the binary log.

@@ -6009,6 +6035,9 @@
 {
   mysql_mutex_assert_owner(&LOCK_commit);
   Thread_excursion excursion(thd);
+#ifndef DBUG_OFF
+ thd->transaction.flags.ready_preempt= 1; // formality by the leader
+#endif
   for (THD *head= first ; head ; head = head->next_to_commit)
   {
     DBUG_PRINT("debug", ("Thread ID: %lu, commit_error: %d, flags.pending: %s",
@@ -6022,6 +6051,9 @@
       If flush succeeded, attach to the session and commit it in the
       engines.
     */
+#ifndef DBUG_OFF
+ stage_manager.clear_preempt_status(head);
+#endif
     if (flush_error != 0)
       head->commit_error= flush_error;
     else if (int error= excursion.attach_to(head))
@@ -6031,6 +6063,9 @@
       bool all= head->transaction.flags.real_commit;
       if (head->transaction.flags.commit_low)
       {
+ /* head is parked to have exited append() */
+ DBUG_ASSERT(head->transaction.flags.ready_preempt);
+
         if (int error= ha_commit_low(head, all))
           head->commit_error= error;
         else if (head->transac...

Read more...

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

I think that queries and threads become ordered such way to trigger the crash easily with threadpool enabled. I think issue is not threadpool-specific, but is much easier to reproduce with threadpool. So far I had no luck reproducing the crash without threadpool. DEBUG_SYNC is also not much of use because group commit messing with THD and PSI of current thread.

As a workaround we can disable PSI instrumentation of follower thread until leader not done with commit. Just set PSI_thread to NULL in following method:

Stage_manager::enroll_for(StageID stage, THD *thd, mysql_mutex_t *stage_mutex)
{
  // If the queue was empty: we're the leader for this batch
  DBUG_PRINT("debug", ("Enqueue 0x%llx to queue for stage %d",
                       (ulonglong) thd, stage));
  bool leader= m_queue[stage].append(thd);

  /*
    The stage mutex can be NULL if we are enrolling for the first
    stage.
  */
  if (stage_mutex)
    mysql_mutex_unlock(stage_mutex);

  /*
    If the queue was not empty, we're a follower and wait for the
    leader to process the queue. If we were holding a mutex, we have
    to release it before going to sleep.
  */
  if (!leader)
  {
    mysql_mutex_lock(&m_lock_done);
#ifndef DBUG_OFF
    /*
      Leader can be awaiting all-clear to preempt follower's execution.
      With setting the status the follower ensures it won't execute anything
      including thread-specific code.
    */
    thd->transaction.flags.ready_preempt= 1;
    if (leader_await_preempt_status)
      mysql_cond_signal(&m_cond_preempt);
#endif
    while (thd->transaction.flags.pending) {
      mysql_cond_wait(&m_cond_done, &m_lock_done);
    }
    mysql_mutex_unlock(&m_lock_done);
  }
  return leader;
}

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Reported upstream based on code analysis http://bugs.mysql.com/bug.php?id=72681

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Sergei -

Great work on analysis. Would you be able to check whether 5.5 (with a different group commit implementation and less PFS instrumentation) has a corresponding bug too?

tags: added: upstream
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

PS 5.5 is not affected. Group commit implementation here does not "attach" leader to follower's threads.

Revision history for this message
Aurimas Mikalauskas (aurimas-mikalauskas) wrote :

Sergei, Laurynas - I see Oracle has made the bug private. Any other way to know if they intend to fix it (besides scanning changelogs) ? Are we submitting our own fix to PS 5.6 or waiting for an upstream fix ?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

We will have our own fix for the time being.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-782

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.