Percona Server moved to https://jira.percona.com/projects/PS

server crashing, potentially thread pool related

Bug #1309026 reported by Aurimas Mikalauskas on 2014-04-17

This bug affects 2 people

	Status	Importance	Assigned to	Milestone
MySQL Server	Unknown	Unknown	mysql-bugs #72681
Percona Server moved to https://jira.percona.com/projects/PS	Fix Released	High	Sergei Glushchenko	Percona Server moved to https://jira.percona.com/projects/PS 5.6.19-67.0
5.1	Invalid	Undecided	Unassigned
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	High	Sergei Glushchenko	Percona Server moved to https://jira.percona.com/projects/PS 5.6.19-67.0

Bug Description

Percona Server 5.6 crashing. Tested up to Percona-Server-5.6.16-64.2 but the core dump we have is generated by 5.6.14 due to missing dbg packages in 5.6.15 and up.

backtrace from mysql error log:

Mar 16 14:47:56 serverx mysqld: 13:47:56 UTC - mysqld got signal 11 ;
Mar 16 14:47:56 serverx mysqld: This could be because you hit a bug. It is also possible that this binary
Mar 16 14:47:56 serverx mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Mar 16 14:47:56 serverx mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Mar 16 14:47:56 serverx mysqld: We will try our best to scrape up some info that will hopefully help
Mar 16 14:47:56 serverx mysqld: diagnose the problem, but since we have already crashed,
Mar 16 14:47:56 serverx mysqld: something is definitely wrong and this may fail.
Mar 16 14:47:56 serverx mysqld: Please help us make Percona Server better by reporting any
Mar 16 14:47:56 serverx mysqld: bugs at http://bugs.percona.com/
Mar 16 14:47:56 serverx mysqld:
Mar 16 14:47:56 serverx mysqld: key_buffer_size=8388608
Mar 16 14:47:56 serverx mysqld: read_buffer_size=131072
Mar 16 14:47:56 serverx mysqld: max_used_connections=8753
Mar 16 14:47:56 serverx mysqld: max_threads=501
Mar 16 14:47:56 serverx mysqld: thread_count=8667
Mar 16 14:47:56 serverx mysqld: connection_count=8667
Mar 16 14:47:56 serverx mysqld: It is possible that mysqld could use up to
Mar 16 14:47:56 serverx mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 345277 K bytes o
f memory
Mar 16 14:47:56 serverx mysqld: Hope that's ok; if not, decrease some variables in the equation.
Mar 16 14:47:56 serverx mysqld:
Mar 16 14:47:56 serverx mysqld: Thread pointer: 0x7fa575d07040
Mar 16 14:47:56 serverx mysqld: Attempting backtrace. You can use the following information to find out
Mar 16 14:47:56 serverx mysqld: where mysqld died. If you see no messages after this, something went
Mar 16 14:47:56 serverx mysqld: terribly wrong...
Mar 16 14:47:56 serverx mysqld: stack_bottom = 7fa66ac86d68 thread_stack 0x40000
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8cc57e]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x69d6a1]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7fce36e94030]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x962252]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x87e7a5]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13finish_commitEP3THD+0x93)[0x884ec3]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x21b)[0x889aeb]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x1a5)[0x889d55]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z15ha_commit_transP3THDbb+0x237)[0x5ff177]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z12trans_commitP3THD+0x33)[0x789483]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1c52)[0x70e982]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5b8)[0x711e48]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xf52)[0x712e22]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xc7)[0x7908a7]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld[0x7917c5]
Mar 16 14:47:56 serverx mysqld: /usr/sbin/mysqld(pfs_spawn_thread+0x143)[0x963d93]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7fce36e8bb50]
Mar 16 14:47:56 serverx mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fce355ada7d]

Tags:

Related branches

lp:~sergei.glushchenko/percona-server/5.6-ps-bug1309026

Merged into lp:percona-server/5.6 at revision 612

Laurynas Biveinis (community): Approve on 2014-06-11

Laurynas Biveinis (laurynas-biveinis) on 2014-04-17

Changed in percona-server:
assignee:	nobody → Laurynas Biveinis (laurynas-biveinis)

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2014-04-17:

(gdb) print event_name_array
$39 = (PFS_single_stat *) 0x67
(gdb) print thread->m_instr_class_waits_stats
$40 = (PFS_single_stat *) 0x67

No idea yet why it's corrupted.

Laurynas Biveinis (laurynas-biveinis) on 2014-04-18

tags:

added: i40387

Revision history for this message

Aurimas Mikalauskas (aurimas-mikalauskas) wrote on 2014-04-23:

For the record, same crash was reproduced with Performance Schema disabled and the problem on a reproducible case on the customer system was reduced to:

CREATE DATABASE ms ;
CREATE TABLE t1 (a INT PRIMARY KEY AUTO_INCREMENT, b INT, c VARCHAR(50)) ;

- Put the following in q.sql:
SELECT FLOOR(RAND() * 1000000), MD5(RAND()), FLOOR(RAND() * 10000) INTO @ri, @rs, @rid
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
INSERT INTO t1 (b, c) VALUES (@ri, @rs)
UPDATE t1 SET c = 'UPDATED' WHERE a = @rid

- And run:
mysqlslap --create-schema=ms -c 50 -i1000000 -q q.sql

This crashes the server within seconds when I run it with the thread-pool enabled. It is yet to crash the server w/o thread-pool.

Laurynas Biveinis (laurynas-biveinis) on 2014-05-02

Changed in percona-server:
assignee:	Laurynas Biveinis (laurynas-biveinis) → Sergei Glushchenko (sergei.glushchenko)

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2014-05-13:

See the duplicate bug 1318609 for analysis.

Sergei, is this an upstream issue, does it need reporting there?

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2014-05-19:

Download full text (6.2 KiB)

There is a commit which intended to fix similar issue for debug builds.

        revno: 0.18338.68
        committer: Andrei Elkin <email address hidden>
        branch nick: 5.6-fixes
        timestamp: Wed 2012-11-21 09:45:32 +0200
        message:
          Bug#15867985 STRESS TESTS CAUSES DBUG SERVER TO EXIT WITH SIGNAL 6

          When run with --mysqld=--binlog-order-commits=1 (default) mysqld executes a
          piece of BGC logics that tames a group of threads in a way that they give up
          their commit execution path to a leader thread.
          This preemption appeared to have a specific to DBUG_ON build flaw in that
          the leader and a follower thread could modify a thread-specific area
          concurrently which led to a segfault.

Fixed with refining the preemption logics to be thread-safe for the DBUG_ON build.

bzr diff -c0.18338.68 sergei/bzr/issue40387/483
=== modified file 'sql/binlog.cc'
--- sql/binlog.cc 2012-11-15 03:35:25 +0000
+++ sql/binlog.cc 2012-11-21 07:45:32 +0000
@@ -1334,6 +1334,16 @@
   if (!leader)
   {
     mysql_mutex_lock(&m_lock_done);
+#ifndef DBUG_OFF
+ /*
+ Leader can be awaiting all-clear to preempt follower's execution.
+ With setting the status the follower ensures it won't execute anything
+ including thread-specific code.
+ */
+ thd->transaction.flags.ready_preempt= 1;
+ if (leader_await_preempt_status)
+ mysql_cond_signal(&m_cond_preempt);
+#endif
     while (thd->transaction.flags.pending)
       mysql_cond_wait(&m_cond_done, &m_lock_done);
     mysql_mutex_unlock(&m_lock_done);
@@ -1361,6 +1371,22 @@
   DBUG_RETURN(result);
}

+#ifndef DBUG_OFF
+void Stage_manager::clear_preempt_status(THD *head)
+{
+ DBUG_ASSERT(head);
+
+ mysql_mutex_lock(&m_lock_done);
+ while(!head->transaction.flags.ready_preempt)
+ {
+ leader_await_preempt_status= true;
+ mysql_cond_wait(&m_cond_preempt, &m_lock_done);
+ }
+ leader_await_preempt_status= false;
+ mysql_mutex_unlock(&m_lock_done);
+}
+#endif
+
/**
Write a rollback record of the transaction to the binary log.

There is a commit which intended to fix similar issue for debug builds.

revno: 0.18338.68
        committer: Andrei Elkin <andrei.elkin@oracle.com>
        branch nick: 5.6-fixes
        timestamp: Wed 2012-11-21 09:45:32 +0200
        message:
          Bug#15867985  STRESS TESTS CAUSES DBUG SERVER TO EXIT WITH SIGNAL 6

Fixed with refining the preemption logics to be thread-safe for the DBUG_ON build.

bzr diff -c0.18338.68                                                                sergei/bzr/issue40387/483
=== modified file 'sql/binlog.cc'
--- sql/binlog.cc	2012-11-15 03:35:25 +0000
+++ sql/binlog.cc	2012-11-21 07:45:32 +0000
@@ -1334,6 +1334,16 @@
   if (!leader)
   {
     mysql_mutex_lock(&m_lock_done);
+#ifndef DBUG_OFF
+    /*
+      Leader can be awaiting all-clear to preempt follower's execution.
+      With setting the status the follower ensures it won't execute anything
+      including thread-specific code.
+    */
+    thd->transaction.flags.ready_preempt= 1;
+    if (leader_await_preempt_status)
+      mysql_cond_signal(&m_cond_preempt);
+#endif
     while (thd->transaction.flags.pending)
       mysql_cond_wait(&m_cond_done, &m_lock_done);
     mysql_mutex_unlock(&m_lock_done);
@@ -1361,6 +1371,22 @@
   DBUG_RETURN(result);
 }

+#ifndef DBUG_OFF
+void Stage_manager::clear_preempt_status(THD *head)
+{
+  DBUG_ASSERT(head);
+
+  mysql_mutex_lock(&m_lock_done);
+  while(!head->transaction.flags.ready_preempt)
+  {
+    leader_await_preempt_status= true;
+    mysql_cond_wait(&m_cond_preempt, &m_lock_done);
+  }
+  leader_await_preempt_status= false;
+  mysql_mutex_unlock(&m_lock_done);
+}
+#endif
+
 /**
   Write a rollback record of the transaction to the binary log.

@@ -6009,6 +6035,9 @@
 {
   mysql_mutex_assert_owner(&LOCK_commit);
   Thread_excursion excursion(thd);
+#ifndef DBUG_OFF
+  thd->transaction.flags.ready_preempt= 1; // formality by the leader
+#endif
   for (THD *head= first ; head ; head = head->next_to_commit)
   {
     DBUG_PRINT("debug", ("Thread ID: %lu, commit_error: %d, flags.pending: %s",
@@ -6022,6 +6051,9 @@
       If flush succeeded, attach to the session and commit it in the
       engines.
     */
+#ifndef DBUG_OFF
+    stage_manager.clear_preempt_status(head);
+#endif
     if (flush_error != 0)
       head->commit_error= flush_error;
     else if (int error= excursion.attach_to(head))
@@ -6031,6 +6063,9 @@
       bool all= head->transaction.flags.real_commit;
       if (head->transaction.flags.commit_low)
       {
+        /* head is parked to have exited append() */
+        DBUG_ASSERT(head->transaction.flags.ready_preempt);
+
         if (int error= ha_commit_low(head, all))
           head->commit_error= error;
         else if (head->transaction.flags.xid_written)
@@ -6269,6 +6304,17 @@
   thd->transaction.flags.real_commit= all;
   thd->transaction.flags.xid_written= false;
   thd->transaction.flags.commit_low= !skip_commit;
+#ifndef DBUG_OFF
+  /*
+     The group commit Leader may have to wait for follower whose transaction
+     is not ready to be preempted. Initially the status is pessimistic.
+     Preemption guarding logics is necessary only when DBUG_ON is set.
+     It won't be required for the dbug-off case as long as the follower won't
+     execute any thread-specific write access code in this method, which is
+     the case as of current.
+  */
+  thd->transaction.flags.ready_preempt= 0;
+#endif

DBUG_PRINT("enter", ("flags.pending: %s, commit_error: %d, thread_id: %lu",
                        YESNO(thd->transaction.flags.pending),

=== modified file 'sql/binlog.h'
--- sql/binlog.h	2012-09-18 12:32:43 +0000
+++ sql/binlog.h	2012-11-21 07:45:32 +0000
@@ -119,6 +119,10 @@
   {
     mysql_mutex_init(key_LOCK_done, &m_lock_done, MY_MUTEX_INIT_FAST);
     mysql_cond_init(key_COND_done, &m_cond_done, NULL);
+#ifndef DBUG_OFF
+    /* reuse key_COND_done 'cos a new PSI object would be wasteful in DBUG_ON */
+    mysql_cond_init(key_COND_done, &m_cond_preempt, NULL);
+#endif
     m_queue[FLUSH_STAGE].init(
 #ifdef HAVE_PSI_INTERFACE
                               key_LOCK_flush_queue
@@ -155,6 +159,7 @@
     If wait_if_follower is true the thread is not the stage leader,
     the thread will be wait for the queue to be processed by the
     leader before it returns.
+    In DBUG-ON version the follower marks is preempt status as ready.

@param stage Stage identifier for the queue to append to.
     @param first Queue to append.
@@ -172,6 +177,17 @@
     return m_queue[stage].pop_front();
   }

+#ifndef DBUG_OFF
+  /**
+     The method ensures the follower's execution path can be preempted
+     by the leader's thread.
+     Preempt status of @c head follower is checked to engange the leader
+     into waiting when set.
+
+     @param head  THD* of a follower thread
+  */
+  void clear_preempt_status(THD *head);
+#endif

/**
     Fetch the entire queue and empty it.
@@ -206,6 +222,13 @@

/** Mutex used for the condition variable above */
   mysql_mutex_t m_lock_done;
+#ifndef DBUG_OFF
+  /** Flag is set by Leader when it starts waiting for follower's all-clear */
+  bool leader_await_preempt_status;
+
+  /** Condition variable to indicate a follower started waiting for commit */
+  mysql_cond_t m_cond_preempt;
+#endif
 };

=== modified file 'sql/sql_class.h'
--- sql/sql_class.h	2012-11-20 13:08:53 +0000
+++ sql/sql_class.h	2012-11-21 07:45:32 +0000
@@ -2352,6 +2352,9 @@
       bool xid_written:1;               // The session wrote an XID
       bool real_commit:1;               // Is this a "real" commit?
       bool commit_low:1;                // see MYSQL_BIN_LOG::ordered_commit
+#ifndef DBUG_OFF
+      bool ready_preempt:1;             // internal in MYSQL_BIN_LOG::ordered_commit
+#endif
     } flags;

void cleanup()

With this logic enabled there is no crash.

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2014-05-19:

I think that queries and threads become ordered such way to trigger the crash easily with threadpool enabled. I think issue is not threadpool-specific, but is much easier to reproduce with threadpool. So far I had no luck reproducing the crash without threadpool. DEBUG_SYNC is also not much of use because group commit messing with THD and PSI of current thread.

As a workaround we can disable PSI instrumentation of follower thread until leader not done with commit. Just set PSI_thread to NULL in following method:

Stage_manager::enroll_for(StageID stage, THD *thd, mysql_mutex_t *stage_mutex)
{
  // If the queue was empty: we're the leader for this batch
  DBUG_PRINT("debug", ("Enqueue 0x%llx to queue for stage %d",
                       (ulonglong) thd, stage));
  bool leader= m_queue[stage].append(thd);

  /*
    The stage mutex can be NULL if we are enrolling for the first
    stage.
  */
  if (stage_mutex)
    mysql_mutex_unlock(stage_mutex);

  /*
    If the queue was not empty, we're a follower and wait for the
    leader to process the queue. If we were holding a mutex, we have
    to release it before going to sleep.
  */
  if (!leader)
  {
    mysql_mutex_lock(&m_lock_done);
#ifndef DBUG_OFF
    /*
      Leader can be awaiting all-clear to preempt follower's execution.
      With setting the status the follower ensures it won't execute anything
      including thread-specific code.
    */
    thd->transaction.flags.ready_preempt= 1;
    if (leader_await_preempt_status)
      mysql_cond_signal(&m_cond_preempt);
#endif
    while (thd->transaction.flags.pending) {
      mysql_cond_wait(&m_cond_done, &m_lock_done);
    }
    mysql_mutex_unlock(&m_lock_done);
  }
  return leader;
}

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2014-05-19:

Reported upstream based on code analysis http://bugs.mysql.com/bug.php?id=72681

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2014-05-20:

Sergei -

Great work on analysis. Would you be able to check whether 5.5 (with a different group commit implementation and less PFS instrumentation) has a corresponding bug too?

tags:

added: upstream

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2014-05-20:

PS 5.5 is not affected. Group commit implementation here does not "attach" leader to follower's threads.

Revision history for this message

Aurimas Mikalauskas (aurimas-mikalauskas) wrote on 2014-06-10:

Sergei, Laurynas - I see Oracle has made the bug private. Any other way to know if they intend to fix it (besides scanning changelogs) ? Are we submitting our own fix to PS 5.6 or waiting for an upstream fix ?

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2014-06-11:

#10

We will have our own fix for the time being.

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#12

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-782

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1318609

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

mysql-bugs #72681 Edit

Bug watches keep track of this bug in other bug trackers.