Bug #1191375 “Percona Server 5.6 RC with thread pool crashes und...” : Bugs : Percona Server moved to https://jira.percona.com/projects/PS

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-06-18:

#1

consider following scenario
* thread t1 is running
* t1 calls wait_begin -> wake_or_create_thread -> create_worker ->
                      -> mysql_thread_create -> spawn_thread_v1
  current thread's PFS_thread* remembered here as parent thread's
  PFS_thread* for newly created thread t2
* t1 finishes it's wait, sends result to client and finishes itself
* t2 is just about to start, it tries to attach instrumentation to
  itself. It wants to access PFS_thread* of t1 which is no longer
  available.
this scenario is impossible in Percona Server 5.5 as pointers
to parent thread's structures are not used
I'll try to reproduce it with DEBUG_SYNC or using other tricks

Changed in percona-server:
assignee:	nobody → Sergei Glushchenko (sergei.glushchenko)

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-06-19:

#2

It is usual for threadpool's worker threads to spawn new treads. Also parent thread can easily be terminated shortly after it spawned a child. So, the proper fix would be to introduce some kind of synchronization here.

I see couple options.

1. Add atomic variable to PFS_thread which would indicate that structure is in use. Just before child thread is spawned we increment this variable. Once new thread copied all needed information, we decrement this counter. When it is time to destroy parent thread we wait for counter became 0. We can use spin and check for it, or we can use introduce conditional variable.

2. Extend PFS_spawn_thread_arg structure and copy all the needed information from parent thread to it just before creating the new thread. pfs_spawn_thread will use copy stored in PFS_spawn_thread_arg. Additional fields for PFS_spawn_thread_arg are:

  /** Internal thread identifier, unique. */
  ulonglong m_thread_internal_id;
  /** Parent internal thread identifier. */
  ulonglong m_parent_thread_internal_id;
  /** User name. */
  char m_username[USERNAME_LENGTH];
  /** Length of @c m_username. */
  uint m_username_length;
  /** Host name. */
  char m_hostname[HOSTNAME_LENGTH];
  /** Length of @c m_hostname. */
  uint m_hostname_length;

I like second approach more.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-06-20:

#3

Option 2) looks like it'd be less error-prone to implement as well.

Roel Van de Paar (roel11) on 2013-09-06

tags:

added: tp

Roel Van de Paar (roel11) on 2013-09-23

tags:

added: 56qual

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-10-01:

#4

--56qual as tp is beta

tags:

removed: 56qual

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-11-10:

#5

    The stack trace in description points to access violation at the
    pfs->m_parent_thread_internal_id= parent->m_thread_internal_id; in
    pfs_spawn_thread.

    I've tracked this bug down and can explain it pretty well. Since
    PFS designed to handle only limited amount of threads all the
    threads above this number will not have THR_PFS assigned
    (i.e. create_thread in pfs_spawn_thread can return NULL and it
    will be assigned to thread-specific THR_PFS). This however did not
    taken into account neither in spawn_thread_v1 (where NULL can be
    assigned to psi_arg->m_parent_thread) nor pfs_spawn_thread itself
    (where NULL-valued psi_arg->m_parent_thread can be accessed).
    This probably not the issue for upstream since new threads
    commonly spawned by "old" threads which likely have their THR_PFS
    keys assigned. But with threadpool enabled every worker thread can
    spawn new thread when it about to wait. I saw this kind of
    failures, but it is hard to construct MTR test case for them.

    My previous assumption was that parent spawned a child thread,
    finished it's wait and stopped, while child was too slow to finish
    it's initialization in the meantime. This is also possible but
    I've never seen this in practice due to thread_pool_idle_timeout
    and threadpool internal logic.

This bug was fixed upstream (rev. 0.18870.1) with following
message:

Bug#16939689 VALGRIND ERROR IN PFS_SPAWN_THREAD

Before this fix, valgrind errors could be reported under stress tests.

In particular, the following line in pfs_spawn_thread()
memcpy(pfs->m_username, parent->m_username, sizeof(pfs->m_username));

could cause memcpy to be called for the same source and destination,
which should not happen.

    The root cause, by analysis, is that the code is reading attributes
    from the parent thread, while the parent thread instrumentation is already
    destroyed (and was reused for the child, triggering valgrind complaints).

    The fix is to capture all the parent attributes needed:
    - the thread internal id
    - the thread user name
    - the thread host name
    in the parent thread, before spawning a child,
    and use that to initialize the child instrumentation,
    instead of pasing a pointer that might become invalid.

The stack trace in description points to access violation at the
    pfs->m_parent_thread_internal_id= parent->m_thread_internal_id; in
    pfs_spawn_thread.

I've tracked this bug down and can explain it pretty well. Since
    PFS designed to handle only limited amount of threads all the
    threads above this number will not have THR_PFS assigned
    (i.e. create_thread in pfs_spawn_thread can return NULL and it
    will be assigned to thread-specific THR_PFS). This however did not
    taken into account neither in spawn_thread_v1 (where NULL can be
    assigned to psi_arg->m_parent_thread) nor pfs_spawn_thread itself
    (where NULL-valued psi_arg->m_parent_thread can be accessed).
    This probably not the issue for upstream since new threads
    commonly spawned by "old" threads which likely have their THR_PFS
    keys assigned. But with threadpool enabled every worker thread can
    spawn new thread when it about to wait. I saw this kind of
    failures, but it is hard to construct MTR test case for them.

My previous assumption was that parent spawned a child thread,
    finished it's wait and stopped, while child was too slow to finish
    it's initialization in the meantime. This is also possible but
    I've never seen this in practice due to thread_pool_idle_timeout
    and threadpool internal logic.

This bug was fixed upstream (rev. 0.18870.1) with following
    message:

Bug#16939689 VALGRIND ERROR IN PFS_SPAWN_THREAD

Before this fix, valgrind errors could be reported under stress tests.

In particular, the following line in pfs_spawn_thread()
    memcpy(pfs->m_username, parent->m_username, sizeof(pfs->m_username));

could cause memcpy to be called for the same source and destination,
    which should not happen.

The root cause, by analysis, is that the code is reading attributes
    from the parent thread, while the parent thread instrumentation is already
    destroyed (and was reused for the child, triggering valgrind complaints).

The fix is to capture all the parent attributes needed:
    - the thread internal id
    - the thread user name
    - the thread host name
    in the parent thread, before spawning a child,
    and use that to initialize the child instrumentation,
    instead of pasing a pointer that might become invalid.

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-11-10:

#6

Marking as Invalid because fix already released upstream and incorporated to PS.

Laurynas Biveinis (laurynas-biveinis) on 2013-11-11

tags:

added: upstream

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#7

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-677

	Status	Importance	Assigned to	Milestone
Percona Server moved to https://jira.percona.com/projects/PS	Fix Released	High	Unassigned	Percona Server moved to https://jira.percona.com/projects/PS 5.6.13-60.5
5.1	Invalid	Undecided	Unassigned
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	High	Unassigned	Percona Server moved to https://jira.percona.com/projects/PS 5.6.13-60.5

Percona Server moved to https://jira.percona.com/projects/PS

Percona Server 5.6 RC with thread pool crashes under load

Bug Description

Related branches

Other bug subscribers

Remote bug watches