Multi-threaded luaJIT application hangs; apparent deadlock in GLIBC
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Fix Released | Undecided | Unassigned |
glibc (Ubuntu) | Fix Released | High | Taco Screen team |
Xenial | Fix Released | High | Adam Conrad |
Bug Description
---Problem Description---
Multi-threaded luaJIT application hangs due to apparent deadlock in GLIBC.
---uname output---
Linux p10a102 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
---Steps to Reproduce---
Build luaJIT + Torch and run the following lua program:
local Threads = require 'threads'
nthreads = 8
thrds = Threads(nthreads,
function() print('Starting thread ') end,
function() require 'image' end
);
thrds:synchronize()
print "Done"
Userspace tool common name: GLIBC
The userspace tool has the following bit modes: 64-bit
Userspace package: GLIBC 2.23
Userspace tool obtained from project website: https:/
Here's a sample run of the lua application and stack backtraces for all the threads. You can see that all the worker threads are in GLIBC's __lll_lock_
The problem is easy to reproduce, and is more likely to strike as the number of threads grows.
I can provide core file, or package with the luajit/torch binaries, etc. as needed.
$ which luajit
/opt/DL/
$ luajit -v
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://
$ cat t.lua
local Threads = require 'threads'
nthreads = 8
thrds = Threads(nthreads,
function() print('Starting thread ') end,
function() require 'image' end
);
thrds:synchronize()
print "Done"
$ gdb /opt/DL/
GNU gdb (Ubuntu 7.11.1-
[...]
(gdb) run t.lua
Starting program: /opt/DL/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64
[New Thread 0x3ffd3693f1a0 (LWP 22251)]
[New Thread 0x3ffd3611f1a0 (LWP 22252)]
[New Thread 0x3ffd358ff1a0 (LWP 22253)]
[New Thread 0x3ffd350df1a0 (LWP 22254)]
[New Thread 0x3ffd348bf1a0 (LWP 22255)]
[New Thread 0x3ffd27fff1a0 (LWP 22256)]
[New Thread 0x3ffd277ff1a0 (LWP 22257)]
[New Thread 0x3ffd26fff1a0 (LWP 22258)]
Starting thread
Starting thread
Starting thread
Starting thread
Starting thread
Starting thread
Starting thread
Starting thread
^C
Thread 1 "luajit" received signal SIGINT, Interrupt.
0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_
186 pthread_
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x3fffb7ff68a0 (LWP 22248) "luajit" 0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_
2 Thread 0x3ffd3693f1a0 (LWP 22251) "luajit" 0x00003fffb7d2c460 in __lll_lock_
3 Thread 0x3ffd3611f1a0 (LWP 22252) "luajit" 0x00003fffb7e15fa8 in __lll_lock_wait (futex=0x200, private=<optimized out>) at lowlevellock.c:46
4 Thread 0x3ffd358ff1a0 (LWP 22253) "luajit" 0x00003fffb7d2c460 in __lll_lock_
5 Thread 0x3ffd350df1a0 (LWP 22254) "luajit" 0x00003fffb7d2c460 in __lll_lock_
6 Thread 0x3ffd348bf1a0 (LWP 22255) "luajit" 0x00003fffb7d2c460 in __lll_lock_
7 Thread 0x3ffd27fff1a0 (LWP 22256) "luajit" 0x00003fffb7d2c460 in __lll_lock_
8 Thread 0x3ffd277ff1a0 (LWP 22257) "luajit" 0x00003fffb7d2c460 in __lll_lock_
9 Thread 0x3ffd26fff1a0 (LWP 22258) "luajit" 0x00003fffb7d2c408 in __lll_lock_
(gdb) thread apply all where
Thread 9 (Thread 0x3ffd26fff1a0 (LWP 22258)):
#0 0x00003fffb7d2c408 in __lll_lock_
#1 0x00003fffb7c7e2d4 in _IO_flush_all_lockp (do_lock=<optimized out>) at genops.c:777
#2 0x00003fffb7c7e63c in __GI__IO_flush_all () at genops.c:817
#3 0x00003fffb7c687a4 in __GI__IO_fflush (fp=<optimized out>) at iofflush.c:34
[...]
#12 0x00000000100460f8 in lua_pcall ()
#13 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#14 0x00003fffb7e084a0 in start_thread (arg=0x3ffd26ff
#15 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 8 (Thread 0x3ffd277ff1a0 (LWP 22257)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c87c28 in malloc_atfork (sz=65536, caller=<optimized out>) at arena.c:179
#2 0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
#3 0x00003fffb7c67e8c in __GI__IO_
#4 0x00003fffb7c7ce74 in __GI__IO_doallocbuf (fp=0x3ffd180037c0) at genops.c:398
#5 0x00003fffb7c7b77c in _IO_new_
#6 0x00003fffb7c7d2c4 in __GI___underflow (fp=0x3ffd180037c0) at genops.c:342
#7 __GI__IO_
#8 0x00003fffb7c7d168 in __GI__IO_sgetn (fp=<optimized out>, data=<optimized out>, n=<optimized out>) at genops.c:467
#9 0x00003fffb7c696f4 in __GI__IO_fread (buf=0x3ffd2602
[...]
#18 0x00000000100460f8 in lua_pcall ()
#19 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#20 0x00003fffb7e084a0 in start_thread (arg=0x3ffd277f
#21 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 7 (Thread 0x3ffd27fff1a0 (LWP 22256)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
#2 0x00003fffb7ccea44 in __libc_fork () at ../sysdeps/
#3 0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd140037c0, command=
#4 0x00003fffb7c6b4f8 in _IO_new_popen (command=
[...]
#13 0x00000000100460f8 in lua_pcall ()
#14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd27ff
#16 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 6 (Thread 0x3ffd348bf1a0 (LWP 22255)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c87c28 in malloc_atfork (sz=63, caller=<optimized out>) at arena.c:179
#2 0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
#3 0x00003fffb7fbb39c in _dl_signal_error (errcode=0, objname=
errstring=
#4 0x00003fffb7fbb640 in _dl_signal_cerror (errcode=<optimized out>, objname=
errstring=
#5 0x00003fffb7fb424c in _dl_lookup_symbol_x (undef_
type_
#6 0x00003fffb7d6fd34 in call_dl_lookup (ptr=0x3ffd348b
#7 0x00003fffb7fbb6e8 in _dl_catch_error (objname=
at dl-error.c:187
#8 0x00003fffb7d701ac in do_sym (handle=0x0, name=0x3ffd260f9470 "luaJIT_
#9 0x00003fffb7f3138c in dlsym_doit (a=0x3ffd348bdb50) at dlsym.c:50
#10 0x00003fffb7fbb6e8 in _dl_catch_error (objname=
at dl-error.c:187
#11 0x00003fffb7f31cc8 in _dlerror_run (operate=
#12 0x00003fffb7f31438 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70
[...]
#23 0x00000000100460f8 in lua_pcall ()
#24 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#25 0x00003fffb7e084a0 in start_thread (arg=0x3ffd348b
#26 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 5 (Thread 0x3ffd350df1a0 (LWP 22254)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c7f194 in __GI__IO_list_lock () at genops.c:1210
#2 0x00003fffb7ccea70 in __libc_fork () at ../sysdeps/
#3 0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd200037c0, command=
#4 0x00003fffb7c6b4f8 in _IO_new_popen (command=
[...]
#13 0x00000000100460f8 in lua_pcall ()
#14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd350d
#16 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 4 (Thread 0x3ffd358ff1a0 (LWP 22253)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
#2 0x00003fffb7ccea44 in __libc_fork () at ../sysdeps/
#3 0x00003fffb7c6b0fc in _IO_new_proc_open (fp=0x3ffd2c0037c0, command=
#4 0x00003fffb7c6b4f8 in _IO_new_popen (command=
[...]
#13 0x00000000100460f8 in lua_pcall ()
#14 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#15 0x00003fffb7e084a0 in start_thread (arg=0x3ffd358f
#16 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 3 (Thread 0x3ffd3611f1a0 (LWP 22252)):
#0 0x00003fffb7e15fa8 in __lll_lock_wait (futex=0x200, private=<optimized out>) at lowlevellock.c:46
#1 0x00003fffb7e0bdec in __GI___
#2 0x00003fffb7f31424 in __dlsym (handle=0x0, name=<optimized out>) at dlsym.c:68
[...]
#11 0x00000000100460f8 in lua_pcall ()
#12 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#13 0x00003fffb7e084a0 in start_thread (arg=0x3ffd3611
#14 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 2 (Thread 0x3ffd3693f1a0 (LWP 22251)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c87c28 in malloc_atfork (sz=65536, caller=<optimized out>) at arena.c:179
#2 0x00003fffb7c88034 in __GI___libc_malloc (bytes=<optimized out>) at malloc.c:2910
#3 0x00003fffb7c67e8c in __GI__IO_
#4 0x00003fffb7c7ce74 in __GI__IO_doallocbuf (fp=0x3ffd30003bd0) at genops.c:398
#5 0x00003fffb7c7b77c in _IO_new_
#6 0x00003fffb7c7d2c4 in __GI___underflow (fp=0x3ffd30003bd0) at genops.c:342
#7 __GI__IO_
#8 0x00003fffb7c7d168 in __GI__IO_sgetn (fp=<optimized out>, data=<optimized out>, n=<optimized out>) at genops.c:467
#9 0x00003fffb7c696f4 in __GI__IO_fread (buf=0x3ffd2617
[...]
#18 0x00000000100460f8 in lua_pcall ()
#19 0x00003ffd36940ad8 in THThread_main () from /opt/DL/
#20 0x00003fffb7e084a0 in start_thread (arg=0x3ffd3693
#21 0x00003fffb7d17e74 in clone () at ../sysdeps/
Thread 1 (Thread 0x3fffb7ff68a0 (LWP 22248)):
#0 0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_
#1 0x00003fffb7bc5308 in THCondition_wait () from /opt/DL/
#2 0x00003fffb7bc269c in ?? () from /opt/DL/
#3 0x000000001005a930 in ?? ()
#4 0x00000000100460f8 in lua_pcall ()
#5 0x0000000010006884 in ?? ()
#6 0x000000001005a930 in ?? ()
#7 0x00000000100461f0 in lua_cpcall ()
#8 0x00000000100041a8 in main ()
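To see at a glance which lock each worker is stuck on, the `thread apply all where` output can be grouped by the caller of the lock-wait frame. The following is a minimal sketch, not part of the original report: the helper name `lock_wait_callers` and the regular expressions are assumptions made for illustration, and the sample text is a short excerpt of the backtraces above with the truncated symbol names left as they appear.

```python
import re
from collections import Counter

# Matches headers such as "Thread 7 (Thread 0x3ffd27fff1a0 (LWP 22256)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# Matches frames such as "#1 0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)")


def lock_wait_callers(backtrace):
    """Map thread id -> frame #1 function, for threads whose frame #0 is a low-level lock wait."""
    callers = {}
    thread = None
    waiting = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            thread = int(header.group(1))
            waiting = False
            continue
        frame = FRAME_RE.match(line)
        if frame is None or thread is None:
            continue
        depth, func = int(frame.group(1)), frame.group(2)
        if depth == 0:
            # The symbol is truncated in the log, so match on the common prefix.
            waiting = func.startswith("__lll_lock_")
        elif depth == 1 and waiting:
            callers[thread] = func
    return callers


SAMPLE = """\
Thread 7 (Thread 0x3ffd27fff1a0 (LWP 22256)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c800e8 in ptmalloc_lock_all () at arena.c:235
Thread 5 (Thread 0x3ffd350df1a0 (LWP 22254)):
#0 0x00003fffb7d2c460 in __lll_lock_
#1 0x00003fffb7c7f194 in __GI__IO_list_lock () at genops.c:1210
Thread 1 (Thread 0x3fffb7ff68a0 (LWP 22248)):
#0 0x00003fffb7e1127c in __pthread_cond_wait (cond=0x10095ba0, mutex=0x10095ad0) at pthread_
#1 0x00003fffb7bc5308 in THCondition_wait () from /opt/DL/
"""

if __name__ == "__main__":
    callers = lock_wait_callers(SAMPLE)
    for func, count in Counter(callers.values()).most_common():
        print(f"{count} thread(s) blocked under {func}")
```

Thread 1 is skipped because its top frame is a condition-variable wait rather than a lock wait. Run against the full log, the same grouping should separate the threads blocked in the fork path (`ptmalloc_lock_all`, `__GI__IO_list_lock`) from those blocked in `malloc_atfork` and `_IO_flush_all_lockp`.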
This is caused by glibc bug https:/
Ubuntu 16.04 is missing the following patches (already backported to release/
commit 888d9a0146b4b83
Author: Florian Weimer <email address hidden>
Date: Thu Apr 14 12:53:03 2016 +0200
malloc: Add missing internal_function attributes on function definitions
Fixes build on i386 after commit 29d794863cd6e03
(cherry picked from commit 186fe877f3df0b8
commit 927170dd59787d9
Author: Florian Weimer <email address hidden>
Date: Thu Apr 14 09:18:30 2016 +0200
malloc: Remove malloc hooks from fork handler
The fork handler now runs so late that there is no risk anymore that
other fork handlers in the same thread use malloc, so it is no
longer necessary to install malloc hooks which made a subset
of malloc functionality available to the thread that called fork.
(cherry picked from commit 8a727af925be63a
commit 2a71cf409681b89
Author: Florian Weimer <email address hidden>
Date: Thu Apr 14 09:17:02 2016 +0200
malloc: Run fork handler as late as possible [BZ #19431]
Previously, a thread M invoking fork would acquire locks in this order:
(M1) malloc arena locks (in the registered fork handler)
(M2) libio list lock
A thread F invoking fflush (NULL) would acquire locks in this order:
(F1) libio list lock
(F2) individual _IO_FILE locks
A thread G running getdelim would use this order:
(G1) _IO_FILE lock
(G2) malloc arena lock
After executing (M1), (F1), (G1), none of the threads can make progress.
This commit changes the fork lock order to:
(M'1) libio list lock
(M'2) malloc arena locks
It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.
(cherry picked from commit 29d794863cd6e03
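The lock orders described in this commit message can be checked mechanically by treating each "holds X, then wants Y" step as an edge in a graph and looking for a cycle. The sketch below is only an illustration of that reasoning, not glibc code: the lock names are labels taken from the commit message, and `build_lock_graph` and `find_cycle` are hypothetical helpers.

```python
from collections import defaultdict


def build_lock_graph(orders):
    """Add an edge held -> wanted for each consecutive pair in every thread's lock order."""
    graph = defaultdict(set)
    for locks in orders.values():
        for held, wanted in zip(locks, locks[1:]):
            graph[held].add(wanted)
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of lock names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if colour[nxt] == GREY:
                # Back edge: the path from nxt to node, plus this edge, is a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Lock orders before the fix, as listed in the commit message.
OLD_ORDERS = {
    "M (fork)": ["malloc arena", "libio list"],
    "F (fflush(NULL))": ["libio list", "_IO_FILE"],
    "G (getdelim)": ["_IO_FILE", "malloc arena"],
}
# After the fix, fork takes the libio list lock first.
NEW_ORDERS = {**OLD_ORDERS, "M (fork)": ["libio list", "malloc arena"]}

if __name__ == "__main__":
    print("old order cycle:", find_cycle(build_lock_graph(OLD_ORDERS)))
    print("new order cycle:", find_cycle(build_lock_graph(NEW_ORDERS)))
```

With the old orders the three edges form a ring (malloc arena, libio list, _IO_FILE, back to malloc arena), which is the state reached after (M1), (F1) and (G1). With the new fork order the ring is broken, so some thread can always make progress.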
commit a5c2f42566460fc
Author: Samuel Thibault <email address hidden>
Date: Tue Mar 22 09:58:48 2016 +0100
Fix malloc threaded tests link on non-Linux
* malloc/Makefile (tst-malloc-
tst-
instead of hardcoding the path to libpthread.
(cherry picked from commit b87e41378beca3c
commit f69ae17e843b00d
Author: Florian Weimer <email address hidden>
Date: Fri Feb 19 17:07:45 2016 +0100
malloc: Remove NO_THREADS
No functional change. It was not possible to build without
threading support before.
(cherry picked from commit 59eda029a8a35e5
tags: | added: architecture-ppc64le bugnameltc-147147 severity-high targetmilestone-inin16041 |
Changed in ubuntu: | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
affects: | ubuntu → glibc (Ubuntu) |
tags: |
added: severity-critical removed: severity-high |
Changed in glibc (Ubuntu): | |
importance: | Undecided → High |
tags: |
added: verification-done removed: verification-needed |
tags: | removed: bugnameltc-147147 severity-critical verification-done |
tags: | added: bugnameltc-147147 severity-critical verification-done |
Changed in ubuntu-power-systems: | |
status: | New → Fix Released |
As I understand it this is fixed upstream in glibc 2.24, which means Ubuntu 16.10 already has the fix; please reopen this task if this is incorrect.
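A quick way to see which glibc a machine is running, and how it compares with the 2.24 release mentioned above, is sketched below. This is an assumption-laden convenience check rather than a definitive test: the threshold `FIXED_UPSTREAM` simply encodes the comment's statement that 2.24 carries the fix, the helper names are hypothetical, and a distribution that backports the patches (as the Xenial task here did) can still report 2.23, so a negative result does not prove a system is affected.

```python
import os
import re

# Assumption taken from the comment above: the fix is in upstream glibc 2.24.
FIXED_UPSTREAM = (2, 24)


def parse_glibc_version(text):
    """Parse a string such as 'glibc 2.23' into (2, 23); return None if it does not match."""
    match = re.fullmatch(r"glibc (\d+)\.(\d+)(?:\.\d+)?", text.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(version):
    """True if the (major, minor) version is at least the upstream release with the fix."""
    return version >= FIXED_UPSTREAM


def running_glibc_version():
    """Return the running glibc version, or None when it cannot be determined."""
    try:
        # CS_GNU_LIBC_VERSION is a glibc-specific confstr name; other libcs may not define it.
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION") or "")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    version = running_glibc_version()
    if version is None:
        print("could not determine glibc version")
    elif has_upstream_fix(version):
        print("glibc %d.%d is at or past 2.24" % version)
    else:
        print("glibc %d.%d predates 2.24; check the package changelog for a backport" % version)
```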