OOPSBz2Serializer can trigger deadlock in Python
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu One Servers | Fix Released | Critical | Elliot Murphy |
wsgi-oops | Fix Released | High | Natalia Bidart |
Bug Description
elmo kindly captured some stack frames from 3 hung appservers today, and lifeless did some analysis concluding that there is a locking bug in Python. A bug has been filed upstream: http://
<dobey> 13:57:57> hrmm
13:58:08> bzip2 python module crashing python?
<lifeless> 17:09:14> dobey: nope
17:09:21> bz2 is trying to reacquire the gil
17:11:18> or perhaps
17:11:44> in the 23806/ trace thread 2 is looking for lock 0x7ffb90761ff0 which is oddly high
17:12:29> thread 3 and 2 share the same bz2 object
17:12:48> theory: two python threads are calling 'bz2obj.write'
<lifeless> 17:12:52> statik: ^
<lifeless> 17:13:03> on the same object
* lifeless 17:22:16> checks the source
<dobey> 17:27:50> ah ok
<lifeless> 17:31:26> so
17:31:29> one thread wants the gil
17:31:33> /23805/
17:31:51> Thread 2 (Thread 0x411af950 (LWP 9217)):
17:31:52> #0 0x00007fcd4fd61991 in sem_wait () from /lib/libpthread
17:31:52> #1 0x00000000004b371d in PyThread_
17:31:55> #2 0x0000000000482594 in PyEval_
17:31:58> #3 0x00007fcd47635043 in BZ2File_write (self=0x95495e0, args=<value optimized out>)
17:32:02> at /build/
17:32:14> thats at the end of the bz2 actual compressor
17:32:22> trying to get back to calling python functions
17:32:42> Thread 3 (Thread 0x409ae950 (LWP 9161)):
17:32:42> #0 0x00007fcd4fd61991 in sem_wait () from /lib/libpthread
17:32:42> #1 0x00000000004b371d in PyThread_
17:32:44> #2 0x00007fcd47634fb2 in BZ2File_write (self=0x95495e0, args=<value optimized out>)
17:32:47> at /build/
17:33:04> this thread has the GIL (or thinks it does; it's calling Py functions, which only GIL holders are permitted to do)
17:33:15> note that they are for the same bz2 object
17:33:17> now
17:33:35> thread 3 is trying to gain a lock on the object 0x95495e0,
17:33:44> thread 2 holds the lock, but can't release it till it gets the GIL
17:33:58> so, this is as I thought
17:34:12> two threads are calling a_bz2obj.write(), and they are deadlocked.
17:34:16> is there a bug for this issue?
<lifeless> 18:47:41> ok
18:47:50> so a datapoint, doing bz2 in python is very slow for some reason
<MenTaLguY> 18:47:51> it was something new which ~sil was working on recently
18:48:00> hm
<lifeless> 18:48:02> if the new oops code can write uncompressed data that would be better
<statik> guess we oughta back out that bzip stuff in the oops writer for now
<lifeless> statik: just wrapping all writes to the bz2file object in a mutex will avoid it
<lifeless> because the python mutexes are gil-safe
<statik> oh cool
<lifeless> the problem is the gil-release from within a mutex
which the C code does
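
For illustration, here is a minimal sketch of the access pattern the log describes: two threads calling write() on one shared BZ2File. This is not code from the appservers; the names are invented, and on an interpreter where the upstream bug is fixed it will simply run to completion rather than deadlock.

```python
import bz2
import threading

# Two threads share a single BZ2File, as thread 2 and thread 3 do in the
# backtraces above (both show self=0x95495e0).
shared_file = bz2.BZ2File('/tmp/oops-demo.bz2', 'w')

def writer(payload):
    # In the buggy C module, BZ2File_write takes a per-object lock and then
    # releases/reacquires the GIL around the compressor. One thread can hold
    # the object lock while waiting for the GIL, while the other holds the
    # GIL and waits for the object lock: a lock-order inversion.
    for _ in range(10000):
        shared_file.write(payload)

threads = [threading.Thread(target=writer, args=(b'x' * 4096,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
shared_file.close()
```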
Related branches
- Rick McBride (community): Approve
- Robert Collins (community): Abstain
- Tim Cole (community): Approve
- Diff: 120 lines, 2 files modified: canonical/oops/serializer.py (+17/-8), canonical/oops/tests/test_serializer.py (+22/-7)
- Eric Casteleijn (community): Approve
- Rick McBride (community): Approve
- Diff: 21 lines, 1 file modified: canonical/oops/wsgi.py (+2/-2)
Changed in ubuntuone-servers:
status: In Progress → Fix Committed
Changed in wsgi-oops:
status: Triaged → In Progress
Changed in wsgi-oops:
status: In Progress → Fix Committed
Changed in wsgi-oops:
status: Fix Committed → Fix Released
The immediate fix to stop ubuntuone-servers from deadlocking is to switch off the bzip2 serializer, and I'll do that now. The next fix is to introduce a mutex into the bzip2 serializer, and I'll leave that for Naty.
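
A minimal sketch of the mutex approach lifeless suggests above, assuming a hypothetical wrapper class (the real change is the canonical/oops/serializer.py diff in the related branch): because blocking on a Python-level threading.Lock does not hold the GIL, it cannot reproduce the C-level inversion, and it guarantees only one thread is ever inside BZ2File.write at a time.

```python
import bz2
import threading

class LockedBZ2Writer(object):
    """Hypothetical wrapper: serialize all writes to a BZ2File through a
    Python-level mutex so two threads never enter the C write path at once."""

    def __init__(self, path):
        self._file = bz2.BZ2File(path, 'w')
        # Python locks are GIL-safe: a thread blocked in acquire() does not
        # hold the GIL, so the gil-release-inside-mutex problem goes away.
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            self._file.write(data)

    def close(self):
        with self._lock:
            self._file.close()
```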
It would be nice to provide a way to configure the oopsmiddleware class so that the calling application could decide what kind of serializer it wanted to use without needing to change code in wsgi-oops itself. I'll leave that for Naty as well :)
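
One possible shape for that configuration, sketched under assumptions: the class, parameter, and serializer names below are illustrative, not the actual wsgi-oops API.

```python
class PlainSerializer(object):
    """Illustrative uncompressed serializer."""
    def write(self, path, report_bytes):
        with open(path, 'wb') as f:
            f.write(report_bytes)

class OopsMiddleware(object):
    """Hypothetical sketch of serializer injection; the real middleware
    lives in canonical/oops/wsgi.py."""

    def __init__(self, app, serializer=None):
        self.app = app
        # The caller chooses the serializer; defaulting to uncompressed
        # output also sidesteps the bz2 deadlock entirely.
        self.serializer = serializer or PlainSerializer()

    def __call__(self, environ, start_response):
        # Pass-through; a real middleware would catch exceptions here and
        # hand the OOPS report to self.serializer.
        return self.app(environ, start_response)
```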