OOPSBz2Serializer can trigger deadlock in python

Bug #460800 reported by Elliot Murphy
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu One Servers
Fix Released
Critical
Elliot Murphy
wsgi-oops
Fix Released
High
Natalia Bidart

Bug Description

elmo kindly captured some stack frames from 3 hung appservers today, and lifeless did some analysis concluding that there is a locking bug in python. a bug has been filed upstream: http://bugs.python.org/issue7205, but since we are using python2.5 on the servers currently we need to work around this issue in the wsgi-oops code.

<dobey> 13:57:57> hrmm
 13:58:08> bzip2 python module crashing python?
<lifeless> 17:09:14> dobey: nope
 17:09:21> bz2 is trying to reqacquire the gil
 17:11:18> or perhapss
 17:11:44> in the 23806/ trace thread 2 is looking for lock 0x7ffb90761ff0 which is oddly high
 17:12:29> thread 3 and 2 share the same bz2 object
 17:12:48> theory: two python threads are calling 'bz2obj.write'
<lifeless> 17:12:52> statik: ^
<lifeless> 17:13:03> on the same object
* lifeless 17:22:16> checks the source
<dobey> 17:27:50> ah ok
<lifeless> 17:31:26> so
 17:31:29> one thread wants the gil
 17:31:33> /23805/
 17:31:51> Thread 2 (Thread 0x411af950 (LWP 9217)):
 17:31:52> #0 0x00007fcd4fd61991 in sem_wait () from /lib/libpthread.so.0
 17:31:52> #1 0x00000000004b371d in PyThread_acquire_lock (lock=0xb1da30, waitflag=1) at ../Python/thread_pthread.h:349
 17:31:55> #2 0x0000000000482594 in PyEval_RestoreThread (tstate=0x6096620) at ../Python/ceval.c:312
 17:31:58> #3 0x00007fcd47635043 in BZ2File_write (self=0x95495e0, args=<value optimized out>)
 17:32:02> at /build/buildd/python2.5-2.5.2/Modules/bz2module.c:822
 17:32:14> thats at the end of the bz2 actual compressor
 17:32:22> trying to get back to calling python functons
 17:32:42> Thread 3 (Thread 0x409ae950 (LWP 9161)):
 17:32:42> #0 0x00007fcd4fd61991 in sem_wait () from /lib/libpthread.so.0
 17:32:42> #1 0x00000000004b371d in PyThread_acquire_lock (lock=0x930f810, waitflag=1) at ../Python/thread_pthread.h:349
 17:32:44> #2 0x00007fcd47634fb2 in BZ2File_write (self=0x95495e0, args=<value optimized out>)
 17:32:47> at /build/buildd/python2.5-2.5.2/Modules/bz2module.c:801
 17:33:04> this thread has the GIL (or things it does, its calling Py functions which only GIL holders are permitted to do)
 17:33:15> note that they are for the same bz2 object
 17:33:17> now
 17:33:35> thread 3 is trying to gain a lock on the object 0x95495e0,
 17:33:44> thread 2 holds the lock, but can't release it till it gets the GIL
 17:33:58> so, this is as I thought
 17:34:12> two threads are calling a_bz2obj.write(), and they are deadlocked.
 17:34:16> is there a bug for this issue?

<lifeless> 18:47:41> ok
 18:47:50> so a datapoint, doing bz2 in python is very slow for some reason
<MenTaLguY> 18:47:51> it was something new which ~sil was working on recently
 18:48:00> hm
<lifeless> 18:48:02> if the new oops code can write uncompressed data that would be better
<statik> guess we oughta back out that bzip stuff in the oops writer for now
<lifeless> statik: just wrapping all writes to the bz2file object in a mutex will avoid it
<lifeless> because the python mutexes are gil-safe
<statik> oh cool
<lifeless> the problem is the gil-release from within a mutex
 which the C code does

Related branches

Revision history for this message
Elliot Murphy (statik) wrote :

The immediate fix to stop ubuntuone-servers from deadlocking is to switch off the bzip2 serializer, and I'll do that now. The next fix is to introduce a mutex into the bzip2 serializer, and I'll leave that for Naty.

It would be nice to provide a way to configure the oopsmiddleware class so that the calling application could decide what kind of serializer it wanted to use without needing to change code in wsgi-oops itself. I'll leave that for Naty as well :)

Changed in ubuntuone-servers:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Elliot Murphy (statik)
Changed in wsgi-oops:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Naty Bidart (nataliabidart)
tags: added: ubuntuone-karmic
Revision history for this message
Robert Collins (lifeless) wrote :

I'd suggest not doing bz2 in process anyway: its terribly slow.

Revision history for this message
Robert Collins (lifeless) wrote :

(e.g. open a pipe | bz2 > mylog.bz2, then write to the pipe - much faster.)

Elliot Murphy (statik)
Changed in ubuntuone-servers:
status: In Progress → Fix Committed
Revision history for this message
Steve Alexander (stevea) wrote :

We're probably better using gzip anyway... the compressed size isn't all that far off bzip2, and it uses a lot less processor.

Changed in wsgi-oops:
status: Triaged → In Progress
Revision history for this message
Natalia Bidart (nataliabidart) wrote :

Pushed fix into related branch, and it's already waiting for reviews to merge.

Changed in wsgi-oops:
status: In Progress → Fix Committed
Revision history for this message
Natalia Bidart (nataliabidart) wrote :

Branch approved and merged into trunk by Chipaca.

Changed in wsgi-oops:
status: Fix Committed → Fix Released
Revision history for this message
Natalia Bidart (nataliabidart) wrote :

Pending work:

 * revert canonical/oops/wsgi.py so wsgi apps use the Bzip2 serializer again.

Revision history for this message
Natalia Bidart (nataliabidart) wrote :
Changed in ubuntuone-servers:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.