Wine locks up when running multithreaded applications that touch both OpenGL and X11

Bug #1059276 reported by Erich E. Hoover
30
This bug affects 4 people
Affects Status Importance Assigned to Milestone
libxcb
Fix Released
Medium
libxcb (Ubuntu)
Fix Released
Medium
Canonical X.org
Precise
Fix Released
Medium
Unassigned

Bug Description

[IMPACT]

Wine needs an update to libxcb to include upstream commit 23911a707b8845bff52cd7853fc5d59fb0823cef. This commit fixes an issue where multithreaded applications lock up when they use both OpenGL and X11. This is being hit with a number of popular games on both quantal and precise.

[TESTCASE 1 - No wine required]

1) download https://bugs.freedesktop.org/attachment.cgi?id=67051
2) compile and run: --- gcc -o test t.c -lxcb -lpthread; ./test
3) When broken (without the patch) you'll just see "o" repeating over and over, when working properly (with the patch) you'll see both "o" and "x" appearing in the output.

[TESTCASE 2 - Wine required]

1) download http://bugs.winehq.org/attachment.cgi?id=41417 and extract it
2) run wine graphics.exe and watch for it to get hung.
3) It should fade out or be hung on a green screen soon after launch without the fix.

[Regression Potential]

This patch is in the stable libxcb 1.9 which was just released last week. There has been no known fallout from it being in quantal for 2 weeks.

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Now that Wine is no longer locking around all X calls we're seeing lockups in applications that use both OpenGL (through Direct3D) and X11 (through GDI) simultaneously in separate threads. It appears that something strange is happening in _XReply that causes the applications to hang indefinitely in xcb_wait_for_reply when both threads happen to use the same sequence number at the same time.

I'm not really an X expert, so I'd really appreciate some help trying to track this down. On the Wine end we have a bug open (http://bugs.winehq.org/show_bug.cgi?id=31406) to track this issue, but we're pretty sure it's something in xlib or xcb.

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 66895
Hack to work around the problem

Ok, so I've been trying to find enough information about what's going on to provide something useful for fixing this. Working off of the latest git the problem appears to be solvable using the attached hack and is triggered when Wine calls XCheckIfEvent in combination with OpenGL routines.

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 66985
Patch to keep Wine applications from locking up

Ok, the attached patch seems to fix the problem without killing the framerate of games using OpenGL. However, I'm concerned that this sets up a ping-pong scenario between the threads and is therefore just obscuring the issue. Feedback would be greatly appreciated.

Revision history for this message
In , Psychon-d (psychon-d) wrote :

Since IRC didn't feel helpful: Could you provide backtraces for the deadlock that you are seeing? What exactly happens without this patch?

Also, which versions of libxcb are you using?

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 67004
Backtrace of the threads involved in the lockup

(In reply to comment #3)
> Since IRC didn't feel helpful: Could you provide backtraces for the deadlock
> that you are seeing? What exactly happens without this patch?
>
> Also, which versions of libxcb are you using?

Sorry, is IRC your preferred mode of communication? Without the patch we see a lockup in some of Wine's mutexes because the other thread held the mutex and then got stuck in xcb_wait_for_reply (see backtrace). I'm currently using the libxcb from git. If it helps then I can add debug symbols to some of the other libraries you see in the backtrace (libx11 and libxrandr).

Revision history for this message
In , Psychon-d (psychon-d) wrote :

Does it always hang inside of XRRGetCrtcInfo? That function doesn't do anything special and I don't really see why this should indefinitely wait for a reply from the X11 server. Also, unlocking the Xlib display before polling for the reply shouldn't make any difference on this at all. :-/

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

(In reply to comment #5)
> Does it always hang inside of XRRGetCrtcInfo? That function doesn't do anything
> special and I don't really see why this should indefinitely wait for a reply
> from the X11 server.

No, it hangs inside all sorts of different functions. Sometimes it's just a direct _XReply call (not sure how that happens). The most common parent (from maybe 10-20 runs) is:
6 0xf6f332d5 in libgl.so.1 (+0x8f2d4) (0x022de488)

> Also, unlocking the Xlib display before polling for the
> reply shouldn't make any difference on this at all. :-/

I'm not sure what you mean here, do you mean in attachment 66895 where I remove the unlock so that it stays locked while it goes through xcb_wait_for_reply? I suspect what's happening is that one thread is depleting the fd of events, so the other thread hangs inside the poll() virtually indefinitely. (So if there's a lock around the poll() and recv() then that's not possible)

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 67043
Patch to keep Wine applications from locking up

(In reply to comment #2)
> Created attachment 66985 [details] [review]
> Patch to keep Wine applications from locking up
> ...

Whoops, I attached the wrong patch here (no wonder about the confusion!). I'm really sorry about that, I'm not sure how I managed to do that.

Revision history for this message
In , Psychon-d (psychon-d) wrote :

I don't think that patch actually fixes any races, at best it makes them harder to hit, because instead of poll()ing the fd at the same time, the second thread will now just wait for the mutex and then start poll()ing after the first thread is done. This thread then still wouldn't get woken up if there is nothing to be read from the socket, so the same hang can still happen.

Also, this shouldn't be needed. While poll()ing the fd, _xcb_conn_wait will always set c->in.reading to a non-zero value. This means that the next thread that calls _xcb_conn_wait() will sleep in pthread_cond_wait() instead of poll(). (However, it seems like if one thread is only reading and doesn't want to write, another thread could be allowed to read and write at the same time. Could this be the issue at hand? If yes, it should be possible to somehow produce an xcb-only testcase of this...?)

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

(In reply to comment #8)
> ... (However, it seems like if one thread is only reading and doesn't want
> to write, another thread could be allowed to read and write at the same time.
> Could this be the issue at hand? If yes, it should be possible to somehow
> produce an xcb-only testcase of this...?)

That seems like a reasonable possibility from the playing around I've done, is there an easy way that I can confirm that this is what's happening?

Revision history for this message
In , Psychon-d (psychon-d) wrote :

Since you are building your own libxcb anyway, do some printf-debugging: right before the poll() do if(c->in.reading > 1) puts("something is smelly here");
(And double-check that I got the logic right)
(Alternatively use assert())
(Alternatively, check that c->in.reading == 0 before that variable is incremented)

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 67048
Patch to keep Wine applications from locking up [v2]

(In reply to comment #10)
> Since you are building your own libxcb anyway, do some printf-debugging: right
> before the poll() do if(c->in.reading > 1) puts("something is smelly here");
> (And double-check that I got the logic right)
> (Alternatively use assert())
> (Alternatively, check that c->in.reading == 0 before that variable is
> incremented)

Yup, I get a bunch of "c->in.reading != 0, expect funny business!" and then a lockup. I threw together a quick check so that it would not read if a read is already in progress (attached) and with a small number of tests (3) that seems to be working.

Revision history for this message
In , Psychon-d (psychon-d) wrote :

Created attachment 67051
xcb-only test case that (hopefully) reproduces this problem

This test has two threads, one is sending MapWindow requests, the other is sending GetInputFocus requests and waits for the replies. After each request, a "x" or "o" is printed.

The GetInputFocus thread quickly dead-locks in poll().

Revision history for this message
In , Psychon-d (psychon-d) wrote :

After all, this seems to be a bug in xcb, not in Xlib.

I'm not sure about the idea of the proposed patch. If there are seperate reading and writing threads, the reading thread could stop reading and we end up without any reading thread. In theory this could cause a dead-lock where the server refuses to read until some of the already written data gets handled in the client, but this seems quite unlikely.

For the patch itself: fd.fd is not initialized in the "just writing"-case.

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

Created attachment 67053
Patch to keep Wine applications from locking up [v2]

(In reply to comment #13)
> ...
> I'm not sure about the idea of the proposed patch. If there are seperate
> reading and writing threads, the reading thread could stop reading and we end
> up without any reading thread. In theory this could cause a dead-lock where the
> server refuses to read until some of the already written data gets handled in
> the client, but this seems quite unlikely.

If the reading thread stops reading then wouldn't the writing thread open up to being able to read again?

> For the patch itself: fd.fd is not initialized in the "just writing"-case.

My bad, sorry about that - a corrected version is attached (still works for my test case).

Revision history for this message
In , Psychon-d (psychon-d) wrote :

Sorry for not using your patch, but there must always be a thread reading from the socket when there is one that writes.

commit 23911a707b8845bff52cd7853fc5d59fb0823cef
Author: Uli Schlachter <email address hidden>
Date: Mon Sep 24 22:07:51 2012 +0200

    Fix a multi-thread deadlock

    This fixes a deadlock which was seen in-the-wild with wine.

    It could happen that two threads tried to read from the socket at the same time
    and one of the thread got stuck inside of poll()/select().

    The fix works by making sure that the writing thread doesn't steal the reading
    thread's reply.

    Debugged-by: Erich Hoover <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54671
    Signed-off-by: Uli Schlachter <email address hidden>

Revision history for this message
In , Erich E. Hoover (ehoover) wrote :

(In reply to comment #15)
> Sorry for not using your patch, but there must always be a thread reading
> from the socket when there is one that writes.
> ...

No problem, I'm just glad to get this fixed! Would you mind if I contact the Ubuntu folks and ask them to apply this patch to their 1.8.1 package?

Robert Hooker (sarvatt)
Changed in libxcb (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Canonical X.org (canonical-x)
Robert Hooker (sarvatt)
description: updated
Robert Hooker (sarvatt)
tags: added: patch
tags: added: precise quantal
Robert Hooker (sarvatt)
description: updated
description: updated
description: updated
Robert Hooker (sarvatt)
description: updated
Revision history for this message
Robert Hooker (sarvatt) wrote :

Debdiff containing the fix targetted to quantal

Revision history for this message
Robert Hooker (sarvatt) wrote :

Debdiff containing the fix targetted to precise-proposed

Revision history for this message
Robert Hooker (sarvatt) wrote :

Packages for testing before they are sponsored are available in https://launchpad.net/~sarvatt/+archive/sru8 if so desired

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libxcb - 1.8.1-1ubuntu1

---------------
libxcb (1.8.1-1ubuntu1) quantal; urgency=low

  * Add 100-fix-multi-thread-deadlock.patch: Fixes a deadlock affecting
    some wine applications. (LP: #1059276)
 -- Robert Hooker <email address hidden> Mon, 01 Oct 2012 14:08:35 -0400

Changed in libxcb (Ubuntu):
status: Triaged → Fix Released
Changed in libxcb:
importance: Unknown → Medium
status: Unknown → Fix Released
Robert Hooker (sarvatt)
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Erich, or anyone else affected,

Accepted libxcb into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libxcb/1.8.1-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Changed in libxcb (Ubuntu Precise):
status: New → Fix Committed
importance: Undecided → Medium
Revision history for this message
Erich E. Hoover (ehoover) wrote :

Thanks Brian, sorry it took me so long to test - trying to get my dissertation done. The package works great in both test scenarios that I've used to check for the bug (TESTCASE 1 and operating a variety of affected apps with Wine).

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libxcb - 1.8.1-1ubuntu0.1

---------------
libxcb (1.8.1-1ubuntu0.1) precise-proposed; urgency=low

  * Add 100-fix-multi-thread-deadlock.patch: Fixes a deadlock affecting
    some wine applications. (LP: #1059276)
 -- Robert Hooker <email address hidden> Mon, 01 Oct 2012 14:08:35 -0400

Changed in libxcb (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.