Server crash in miSpriteRestoreCursor()

Bug #358009 reported by Tom Jaeger
4
Affects Status Importance Assigned to Milestone
X.Org X server
Fix Released
Medium
xorg-server (Ubuntu)
Fix Released
Medium
Bryce Harrington

Bug Description

See also https://lists.ubuntu.com/archives/ubuntu-x/2009-April/000495.html

The issue is associated to slave device cursors, basically an artifact
of how the xserver-1.6 code is derived from master. There is no API to
access device cursors in 1.6, but it turns out that under certain
circumstances (I'm not exactly sure how this happens but it seems to be
some kind of race condition between XI and core grabs), it is possible
that a device cursor will be set anyway. If it's a regular cursor (that
is what the bug report was originally about), this is not a problem
anymore since this will just modify the core cursor, but if it's an
animated cursor, we're in trouble: The device cursor will keep
replacing the core cursor (so the user will notice an animated cursor
that shouldn't be there), and when the client destroys the animated
cursor, the device animated cursor will stay active leading to a server
crash the next time the cursor is updated. There is a patch attached to
the fdo bug report that fixes the issue by basically doing the same
thing for an animated cursor that we do for a regular cursor: Apply the
change to the associated master device. This is safe for 1.6 since this
code path should never be hit in the first place, but unfortunately,
this is not the direction that Peter wants to go for master where each
device has its own sprite (I'm not sure how things are supposed to work
with animated cursors there).

The crash happens randomly when an application that grabs an Xi device
(such as easystroke) is running when clicking on firefox menus, but it's
fairly easy to reproduce reliably by setting up a timeout gesture in
easystroke to rotate the cube in compiz via Control+Shift+Button1 and
invoke the gesture when firefox is loading a page and showing a 'sandbox'.

The patch is available at
http://bugs.freedesktop.org/attachment.cgi?id=21710

Related branches

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

you don't happen to have a reproducable testcase?
Any particular client that causes it?

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Sorry, I don't know how to reproduce it. It's happened with several different clients before. I'll let you know when I find out more.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Created an attachment (id=21079)
screenshot

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

I can reproduce it now. It happens on any kind of drag and drop -- but, only when using my trackpoint, the bug doesn't appear when I'm using the wacom pen.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

I can only reproduce the bug when the Xi device belonging to the track point is grabbed, for example when the attached client is running.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Created an attachment (id=21082)
client that grabs the track point Xi device

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Created an attachment (id=21083)
client that grabs the track point Xi device

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Just updated to the latest commit on server-1.6-branch, and the bug disappeared. Thanks.

Xi grabs with GrabModeSync still behave differently on my stylus than on my track point, though.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Closing as fixed then. Please open a separate bugreport (if it isn't open already) for the other issue.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Reopening, see http://<email address hidden>/msg03537.html
I can also reproduce it myself using easystroke, timeout gestures and the compiz cube rotation plugin, but only when the current cursor is animated.

I'll open a bug for the other issue as soon as I find the time to investigate it more.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Created an attachment (id=21710)
fix

This patch (plus http://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.6-branch&id=0d12c44d832b98da10dccc3b8bac7676d8ea2c96) fixes the issue for me. What happens in my case is that during processing of an UngrabDevice request, DeactivatePointerGrab calls PostNewCursor with a slave device as its argument, which AnimCurDisplayCursor can't handle. Similar fixes probably need to be applied to AnimCurSetCursorPosition and AnimCurRecolorCursor.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

This patch just papers over the issue though. In theory, cursors for floating
SDs should be just like any cursor but not visibly rendered to the screen. So
the question is more why it crashes and that looks more like there's something
screwed with the private system (something isn't stored properly, or a key is
wrong, or something like that).

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

I see, so the patch doesn't handle floating SDs correctly, but something similar is still needed to get correct behavior on attached SDs: Otherwise the SD's animated cursor keeps overwriting the correct cursor associated with the MD.

The crash looks like a dangling pointer to me:

This is where the animated cursor is freed:

Breakpoint 2, FreeCursor (value=0x9c0a548, cid=58720462) at ../../dix/cursor.c:125
125 for (nscr = 0; nscr < screenInfo.numScreens; nscr++)
(gdb) bt
#0 FreeCursor (value=0x9c0a548, cid=58720462) at ../../dix/cursor.c:125
#1 0x0807448c in FreeResource (id=58720462, skipDeleteFuncType=0) at ../../dix/resource.c:561
#2 0x08087d23 in ProcFreeCursor (client=0x9a75408) at ../../dix/dispatch.c:2956
#3 0x0808cd7f in Dispatch () at ../../dix/dispatch.c:437
#4 0x08071b3d in main (argc=10, argv=0xbfe273d4, envp=Cannot access memory at address 0x20000007
) at ../../dix/main.c:383

And here it is still accessed from the block handler:

Program received signal SIGSEGV, Segmentation fault.
0x3a313158 in ?? ()
(gdb) bt
#0 0x3a313158 in ?? ()
#1 0x0807242c in dixFreePrivates (privates=0x9bbea10) at ../../dix/privates.c:213
#2 0x080815ba in FreeCursor (value=0x9bb99a8, cid=0) at ../../dix/cursor.c:130
#3 0x080f01b7 in xf86_use_hw_cursor_argb (screen=0x8efaab0, cursor=0x9bbe0b8)
    at ../../../../hw/xfree86/modes/xf86Cursors.c:485
#4 0x080f787a in xf86CursorSetCursor (pDev=0x9128800, pScreen=0x8efaab0, pCurs=0x9bbe0b8, x=<value optimized out>,
    y=<value optimized out>) at ../../../../hw/xfree86/ramdac/xf86Cursor.c:332
#5 0x0811b8f8 in miPointerUpdateSprite (pDev=0x9128800) at ../../mi/mipointer.c:407
#6 0x0811bb3d in miPointerDisplayCursor (pDev=0x9128800, pScreen=0x8efaab0, pCursor=0x9bbe0b8)
    at ../../mi/mipointer.c:198
#7 0x08149156 in CursorDisplayCursor (pDev=0x9128800, pScreen=0x8efaab0, pCursor=0x9bbe0b8) at ../../xfixes/cursor.c:148
#8 0x0817a205 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0xbfe27288, pReadmask=0x81f51e0)
    at ../../render/animcur.c:203
#9 0x08143618 in compBlockHandler (i=0, blockData=0x0, pTimeout=0xbfe27288, pReadmask=0x81f51e0)
    at ../../composite/compinit.c:158
#10 0x080909b8 in BlockHandler (pTimeout=0xbfe27288, pReadmask=0x81f51e0) at ../../dix/dixutils.c:384
#11 0x08130024 in WaitForSomething (pClientsReady=0x9124ec0) at ../../os/WaitFor.c:215
#12 0x0808cabe in Dispatch () at ../../dix/dispatch.c:367
#13 0x08071b3d in main (argc=10, argv=0xbfe273d4, envp=0x0) at ../../dix/main.c:383
(gdb) up 8
#8 0x0817a205 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0xbfe27288, pReadmask=0x81f51e0)
    at ../../render/animcur.c:203
203 (void) (*pScreen->DisplayCursor) (dev,
(gdb) print animCurState[dev->id].pCursor
$5 = (CursorPtr) 0x9c0a548

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Easy way to reproduce the issue: Adjust the above program to grab the XI slave device of your choice, and then, in firefox, repeatedly reload a website that takes a while to load (so that an animated cursor is shown) by using a menu (e.g. the history menu or the most visited menu). Now if you place the mouse in another window, that window's cursor will be instantly overwritten by the animated cursor. The X server will crash as soon as firefox is closed.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Thanks Tom. Just to give you a heads-up, it'll take me a while to get to this
bug, half the input system is reworked and cleaned up, so I'm holding back
with bugs until that work is done.

Revision history for this message
Tom Jaeger (thjaeger) wrote : Server crash in dixLookupPrivate

See also https://lists.ubuntu.com/archives/ubuntu-x/2009-April/000495.html

The issue is associated to slave device cursors, basically an artifact
of how the xserver-1.6 code is derived from master. There is no API to
access device cursors in 1.6, but it turns out that under certain
circumstances (I'm not exactly sure how this happens but it seems to be
some kind of race condition between XI and core grabs), it is possible
that a device cursor will be set anyway. If it's a regular cursor (that
is what the bug report was originally about), this is not a problem
anymore since this will just modify the core cursor, but if it's an
animated cursor, we're in trouble: The device cursor will keep
replacing the core cursor (so the user will notice an animated cursor
that shouldn't be there), and when the client destroys the animated
cursor, the device animated cursor will stay active leading to a server
crash the next time the cursor is updated. There is a patch attached to
the fdo bug report that fixes the issue by basically doing the same
thing for an animated cursor that we do for a regular cursor: Apply the
change to the associated master device. This is safe for 1.6 since this
code path should never be hit in the first place, but unfortunately,
this is not the direction that Peter wants to go for master where each
device has its own sprite (I'm not sure how things are supposed to work
with animated cursors there).

The crash happens randomly when an application that grabs an Xi device
(such as easystroke) is running when clicking on firefox menus, but it's
fairly easy to reproduce reliably by setting up a timeout gesture in
easystroke to rotate the cube in compiz via Control+Shift+Button1 and
invoke the gesture when firefox is loading a page and showing a 'sandbox'.

The patch is available at
http://bugs.freedesktop.org/attachment.cgi?id=21710

Changed in xorg-server:
status: Unknown → Confirmed
Bryce Harrington (bryce)
summary: - Server crash in dixLookupPrivate
+ Server crash in miSpriteRestoreCursor()
Revision history for this message
Bryce Harrington (bryce) wrote :

Boy... on the one hand I hate to be putting in non-trivial changes this close to freeze, but on the other hand you've clearly done some good work chasing this down and it would be a shame to leave it out.

Normally, I'd want to see this incorporated into the upstream tree, however at least you've gotten the change reviewed by whot, and it sounds like he didn't spot obvious flaws, just that he has larger plans in motion. I certainly don't mind papering over issues - anything to make crashes go away!

I'd like to have seen it more widely tested as well, but your analysis and testing sounds like it was done carefully. I'll incorporate it, provisionally that it is not implicated with any regressions. We can always revert the patch if we find problems with it.

Changed in xorg-server (Ubuntu):
assignee: nobody → Bryce Harrington (bryceharrington)
importance: Undecided → Medium
status: New → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.6.0-0ubuntu14

---------------
xorg-server (2:1.6.0-0ubuntu14) jaunty; urgency=low

  * Add 177_animated_cursor_change_master.patch: Fixes crash when using
    animated cursors.
    (LP: #358009)

 -- Bryce Harrington <email address hidden> Wed, 08 Apr 2009 18:52:56 -0700

Changed in xorg-server (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Tom Jaeger (thjaeger) wrote : Re: [Bug 358009] Re: Server crash in miSpriteRestoreCursor()

Thank you.

Bryce Harrington wrote:
> Boy... on the one hand I hate to be putting in non-trivial changes this
> close to freeze, but on the other hand you've clearly done some good
> work chasing this down and it would be a shame to leave it out.
>
> Normally, I'd want to see this incorporated into the upstream tree,
> however at least you've gotten the change reviewed by whot, and it
> sounds like he didn't spot obvious flaws, just that he has larger plans
> in motion. I certainly don't mind papering over issues - anything to
> make crashes go away!
>
> I'd like to have seen it more widely tested as well, but your analysis
> and testing sounds like it was done carefully. I'll incorporate it,
> provisionally that it is not implicated with any regressions. We can
> always revert the patch if we find problems with it.
>
>
> ** Changed in: xorg-server (Ubuntu)
> Importance: Undecided => Medium
>
> ** Changed in: xorg-server (Ubuntu)
> Status: New => Fix Committed
>
> ** Changed in: xorg-server (Ubuntu)
> Assignee: (unassigned) => Bryce Harrington (bryceharrington)
>

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Tom: can you please give me an update if you still see this issue? I just tried it (with thunderbird instead of firefox) on my XI2 branch and couldn't reproduce it.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

I'm hopelessly bad at building X servers from source. I would guess, however, that you can reproduce the bug by calling XChangeDeviceCursor on an attached SD with an animated cursor as argument.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

Created an attachment (id=25782)
potential test case

Since I don't have an xi2 server running, I couldn't test this and I'm sure it needs a few adjustments.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

On second thought, we probably need to call XChangeDeviceCursor on the root window to see the bug.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

can you give me another update on this? Have you seen this bug recently? I just tried your test program on the root window + the window, with an attached and floating slave device and didn't see any crashes.

Revision history for this message
In , Tom Jaeger (thjaeger) wrote :

(In reply to comment #20)
> can you give me another update on this? Have you seen this bug recently? I just
> tried your test program on the root window + the window, with an attached and
> floating slave device and didn't see any crashes.

The issue of an animated SD cursor living longer than it should and overriding the correct cursor still exists on master (I'll see if I can create a test case when I have time), but it's not very critical as it doesn't cause crashes any longer and a lot less annoying since it usually goes away fairly quickly as a result of switching the application or similar.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

I'm fixing this right now, the server should return BadDevice for any
ChangeCursor call on an SD, only master pointers can have cursors set.
that's one part of the fix, the other one is obviously to remove the
device-specific cursor when the master pointer disappears.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Fixed, AFAICT. Commit is 65354e5a698a5b527db09afc431110afba0e14b2.
Please reopen if you can still reproduce this.

Changed in xorg-server:
status: Confirmed → Fix Released
Changed in xorg-server:
importance: Unknown → Medium
Changed in xorg-server:
importance: Medium → Unknown
Changed in xorg-server:
importance: Unknown → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.