[spread] crash when sending applications to other workspaces

Bug #753269 reported by Richard Dale
This bug affects 2 people
Affects: unity-2d
Status: Fix Released
Importance: Critical
Assigned to: Ugo Riboni

Bug Description

I get a fairly regular crash when I start an application and then try to move it to another workspace. It doesn't always happen, but it doesn't take long to reproduce. I am running the current Natty under VMware on an iMac. I have built unity-2d from the trunk branch and am testing that.

This is the stack trace:

Thread 1 (Thread 0xb780a890 (LWP 5801)):
#0 0x004fb416 in __kernel_vsyscall ()
#1 0x0150fe71 in raise () from /lib/i386-linux-gnu/libc.so.6
#2 0x0151334e in abort () from /lib/i386-linux-gnu/libc.so.6
#3 0x083840b5 in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/i386-linux-gnu/libstdc++.so.6
#4 0x08381fa5 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#5 0x08381fe2 in std::terminate() ()
   from /usr/lib/i386-linux-gnu/libstdc++.so.6
#6 0x083821ab in __cxa_rethrow () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#7 0x00940920 in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) () from /usr/lib/libQtCore.so.4
#8 0x02ee1506 in WindowInfo::workspaceChanged (this=0x90a1e90, _t1=1)
    at /home/rdale/src/unity-2d/build/libunity-2d-private/Unity2d/moc_windowinfo.cxx:236
#9 0x02edcf9c in WindowInfo::updateWorkspace (this=0x90a1e90)
    at /home/rdale/src/unity-2d/libunity-2d-private/Unity2d/windowinfo.cpp:286
#10 0x02edcf69 in WindowInfo::onWorkspaceChanged (window=0x8e25498,
    user_data=0x90a1e90)
    at /home/rdale/src/unity-2d/libunity-2d-private/Unity2d/windowinfo.cpp:280
#11 0x0060848c in g_cclosure_marshal_VOID__VOID ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#12 0x005ec372 in g_closure_invoke ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#13 0x005ff048 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#14 0x00607b29 in g_signal_emit_valist ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#15 0x00607cc2 in g_signal_emit ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#16 0x019e668c in ?? () from /usr/lib/libwnck-1.so.22
#17 0x019e7218 in ?? () from /usr/lib/libwnck-1.so.22
#18 0x00663311 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#19 0x00667aa8 in g_main_context_dispatch ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#20 0x00668270 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#21 0x00668524 in g_main_context_iteration ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#22 0x0095753c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#23 0x00c48775 in ?? () from /usr/lib/libQtGui.so.4
#24 0x00929289 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#25 0x00929522 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
   from /usr/lib/libQtCore.so.4
#26 0x0092decc in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4
#27 0x00b928e7 in QApplication::exec() () from /usr/lib/libQtGui.so.4
#28 0x0804d22c in main (argc=1, argv=0xbfe57194)
    at /home/rdale/src/unity-2d/spread/app/spread.cpp:101

I've added some debug logging, and it appears to be crashing in WindowsList::updateWorkspaceRole() when it emits a dataChanged() signal to forward the change to the QSortFilterProxyModelQML instances in the QML.


Richard Dale (rdale)
Changed in unity-2d:
assignee: nobody → Richard Dale (rdale)
Richard Dale (rdale) wrote :

This is the debug logging I added:

void WindowsList::updateWorkspaceRole(int workspace)
{
    Q_UNUSED(workspace);

    qDebug() << Q_FUNC_INFO << this;
    WindowInfo *window = qobject_cast<WindowInfo*>(sender());
    if (window != NULL) {
        // Log the window only after the NULL check: window->title() on NULL would crash.
        qDebug() << Q_FUNC_INFO << this << window << window->title();
        int row = m_windows.indexOf(window);
        qDebug() << Q_FUNC_INFO << this << "row:" << row;
        if (row != -1) {
            QModelIndex changedItem = index(row);
            qDebug() << Q_FUNC_INFO << this << "changedItem:" << changedItem;
            Q_EMIT dataChanged(changedItem, changedItem);
        }
    }
    qDebug() << Q_FUNC_INFO << this << "EXIT";
}

And here is a sample of the output before the crash:

unity-2d-spread: [DEBUG] void WindowsList::load() WindowsList(0x83fe920) WindowInfo(0x863af60) "rdale@ubuntu: /tmp"
unity-2d-spread: [DEBUG] void WindowsList::load() WindowsList(0x83fe920) WindowInfo(0x864ea38) "rdale@ubuntu: ~/src/unity-2d/build"
unity-2d-spread: [DEBUG] void WindowsList::load() WindowsList(0x83fe920) WindowInfo(0x8617dd8) "rdale@ubuntu: ~"
unity-2d-spread: [DEBUG] void WindowsList::load() WindowsList(0x83fe920) WindowInfo(0x845e4e0) "windowslist.cpp - unity-2d - Qt Creator"
unity-2d-spread: [DEBUG] void WindowsList::load() WindowsList(0x83fe920) WindowInfo(0x8583af0) "Ubuntu Start Page - Mozilla Firefox"
unity-2d-spread: [WARNING] Wnck: Received a timestamp of 0; window activation may not function properly.

unity-2d-spread: [DEBUG] virtual WindowInfo::~WindowInfo() WindowInfo(0x863af60) "rdale@ubuntu: /tmp"
unity-2d-spread: [DEBUG] virtual WindowInfo::~WindowInfo() WindowInfo(0x864ea38) "rdale@ubuntu: ~/src/unity-2d/build"
unity-2d-spread: [DEBUG] virtual WindowInfo::~WindowInfo() WindowInfo(0x8617dd8) "rdale@ubuntu: ~"
unity-2d-spread: [DEBUG] virtual WindowInfo::~WindowInfo() WindowInfo(0x845e4e0) "windowslist.cpp - unity-2d - Qt Creator"
unity-2d-spread: [DEBUG] virtual WindowInfo::~WindowInfo() WindowInfo(0x8583af0) "Ubuntu Start Page - Mozilla Firefox"
unity-2d-spread: [DEBUG] void WindowsList::addWindow(BamfView*) WindowsList(0x83fe920) WindowInfo(0x83bbe90) "Qt Creator"
unity-2d-spread: [DEBUG] void WindowsList::updateWorkspaceRole(int) WindowsList(0x83fe920)
unity-2d-spread: [DEBUG] void WindowsList::updateWorkspaceRole(int) WindowsList(0x83fe920) WindowInfo(0x83bbe90) "Qt Creator"
unity-2d-spread: [DEBUG] void WindowsList::updateWorkspaceRole(int) WindowsList(0x83fe920) row: 0
unity-2d-spread: [DEBUG] void WindowsList::updateWorkspaceRole(int) WindowsList(0x83fe920) changedItem: QModelIndex(0,0,0x0,WindowsList(0x83fe920) )
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc

Changed in unity-2d:
milestone: none → 3.8.2
importance: Undecided → Critical
Richard Dale (rdale) wrote :

While I was investigating this bug, I noticed that in WindowsList::removeWindow(), a 'WindowInfo*' pointer is being removed from the m_windows list, and isn't deleted:

    for (int i = 0; i < m_windows.length(); i++) {
        if (m_windows.at(i)->isSameBamfWindow(window)) {
            beginRemoveRows(QModelIndex(), i, i);
            m_windows.removeAt(i);
            endRemoveRows();
            return;
        }
    }

Shouldn't it be 'delete m_windows.takeAt(i);' instead, to avoid a memory leak?
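A minimal sketch of the ownership fix suggested above, written with the standard library instead of Qt so it compiles on its own. `WindowInfoStub`, its `bamfId` field and `WindowsListSketch` are hypothetical stand-ins for the real unity-2d types. Here the container owns its entries through `std::unique_ptr`, so erasing an element also destroys it; with the existing `QList<WindowInfo*>` the equivalent is the explicit `delete m_windows.takeAt(i);`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for WindowInfo; counts live instances so a leak is observable.
struct WindowInfoStub {
    static int liveCount;
    int bamfId;  // hypothetical identifier standing in for the BamfWindow pointer
    explicit WindowInfoStub(int id) : bamfId(id) { ++liveCount; }
    ~WindowInfoStub() { --liveCount; }
    bool isSameBamfWindow(int id) const { return bamfId == id; }
};
int WindowInfoStub::liveCount = 0;

class WindowsListSketch {
public:
    void addWindow(int id) { m_windows.push_back(std::make_unique<WindowInfoStub>(id)); }

    bool removeWindow(int id) {
        for (std::size_t i = 0; i < m_windows.size(); ++i) {
            if (m_windows[i]->isSameBamfWindow(id)) {
                // beginRemoveRows(QModelIndex(), i, i) would go here in the Qt model.
                // Erasing the unique_ptr deletes the object, so nothing leaks.
                m_windows.erase(m_windows.begin() + static_cast<std::ptrdiff_t>(i));
                // endRemoveRows() would go here.
                return true;
            }
        }
        return false;
    }

    std::size_t count() const { return m_windows.size(); }

private:
    std::vector<std::unique_ptr<WindowInfoStub>> m_windows;
};
```

The design choice is simply to make ownership explicit in the container type; keeping raw pointers also works, as long as every removal path deletes what it takes out.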

Changed in unity-2d:
milestone: 3.8.2 → 3.10
Ugo Riboni (uriboni) wrote :

Hi Richard,
you are right about the leak in removeWindow, it should be fixed. If you plan to submit a patch for that, I'll be happy to review it.

Regarding the bug itself, have you tried verifying whether the crash goes away if you simply remove the connections to updateWorkspaceRole in WindowsList (there are two)?

If after that test you still think that method is the actual point of the crash, I would read the documentation of QObject::sender(), paying attention to what it says about using sender() with certain types of signal-slot connections, and make sure that the connections to updateWorkspaceRole are of the right type. (It's the only suspicious thing I can find in that method from a quick look.)
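The QObject::sender() documentation warns that its return value is not valid when the slot is invoked through a Qt::DirectConnection from a thread other than the receiver's. One way to avoid depending on sender() at all is to bind the emitting object at connection time, so the slot receives it as an argument (in Qt 4 this is typically done with QSignalMapper, in Qt 5 with a lambda passed to connect()). The sketch below only models that idea with a hypothetical `Signal` template built on `std::function`, not Qt's moc; `WindowStub` and `WindowsListModel` are hypothetical too.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal signal: calls every connected slot synchronously.
template <typename... Args>
class Signal {
public:
    void connect(std::function<void(Args...)> slot) { m_slots.push_back(std::move(slot)); }
    void emitSignal(Args... args) const {
        for (const auto &slot : m_slots) slot(args...);
    }
private:
    std::vector<std::function<void(Args...)>> m_slots;
};

struct WindowStub {
    std::string title;
    Signal<int> workspaceChanged;  // carries the new workspace number
};

// Hypothetical model: records which rows it reported as changed.
class WindowsListModel {
public:
    void addWindow(WindowStub *window) {
        m_windows.push_back(window);
        // Capture the emitter here instead of calling sender() inside the slot.
        window->workspaceChanged.connect([this, window](int workspace) {
            updateWorkspaceRole(window, workspace);
        });
    }
    std::vector<int> changedRows;  // stands in for the dataChanged(row, row) emissions

private:
    void updateWorkspaceRole(WindowStub *window, int /*workspace*/) {
        auto it = std::find(m_windows.begin(), m_windows.end(), window);
        if (it != m_windows.end())
            changedRows.push_back(static_cast<int>(it - m_windows.begin()));
    }
    std::vector<WindowStub *> m_windows;
};
```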

Another thing worth checking, if none of the above works, is running a debug build of Qt and stepping into this frame from your original backtrace:
#7 0x00940920 in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) () from /usr/lib/libQtCore.so.4
By doing this you should be able to see which slot is being called on which object just before the actual crash.
This might give you some extra clues.

Changed in unity-2d:
milestone: 3.10 → 3.8.4
Richard Dale (rdale) wrote :

I've made some progress. There is a gdb command 'catch throw' and I was able to use that to get a backtrace of where the std::bad_alloc exception was being thrown. It is in some code called by WindowImageProvider::requestImage(). So maybe the next step is to put some debug logging in that method to see what is going on.

(gdb) t a a bt

Thread 4 (Thread 0xb6e7db70 (LWP 3053)):
#0 0x00bc2416 in __kernel_vsyscall ()
#1 0x07c8cf76 in __poll (fds=0x8981af0, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2 0x0563d84b in g_poll () from /lib/i386-linux-gnu/libglib-2.0.so.0
#3 0x0562d1af in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#4 0x0562d524 in g_main_context_iteration ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#5 0x0029253c in QEventDispatcherGlib::processEvents (this=0x8984078,
    flags=...) at kernel/qeventdispatcher_glib.cpp:422
#6 0x00264289 in QEventLoop::processEvents (this=0xb6e7d130, flags=...)
    at kernel/qeventloop.cpp:149
#7 0x00264522 in QEventLoop::exec (this=0xb6e7d130, flags=...)
    at kernel/qeventloop.cpp:201
#8 0x0016e2a0 in QThread::exec (this=0x8987920) at thread/qthread.cpp:492
#9 0x004d3beb in QDeclarativePixmapReader::run (this=0x8987920)
    at util/qdeclarativepixmapcache.cpp:557
#10 0x00170da2 in QThreadPrivate::start (arg=0x8987920)
    at thread/qthread_unix.cpp:320
#11 0x074f0e99 in start_thread (arg=0xb6e7db70) at pthread_create.c:304
#12 0x07c9b73e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 3 (Thread 0xb767eb70 (LWP 3014)):
#0 0x00bc2416 in __kernel_vsyscall ()
#1 0x07c8cf76 in __poll (fds=0x87368b8, nfds=3, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#2 0x0563d84b in g_poll () from /lib/i386-linux-gnu/libglib-2.0.so.0
#3 0x0562d1af in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#4 0x0562d92b in g_main_loop_run () from /lib/i386-linux-gnu/libglib-2.0.so.0
#5 0x00d13434 in ?? () from /usr/lib/i386-linux-gnu/libgio-2.0.so.0
#6 0x056562df in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#7 0x074f0e99 in start_thread (arg=0xb767eb70) at pthread_create.c:304
#8 0x07c9b73e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 2 (Thread 0xb64b0b70 (LWP 3016)):
#0 0x00bc2416 in __kernel_vsyscall ()
#1 0x074f548c in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:169
#2 0x01b38817 in QTWTF::TCMalloc_PageHeap::scavengerThread (this=0x1c4c1c0)
    at ../3rdparty/javascriptcore/JavaScriptCore/wtf/FastMalloc.cpp:2359
#3 0x01b38851 in QTWTF::TCMalloc_PageHeap::runScavengerThread (
    context=0x1c4c1c0)
    at ../3rdparty/javascriptcore/JavaScriptCore/wtf/FastMalloc.cpp:1464
#4 0x074f0e99 in start_thread (arg=0xb64b0b70) at pthread_create.c:304
#5 0x07c9b73e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 1 (Thread 0xb78a7890 (LWP 3013)):
#0 0x024d80e5 in __cxa_throw () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#1 0x00165f85 in qBadAlloc () at global/qglobal.cpp:2019
#2 0x010a5865 in QX11PixmapData::toImage (this=0x899de18,...
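Why `catch throw` was needed: the first backtrace ends in `__cxa_rethrow` called from `QMetaObject::activate`, which suggests Qt caught the exception during slot dispatch and rethrew it, so by the time `std::terminate()` ran, the original throw site was no longer on the stack. The sketch below reproduces that shape with the standard library only; `convertDrawable()` and `dispatchSlot()` are hypothetical stand-ins for the image conversion and the slot dispatch, not the real Qt code.

```cpp
#include <cassert>
#include <functional>
#include <new>

// Hypothetical stand-in for the code under WindowImageProvider::requestImage()
// that ends in qBadAlloc(): this is the original throw site that gdb's
// `catch throw` stops at.
void convertDrawable() { throw std::bad_alloc(); }

// Stand-in for slot dispatch that catches everything, does its cleanup and
// rethrows. If nothing further up catches the exception, a backtrace taken in
// std::terminate() shows only this rethrow, not convertDrawable().
void dispatchSlot(const std::function<void()> &slot) {
    try {
        slot();
    } catch (...) {
        // ...connection bookkeeping would be restored here...
        throw;  // with GCC this goes through __cxa_rethrow
    }
}

// Returns true if the exception reaching the top is still std::bad_alloc.
bool exceptionSurvivesRethrow() {
    try {
        dispatchSlot(convertDrawable);
    } catch (const std::bad_alloc &) {
        return true;
    }
    return false;
}
```

Built with `-g` and run under gdb, entering `catch throw` before `run` stops inside `convertDrawable()`, and `bt` at that point shows the real origin, which is what exposed `QX11PixmapData::toImage` above.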

Bill Filler (bfiller)
Changed in unity-2d:
assignee: Richard Dale (rdale) → Olivier Tilloy (osomon)
Florian Boucault (fboucault) wrote :

Assigning to Ugo so that he can sync up with Richard.

Changed in unity-2d:
assignee: Olivier Tilloy (osomon) → Ugo Riboni (uriboni)
Ugo Riboni (uriboni)
summary: - unity-2d-spread crashes on change of workspace
+ [spread] crash on change of workspace
Ugo Riboni (uriboni)
summary: - [spread] crash on change of workspace
+ [spread] crash when sending applications to other workspaces
Changed in unity-2d:
status: New → Confirmed
Ugo Riboni (uriboni) wrote :

@Richard

Thanks for your great analysis so far. It was extremely helpful in figuring out this bug.
From my debugging it looks like the root cause of the bug is the following:

- We send a window to another workspace, and this causes the WindowInfo object associated with it to be removed from the proxy model managing the origin workspace and added to the proxy model managing the destination workspace.

- The WM also unmaps the window when it is sent to the other workspace.

- When this happens, QML creates a new delegate for the window in the view associated with the new model. This causes a new request to the WindowImageProvider for the window's image (this is actually a major waste; see notes at the end).

- When loading the image there are essentially two paths. If the window is still mapped, we use the window's X11 drawable directly. If the window is already unmapped, we try to use an X11 pixmap that Metacity has stored in a property of the window, containing a screenshot of the window taken before it was unmapped. In both cases the drawable is then converted to a QImage so that it can be used freely in QML without worrying about its X11 counterpart.

- The crash is caused by a timing issue: when we check whether the window is mapped, it is still in the current workspace and thus mapped, so we just grab the window's drawable. However, by the time we convert this drawable to a QImage, the WM has unmapped the window, so the drawable is no longer valid: the call to XGetImage inside QX11PixmapData::toImage fails and returns NULL. Qt has a Q_CHECK_PTR right after that call, which raises the exception you discovered.
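The race described above can be modeled without X11. In this sketch `FakeWindow`, `grabImage()`, `checkPtrLikeQt4()` and `requestImage()` are hypothetical stand-ins: the map-state check succeeds, a callback plays the window manager and unmaps the window before the grab, the grab returns a null pointer the way XGetImage does, and a null check modeled on Qt 4's Q_CHECK_PTR (which calls qBadAlloc() in builds with exceptions enabled, matching frame #1 of the second backtrace) turns that into std::bad_alloc.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <new>
#include <vector>

// Hypothetical model of an X11 window whose map state the WM can change.
struct FakeWindow {
    bool mapped = true;
};

struct ImageData {
    std::vector<unsigned char> pixels;
};

// Stands in for XGetImage(): returns null once the drawable is no longer valid.
std::unique_ptr<ImageData> grabImage(const FakeWindow &w) {
    if (!w.mapped) return nullptr;
    return std::make_unique<ImageData>(ImageData{std::vector<unsigned char>(4, 0xff)});
}

// Modeled on Q_CHECK_PTR with exceptions enabled: a null pointer becomes std::bad_alloc.
template <typename T>
void checkPtrLikeQt4(const T &p) {
    if (!p) throw std::bad_alloc();
}

// Check-then-use: `betweenCheckAndGrab` lets the "window manager" run after the
// map-state check but before the grab, which is the window of time in this bug.
ImageData requestImage(FakeWindow &w, const std::function<void()> &betweenCheckAndGrab) {
    if (w.mapped) {
        betweenCheckAndGrab();      // e.g. the WM unmaps the window here
        auto image = grabImage(w);  // drawable already invalid -> null
        checkPtrLikeQt4(image);     // throws std::bad_alloc on null
        return *image;
    }
    return ImageData{};             // unmapped path (WM-stored pixmap) omitted in this sketch
}
```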

On a higher level, it appears that the "workspace-changed" signal we get from wnck is emitted too early: the window isn't fully on the other workspace yet when it arrives.

This seems to be an issue in Qt more than anything else since it shouldn't raise an exception there, it should just return an empty QImage (which can then be checked for .isNull()). We can however try to catch the exception and try the entire function again.
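The catch-and-retry workaround could look roughly like this. It is a hedged sketch: `tryRequestImage()` and `Image` are hypothetical, `attempt` stands in for one full run of the image lookup, and an empty `std::optional` plays the role of a null QImage that callers would test with isNull(). The idea is that on a retry the map-state check would see the window as unmapped and take the WM-stored pixmap path instead.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <optional>
#include <string>

// Hypothetical image type; a real implementation would return a QImage.
struct Image {
    std::string source;
};

// Runs `attempt` up to `maxAttempts` times, treating std::bad_alloc (what
// Q_CHECK_PTR raises when XGetImage fails) as a transient failure.
// Returns an empty optional, the analogue of a null QImage, if every attempt fails.
std::optional<Image> tryRequestImage(const std::function<Image()> &attempt, int maxAttempts = 2) {
    for (int i = 0; i < maxAttempts; ++i) {
        try {
            return attempt();
        } catch (const std::bad_alloc &) {
            // The window's map state changed under us; re-run the whole lookup.
        }
    }
    return std::nullopt;
}
```

One drawback of this approach is that a genuine out-of-memory std::bad_alloc is swallowed in the same way, which is why it reads more like a workaround for the Qt behavior than a fix.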

One more indirect solution is to wait to change the "workspace" property of the WindowInfo until we receive the notification that the window has been unmapped.
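A minimal sketch of that indirect solution, assuming the WindowInfo object can observe both the "workspace-changed" signal and an unmap notification. `DeferredWorkspace` and its methods are hypothetical names: the workspace change is remembered as pending and only published once the unmap arrives, at which point WindowInfo would emit its own workspaceChanged().

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of WindowInfo's "workspace" property that only commits a
// change once the window manager has actually unmapped the window.
class DeferredWorkspace {
public:
    explicit DeferredWorkspace(int initial) : m_workspace(initial) {}

    // Called for wnck's "workspace-changed": remember the target, don't publish yet.
    void onWorkspaceChanged(int newWorkspace) {
        if (newWorkspace != m_workspace) m_pending = newWorkspace;
    }

    // Called when the unmap notification arrives. Returns true if the property
    // changed, i.e. where workspaceChanged() would be emitted.
    bool onUnmapped() {
        if (!m_pending) return false;
        m_workspace = *m_pending;
        m_pending.reset();
        return true;
    }

    int workspace() const { return m_workspace; }

private:
    int m_workspace;
    std::optional<int> m_pending;
};
```

A real implementation would presumably also need a path for windows that are already unmapped when they change workspace (for example, moved between two workspaces that are not currently shown), since no unmap notification would follow in that case.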

Note that there's also another issue here: no new windows should be added to the WindowsList while the spread is not visible!
Fixing that other problem will also make this crash very rare, since while the spread is visible there's no way for the user to change the workspace of a window. So we should be safe except in two cases:
- some other process moves windows between workspaces while we're in the spread, or the window moves itself;
- in the future we implement this feature in the spread (with drag and drop, like Unity already has).
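A sketch of the "don't track windows while the spread is hidden" rule, under the assumption (consistent with the log earlier in this report, where all WindowInfo objects are destroyed after loading) that the list is rebuilt each time the spread is shown. `SpreadWindowsList` and its methods are hypothetical; window titles stand in for WindowInfo objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: the list ignores window-opened notifications while the
// spread is hidden and rebuilds itself from the current window set when shown.
class SpreadWindowsList {
public:
    void show(const std::vector<std::string> &currentWindows) {
        m_visible = true;
        m_windows = currentWindows;  // stands in for WindowsList::load()
    }
    void hide() {
        m_visible = false;
        m_windows.clear();           // no WindowInfo objects survive while hidden
    }
    void onWindowOpened(const std::string &title) {
        if (!m_visible) return;      // nothing to track while the spread is hidden
        m_windows.push_back(title);
    }
    std::size_t count() const { return m_windows.size(); }

private:
    bool m_visible = false;
    std::vector<std::string> m_windows;
};
```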

Richard Dale (rdale) wrote :

@Ugo

Thanks for the explanation. I need to go back and study the code further to fully understand.

I saw the Q_CHECK_PTR() in the Qt code against the X call returning a null pointer and I agree it does seem wrong. You end up with a C++ 'bad alloc' exception which is misleading as it doesn't really reflect what had happened. I tried to find out how you check for X errors after a call like XGetImage() has failed, but with no luck. I was expecting to find something like errno, or a call you can make to get the last X error. I didn't find anything logged in the ~/.xsession-errors log either.

I've been going through the Unity-2d bug reports to try and find something I can work on, and I noticed several of them were crashes as a result of X errors. I wonder if they have a common cause related to coordinating X calls with Unity-2d workspace management, like this bug.

I did expect the spread to handle drag and drop between workspaces, and that will be a better way of moving windows than using a menu option on the title bar.

Ugo Riboni (uriboni) wrote :

The X11 errors should be reported to the console (or ~/.xsession-errors), but we disabled that.
There's a call to XSetErrorHandler in Unity2dPlugin::initializeEngine that sets up an X11 error handler which suppresses all errors. We should really remove that at some point and install error handlers that suppress errors only around those parts of the code that we know may generate errors. However, the fact that X11 is asynchronous makes this a bit more complicated.
Anyway, if you remove that line you will see the errors on console regarding this specific issue.
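Suppressing errors only around specific calls is often done with an RAII guard that installs a handler and restores the previous one on scope exit. The sketch below models Xlib's global handler slot with a plain function pointer so it compiles without X11; `ErrorHandler`, `setErrorHandler()`, `reportXError()` and `ScopedErrorSuppression` are hypothetical stand-ins for `XErrorHandler`, `XSetErrorHandler()` and Xlib's error delivery. Because X11 is asynchronous, real code would call `XSync()` before restoring the previous handler, so errors caused inside the scope are delivered while the suppressing handler is still installed.

```cpp
#include <cassert>

// Hypothetical model of Xlib's global error-handler slot.
using ErrorHandler = int (*)(int errorCode);

namespace {
int defaultHandler(int) { return 0; }  // stands in for the handler that logs errors
ErrorHandler g_handler = defaultHandler;
int g_suppressed = 0;
}

// Same contract as XSetErrorHandler(): installs a handler, returns the previous one.
ErrorHandler setErrorHandler(ErrorHandler h) {
    ErrorHandler previous = g_handler;
    g_handler = h;
    return previous;
}

// Stands in for Xlib delivering an error event to the current handler.
int reportXError(int errorCode) { return g_handler(errorCode); }

int suppressingHandler(int) {
    ++g_suppressed;
    return 0;
}

// Installs the suppressing handler for one scope only.
class ScopedErrorSuppression {
public:
    ScopedErrorSuppression() : m_previous(setErrorHandler(suppressingHandler)) {}
    ~ScopedErrorSuppression() {
        // Real code: XSync(display, False) here, so pending errors arrive before restoring.
        setErrorHandler(m_previous);
    }
    ScopedErrorSuppression(const ScopedErrorSuppression &) = delete;
    ScopedErrorSuppression &operator=(const ScopedErrorSuppression &) = delete;

private:
    ErrorHandler m_previous;
};
```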

Regarding the other X11-related bugs, I am investigating a few of them, and it looks like their cause is quite different from this bug, so I wouldn't suspect a common root cause.

Ugo Riboni (uriboni) wrote :

I split off the problem of the WindowInfo objects being created when the spread is inactive into bug 761661, since it's not the root cause of this issue; it just makes it easier to reproduce.

Changed in unity-2d:
status: Confirmed → In Progress
Changed in unity-2d:
status: In Progress → Fix Committed
Richard Dale (rdale) wrote :

I had a look at the libwnck library to see when the 'workspace-changed' signal is emitted. The wnck library checks whether the window's _NET_WM_DESKTOP property has changed. The description of that property is here:

http://standards.freedesktop.org/wm-spec/1.3/ar01s05.html

"Cardinal to determine the desktop the window is in (or wants to be) starting with 0 for the first desktop. A Client MAY choose not to set this property, in which case the Window Manager SHOULD place it as it wishes."

It looks like it doesn't mean that the workspace id has changed; it just means that the window is in the process of moving to a new desktop. The signal might be better named 'workspace-changing'. I think the unity-2d code is correct to assume that the window may or may not have moved yet. I couldn't see any other signal that is emitted once the window has moved.
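Read this way, a _NET_WM_DESKTOP value is a target rather than proof that the move has finished. The sketch below decodes the cardinal accordingly; `DesktopTarget` and `decodeNetWmDesktop()` are hypothetical names, and the all-desktops value 0xFFFFFFFF comes from the same EWMH property description rather than from this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// What a _NET_WM_DESKTOP value says about a window: the desktop it is on
// *or wants to be on*, so callers should treat it as a target only.
struct DesktopTarget {
    bool allDesktops = false;            // 0xFFFFFFFF: window should appear on all desktops
    std::optional<std::uint32_t> index;  // 0-based desktop number otherwise
};

// `present` is false when the client never set the property (the WM then places it).
DesktopTarget decodeNetWmDesktop(bool present, std::uint32_t value) {
    DesktopTarget target;
    if (!present) return target;         // no claim either way
    if (value == 0xFFFFFFFFu) {
        target.allDesktops = true;
    } else {
        target.index = value;
    }
    return target;
}
```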

Changed in unity-2d:
status: Fix Committed → Fix Released