Clients all crash (and sometimes server too)

Bug #1629275 reported by Alan Griffiths
This bug affects 1 person
Affects       Status   Importance  Assigned to  Milestone
Mir           Invalid  Undecided   Unassigned
MirAL         Invalid  High        Unassigned
mir (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

mir-0.24 on yakkety (or mir-0.21 on xenial, or lp:mir on yakkety)

Start a Mir-on-X11 session:

    $ mir_demo_server_minimal &
    $ mirrun gnome-terminal

In the terminal:

    $ mir_demo_client_all &

(A script that launches all the Mir demo clients - attached)
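
The attachment itself is not reproduced in this report. As a rough sketch only, a launcher of this kind might look like the following. It assumes the demo clients are installed as executables named mir_demo_client_* in a single directory; the function names list_demo_clients and launch_all are made up for illustration and are not taken from the attached script.

```shell
#!/bin/sh
# Hypothetical stand-in for the attached mir_demo_client_all script.

# list_demo_clients DIR: print every executable file in DIR whose name
# starts with mir_demo_client_, skipping the launcher script itself.
list_demo_clients() {
    for f in "$1"/mir_demo_client_*; do
        # An unmatched glob stays literal, so this also filters that case.
        [ -f "$f" ] && [ -x "$f" ] || continue
        [ "$(basename "$f")" = "mir_demo_client_all" ] && continue
        printf '%s\n' "$f"
    done
}

# launch_all DIR: start each listed client in the background.
launch_all() {
    list_demo_clients "$1" | while IFS= read -r client; do
        "$client" &
    done
}
```

For example, `launch_all /usr/local/bin` would start every matching client found in that directory.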

Expect: all the clients open
Actual (most times): gnome-terminal and all the clients crash
Actual (frequently): gnome-terminal, all the clients, and the server crash
Actual (occasionally): all the clients open

The "frequent" server crash is lp:1607812; this bug is for the client crash.

Tags: gtk-mir
Alan Griffiths (alan-griffiths) wrote :
Alan Griffiths (alan-griffiths) wrote :

A bit more experimentation:

If the mir_demo_client_all script is run from a command line session that isn't running under miral then things are a lot more stable, although the miral gnome-terminal session is still prone to crashing.

So I think there are likely two problems:

1. gnome-terminal crashes when a lot of new clients start (WHY?!)
2. the server crashes as a result of gnome-terminal bringing down a lot of child processes

I can also reproduce using mir_demo_shell_minimal - so probably a mir problem, not miral

description: updated
Changed in miral:
assignee: Alan Griffiths (alan-griffiths) → nobody
Daniel van Vugt (vanvugt) wrote :

I think this is a duplicate. Somewhere we have an existing bug report which involves a client bringing down the server. I saw the bug report again last week but can't seem to find it right now.

description: updated
Daniel van Vugt (vanvugt) wrote :

I can't seem to reproduce this with the above instructions either.

One minor observation: You suggest running the server from lp:mir on yakkety yet your script launches the clients from $PATH. There might be a protocol incompatibility between those older clients in $PATH and the newer server in lp:mir.
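
A quick way to rule this out is to check where the server and a client actually resolve on $PATH. A minimal sketch, assuming both are looked up through $PATH; same_prefix is a hypothetical helper, not part of Mir.

```shell
# same_prefix CMD1 CMD2: succeed if both commands resolve to the same
# directory on $PATH. Returns 2 if either command cannot be found.
same_prefix() {
    p1=$(command -v "$1") || return 2
    p2=$(command -v "$2") || return 2
    [ "$(dirname "$p1")" = "$(dirname "$p2")" ]
}
```

For example, `same_prefix mir_demo_server_minimal mir_demo_client_all || echo "different install locations"` would flag a server and a client script picked up from different directories.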

Daniel van Vugt (vanvugt) wrote :

Also, obviously, got a stack trace or any information from the crashed server?

Changed in mir:
status: New → Incomplete
Alan Griffiths (alan-griffiths) wrote :

"One minor observation: You suggest running the server from lp:mir on yakkety yet your script launches the clients from $PATH. There might be a protocol incompatibility between those older clients in $PATH and the newer server in lp:mir."

On *my* yakkety box lp:mir has been installed in /usr/local, so I'm getting the right demo clients.

But it is interesting that you can't reproduce.

description: updated
affects: ubuntu → mir (Ubuntu)
Alan Griffiths (alan-griffiths) wrote :

The reason the server exits is:

Thread 1 "mir_demo_server" received signal SIGTERM, Terminated.
0x00007ffff6ea6200 in __libc_sendmsg (fd=40, msg=0x7fffffffc3f0, flags=16384) at ../sysdeps/unix/sysv/linux/sendmsg.c:28
28 ../sysdeps/unix/sysv/linux/sendmsg.c: No such file or directory.
(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7ffff7fa47c0 (LWP 23976) "mir_demo_server" 0x00007ffff6ea6200 in __libc_sendmsg (fd=40, msg=0x7fffffffc3f0,
    flags=16384) at ../sysdeps/unix/sysv/linux/sendmsg.c:28
  2 Thread 0x7fffee814700 (LWP 23980) "Mir/Snapshot" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3 Thread 0x7fffee013700 (LWP 23981) "Mir/Comp" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4 Thread 0x7fffed812700 (LWP 23983) "Mir/Input Reade" 0x00007ffff6e9910d in poll () at ../sysdeps/unix/syscall-template.S:84
  5 Thread 0x7fffed011700 (LWP 23984) "Mir/IPC" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6 Thread 0x7fffec810700 (LWP 23985) "Mir/IPC" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
(gdb) c
Continuing.
[Thread 0x7fffec810700 (LWP 23985) exited]
[Thread 0x7fffed011700 (LWP 23984) exited]
[Thread 0x7fffed812700 (LWP 23983) exited]
[Thread 0x7fffee013700 (LWP 23981) exited]
[Thread 0x7fffee814700 (LWP 23980) exited]
ERROR: /home/alan/display_server/mir/src/platforms/mesa/server/buffer_allocator.cpp(151): Throw in function virtual void {anonymous}::DMABufTextureBinder::ensure_egl_image()
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >
std::exception::what: Failed to get PRIME fd from gbm bo: No such file or directory

Thread 1 "mir_demo_server" received signal SIGSEGV, Segmentation fault.
0x00007ffff038b62c in ?? ()
(gdb) bt
#0 0x00007ffff038b62c in ?? ()
#1 0x00007ffff740a141 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff6dd765a in __cxa_finalize (d=0x7ffff7dd60a0) at cxa_finalize.c:56
#3 0x00007ffff7761d23 in __do_global_dtors_aux () from /usr/local/lib/libmirserver.so.42
#4 0x00007fffffffe320 in ?? ()
#5 0x00007ffff7de8efa in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC

Alan Griffiths (alan-griffiths) wrote :

And this is where the signal comes from:

#1 0x00007ffff77672fa in mir::terminate_with_current_exception ()
    at /home/alan/display_server/mir/src/server/terminate_with_current_exception.cpp:52
#2 0x00007ffff78af4c2 in mir::compositor::CompositingFunctor::operator() (this=0x5555559fcae0)
    at /home/alan/display_server/mir/src/server/compositor/multi_threaded_compositor.cpp:180
#3 0x00007ffff78b2a7c in std::_Function_handler<void (), std::reference_wrapper<mir::compositor::CompositingFunctor> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/6/functional:1770
#4 0x00007ffff7769846 in std::function<void ()>::operator()() const (this=0x7fffee012d00) at /usr/include/c++/6/functional:2136
#5 0x00007ffff79ea451 in (anonymous namespace)::Task::execute (this=0x7fffee012d00)
    at /home/alan/display_server/mir/src/server/thread/basic_thread_pool.cpp:40
#6 0x00007ffff79ea7ba in (anonymous namespace)::Worker::operator() (this=0x5555559fcbd0)
    at /home/alan/display_server/mir/src/server/thread/basic_thread_pool.cpp:91
#7 0x00007ffff79ed66c in std::__invoke_impl<void, (anonymous namespace)::Worker&> (__f=...) at /usr/include/c++/6/functional:218
#8 0x00007ffff79ed62e in std::__invoke<(anonymous namespace)::Worker&> (__fn=...) at /usr/include/c++/6/functional:260
#9 0x00007ffff79ed5d8 in std::reference_wrapper<(anonymous namespace)::Worker>::operator()<>(void) const (this=0x5555559fcf08)
    at /usr/include/c++/6/functional:474
#10 0x00007ffff79ed5b6 in std::_Bind_simple<std::reference_wrapper<(anonymous namespace)::Worker>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x5555559fcf08) at /usr/include/c++/6/functional:1400
#11 0x00007ffff79ed540 in std::_Bind_simple<std::reference_wrapper<(anonymous namespace)::Worker>()>::operator()(void) (
    this=0x5555559fcf08) at /usr/include/c++/6/functional:1389
#12 0x00007ffff79ed510 in std::thread::_State_impl<std::_Bind_simple<std::reference_wrapper<(anonymous namespace)::Worker>()> >::_M_run(void) (this=0x5555559fcf00) at /usr/include/c++/6/thread:196
#13 0x00007ffff743650f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff490170a in start_thread (arg=0x7fffee013700) at pthread_create.c:333
#15 0x00007ffff6ea50ff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

And that appears to be from failing to handle an exception from DMABufTextureBinder::ensure_egl_image() ("ERROR: /home/alan/display_server/mir/src/platforms/mesa/server/buffer_allocator.cpp(151):" in comment #7)

Alan Griffiths (alan-griffiths) wrote :

It is also worth noting that starting the mir clients outside the gnome-terminal seems stable. So whatever weirdness is going on client-side is associated with gnome-terminal.

Alan Griffiths (alan-griffiths) wrote :

Using qterminal in place of gnome-terminal works, so I think the server crash may be the only Mir problem here.

Daniel van Vugt (vanvugt) wrote :

Found it.

You logged this a couple of months ago :)

description: updated
Daniel van Vugt (vanvugt) wrote :

OK, see also: bug 1607812

summary: - Clients and server all crash
+ Clients all crash (and sometimes server too)
Changed in miral:
status: Triaged → Invalid
Alan Griffiths (alan-griffiths) wrote :

Console log from gnome-terminal (with MIR_CLIENT_RPC_REPORT=log)

[2016-10-04 15:41:58.721290] <DEBUG> rpc: Invocation request: id: 52 method_name: submit_buffer
[2016-10-04 15:41:58.721397] <DEBUG> rpc: Invocation succeeded: id: 52 method_name: submit_buffer
[2016-10-04 15:41:58.727375] <DEBUG> rpc: Invocation request: id: 53 method_name: submit_buffer
[2016-10-04 15:41:58.727437] <DEBUG> rpc: Invocation succeeded: id: 53 method_name: submit_buffer
[2016-10-04 15:41:58.737226] <DEBUG> rpc: Result received: id: 52
[2016-10-04 15:41:58.737269] <DEBUG> rpc: Complete response: id: 52
[2016-10-04 15:41:58.739094] <DEBUG> rpc: Result received: id: 53
[2016-10-04 15:41:58.739187] <DEBUG> rpc: Complete response: id: 53
*** Error in `/usr/lib/gnome-terminal/gnome-terminal-server': double free or corruption (out): 0x00007f90bcb0c010 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7911b)[0x7f90d96d311b]
/lib/x86_64-linux-gnu/libc.so.6(+0x827aa)[0x7f90d96dc7aa]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f90d96e01dc]
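
To see whether any RPC call was still outstanding when the process aborted, a saved copy of this log can be filtered for requests that have no matching completion. This is a sketch only, assuming the line format shown above ("Invocation request: id: N" and "Complete response: id: N"); pending_rpc_ids is a hypothetical helper.

```shell
# pending_rpc_ids FILE...: print the ids of RPC invocations that were
# requested but never reached "Complete response", one per line.
pending_rpc_ids() {
    awk '
        /rpc: Invocation request: id:/ {
            sub(/.*Invocation request: id: /, "")   # leaves "N method_name: ..."
            pending[$1] = 1
        }
        /rpc: Complete response: id:/ {
            sub(/.*Complete response: id: /, "")    # leaves "N"
            delete pending[$1]
        }
        END { for (id in pending) print id }
    ' "$@" | sort -n
}
```

In the excerpt above, requests 52 and 53 both complete before the abort, so the helper would print nothing for those lines.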

tags: added: gtk-mir
Alan Griffiths (alan-griffiths) wrote :

Not reproducible on zesty - probably fixed by changes to gtk-3 (or gnome-terminal). Marking invalid in Mir.

Changed in mir:
status: Incomplete → Invalid
Changed in mir (Ubuntu):
status: New → Invalid