Mir

Loading libmirclient.so twice leads to a segfault in libmirprotobuf.so

Bug #1391976 reported by Alexandros Frantzis
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mir
Fix Released
High
Daniel van Vugt
0.14
Fix Released
High
Daniel van Vugt
mir (Ubuntu)
Fix Released
High
Unassigned

Bug Description

Can be reproduced with: load_twice libmircommon.so.1 (or .2)
For recent versions of mir use: load_twice libmirclient.so.X (currently .8)

load_twice.c:

#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char** argv)
{
    void *dl;
    int i;

    for (i = 0; i < 2; i++)
    {
       dl = dlopen (argv[1], RTLD_LAZY);
       printf ("%d open dl: %p\n", i, dl);
       if (dl)
           dlclose (dl);
    }
}

Related branches

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Is it a use case we need to worry about right now? We already know the singleton init logic in protobuf is troublesome. That's why it got separated out of mircommon for Mir 0.9.

Obviously with a single driver loaded we have no issues (yet). But multi-driver support in future would probably hit this. Is there any more realistic use case that will trigger it?

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> Is there any more realistic use case that will trigger it?

Yes. The problem occurs even if libmircommon is indirectly loaded/unloaded/reloaded. For example the crash happens with:

./load_twice libclutter-glx-1.0.so.0

because clutter->gdk->mirclient->mircommon. We can't control how third-party programs load and unload their libraries (e.g. plugin systems). This is important because it causes crashes in mir/protobuf code from code that is seemingly unrelated to mir itself.

Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)
importance: Undecided → High
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Workaround(?):

Intentionally leak a handle to one of the libraries up the chain (even protobuf itself). That should ensure libprotobuf never gets re-inited and no crash.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I mean if we simply: dlopen("libprotobuf", ...) and throw away the result, it should remain loaded for the lifetime of the process. And should never crash then.

Robert Carr (robertcarr)
Changed in mir:
status: New → Confirmed
Revision history for this message
Stu (stu-axon) wrote :

Hi,
   This is affecting me in a python program that doesn't directly use mir .. but calls into it via pypy -> pgi > mir *crash* !!

https://bitbucket.org/pypy/pypy/issue/1955/segfault-on-running-program

S

summary: - Loading libmircommon.so twice leads to a segfault in protobuf code
+ Loading libmircommon.so twice leads to a segfault in libprotobuf.so
Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Alexandros Frantzis (afrantzis) wrote : Re: Loading libmircommon.so twice leads to a segfault in libprotobuf.so

Analysis
--------

The core of the problem is that it's not safe to reuse libprotobuf after
calling google::protobuf::ShutdownProtobufLibrary() without unloading and
reloading the library first. Note that libraries/executables that use protobuf
(e.g. libmirprotobuf) use libprotobuf code also during static initialization,
that's why we see the issue when just loading/unloading these libraries.

libprotobuf uses flags to ensure initialization code is run once (see
GoogleInitOnce() and friends), but these flags are not reset during shutdown.
So, if the library is not reloaded and we try to call libprotobuf code
after shutdown, the code thinks that everything is already initialized and
ends up accessing data structures that have been deallocated.

In the scenario described in the bug, libprotobuf is not unloaded when
unloading the mir libraries because libstdc++ is bound to (i.e., needs a symbol from)
libprotobuf. libprotobuf contains an instantiation of a string template
function [1] from the standard c++ library and exports that instantiation as a
weak symbol (as expected). libstdc++ itself also needs the same template
function instantiation, and the linker resolves the symbol by choosing the
instantiation from libprotobuf (which is acceptable behavior).

Proposed solution
-----------------

Ensure libprotobuf can be reused after shutdown by properly resetting "once"
flags on ShutdownProtobufLibrary(). First experiments are promising, but "once"
flags are used extensively in the code base and also in generated code, so
great care needs to be taken. It's also not known if upstream is willing to
accept such patches.

[1] _ZNSs12_S_constructIPcEES0_T_S1_RKSaIcESt20forward_iterator_tag

description: updated
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Packages (for 2.6.1, vivid) with an _experimental_ patch that allows protobuf to re-initialize properly after google::protobuf::ShutdownProtobufLibrary() is called:

https://launchpad.net/~afrantzis/+archive/ubuntu/staging/+packages

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

I have started experimenting with an alternative solution which is much simpler than what I originally proposed. The solution is to ensure libprotobuf exposes only the symbols it has to (i.e. google protobuf related), to avoid binding to libstdc++, and thus working around the problem.

I have pushed package protobuf-2.6.1-1alf2 with the experimental fix at:

https://launchpad.net/~afrantzis/+archive/ubuntu/staging/+packages

Revision history for this message
Brandon Schaefer (brandontschaefer) wrote :

This is an issue with SDL2. As with SDL2 it needs to load the OpenGL drivers to check if 3d acceleration is enabled. Once it checks it unloads the library. Then the app starts which needs to load the library again annnnd seg fault.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The trivial one-line workaround proposed in comments #3/#4 might work. I think I've used it before as a workaround on other Unixes too.

Changed in mir:
status: Confirmed → Triaged
Changed in mir (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Rather than running load_twice on libmirclient.so.* it should be run on libmirprotobuf.so.* directly. That's the source of the problem and does crash, but only if you change RTLD_LAZY to RTLD_NOW.

Loading libmirclient.so.* is just an indirect fudge to force it to try and resolve more symbols, but not necessary if you use RTLD_NOW on libmirprotobuf.so.* directly.

summary: - Loading libmircommon.so twice leads to a segfault in libprotobuf.so
+ Loading libmirclient.so twice leads to a segfault in libmirprotobuf.so
Changed in mir:
assignee: Alexandros Frantzis (afrantzis) → Daniel van Vugt (vanvugt)
milestone: none → 0.14.0
status: Triaged → In Progress
Changed in mir:
milestone: 0.14.0 → 0.15.0
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.15.0

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → In Progress
Changed in mir:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.14.0+15.10.20150722-0ubuntu1

---------------
mir (0.14.0+15.10.20150722-0ubuntu1) wily; urgency=medium

  [ Andreas Pokorny ]
  * Fix missing ABI renaming in Mirplatform
  * Bump Mirserver platform graphics to 3
  * Fix mirprotobuf ABI break

  [ CI Train Bot ]
  * New rebuild forced.

 -- CI Train Bot <email address hidden> Wed, 22 Jul 2015 18:01:49 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.