unity-system-compositor crashed with std::runtime_error in mir::compositor::CompositingFunctor::wait_until_started() from usc::MirScreen::set_screen_power_mode (mir_power_mode_on)

Bug #1528384 reported by errors.ubuntu.com bug bridge
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Canonical System Image | Fix Released | Critical | kevin gunn |
Mir | Fix Released | Critical | Alan Griffiths |
0.19 | Fix Released | Undecided | Unassigned |
0.20 | Fix Released | Critical | Alan Griffiths |
Unity System Compositor | Fix Released | Critical | Alan Griffiths |
mir (Ubuntu) | Fix Released | Critical | Unassigned |
unity-system-compositor (Ubuntu) | Fix Released | Critical | Unassigned |

Bug Description

Top crash over past week (ww02) on errors.u.c for rc-proposed channels.

Started with u-s-c 0.2.0+15.04.20151216.1-0ubuntu1

The Ubuntu Error Tracker has been receiving reports about a problem regarding unity-system-compositor. This problem was most recently seen with version 0.2.0+15.04.20151216.1-0ubuntu1. The problem page at https://errors.ubuntu.com/problem/7bcfcf599b35b264c0be45d5290ad9ae3c50adcf contains more details.

Tags: vivid wily

no longer affects: unity-system-compositor
description: updated
Changed in canonical-devices-system-image:
importance: Undecided → High
status: New → Confirmed
assignee: nobody → kevin gunn (kgunn72)
milestone: none → ww04-2016
description: updated
Daniel van Vugt (vanvugt) wrote : Re: unity-system-compositor crashed with std::runtime_error in mir::compositor::CompositingFunctor::wait_until_started()

Maybe USC is blocked on startup? Related to its powerd logic?

summary: - /usr/sbin/unity-system-
- compositor:6:__gnu_cxx::__verbose_terminate_handler:__cxxabiv1::__terminate:std::terminate:__cxxabiv1::__cxa_throw:boost::throw_exception
+ unity-system-compositor crashed with std::runtime_error in
+ mir::compositor::CompositingFunctor::wait_until_started()
Changed in mir:
importance: Undecided → High
Changed in unity-system-compositor:
importance: Undecided → High
Changed in mir (Ubuntu):
importance: Undecided → High
Changed in unity-system-compositor (Ubuntu):
importance: Undecided → High
Changed in mir:
milestone: none → 0.19.0
Stephen M. Webb (bregma)
Changed in mir:
milestone: 0.19.0 → 0.20.0
summary: unity-system-compositor crashed with std::runtime_error in
- mir::compositor::CompositingFunctor::wait_until_started()
+ mir::compositor::CompositingFunctor::wait_until_started() from
+ usc::MirScreen::set_screen_power_mode (mir_power_mode_on)
Daniel van Vugt (vanvugt) wrote :

Looking at the stack traces, it seems to generally happen on phones when the screen tries to turn on due to a proximity event from dbus/powerd.

The only reasonable explanation I can find is that MultiThreadedCompositor::start/stop() are not locked. Instead they use atomic compare/exchange, so they are written assuming stop/start are called from the same thread. If stop and start are not called from the same thread under USC then it's possible we're trying to start the compositor while it's still stopping. And that might result in us waiting for incorrect CompositingFunctors.

kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
milestone: ww04-2016 → ww08-2016
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mir (Ubuntu):
status: New → Confirmed
Changed in unity-system-compositor (Ubuntu):
status: New → Confirmed
Olivier Tilloy (osomon) wrote :

Got this crash over the week-end on my krillin running the latest rc-proposed.
I got a phone call, grabbed the phone from inside my jacket, but I was never able to answer it as the screen remained blank.

Alan Griffiths (alan-griffiths) wrote :

I don't think MultiThreadedCompositor::start()/stop() locking is the real issue. They are only called by MirScreen's "*_l()" functions which (if I understand the convention correctly) should only be invoked under lock of its mutex.

However, usc::MirScreen::power_off_alarm_notification() calls configure_display_l() without first acquiring a lock. (Which shows how fragile this convention is.)

Alan Griffiths (alan-griffiths) wrote :

Actually, I've misunderstood. power_off_alarm_notification() is only called from MirScreen::PowerOffLockableCallback - which acquires the lock.
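
For readers unfamiliar with the "_l" convention discussed above, here is a minimal sketch of it in standard C++. This is not USC's actual code: SketchScreen, guard and is_on() are hypothetical names, and passing the lock into the "_l" function is just one assumed way of making the convention checkable rather than relying on naming alone.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a screen class; not usc::MirScreen itself.
class SketchScreen
{
public:
    void set_screen_power_on()
    {
        std::unique_lock<std::mutex> lock{guard};
        configure_display_l(lock, true);
    }

    // Mirrors the case in the thread: the caller takes the lock first,
    // then calls into the "_l" function.
    void power_off_alarm_notification()
    {
        std::unique_lock<std::mutex> lock{guard};
        configure_display_l(lock, false);
    }

    bool is_on()
    {
        std::lock_guard<std::mutex> lock{guard};
        return on;
    }

private:
    // "_l" suffix: must only be called with `guard` held.
    // Taking the lock as a parameter lets a debug build verify that.
    void configure_display_l(std::unique_lock<std::mutex> const& lock, bool power_on)
    {
        assert(lock.owns_lock() && lock.mutex() == &guard);
        on = power_on;
    }

    std::mutex guard;
    bool on = false;
};
```

With this shape, a caller that forgets to acquire the mutex cannot even compile the call without constructing a lock, and a debug build asserts that the lock really is held on the right mutex.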

Alan Griffiths (alan-griffiths) wrote :

Digging into the code (without being able to reproduce) the likeliest cause seems to be incorrect error handling in mir::compositor::CompositingFunctor::operator()().

In Mir 0.18 this function can plausibly terminate via an exception before setting the promise (on line 116) - it has a catch block, but this does not set an exception on the promise. I can't identify all the threads from the error tracker, but I don't see this one, so I suspect this is what has happened.

In Mir 0.19 this function has been modified to set the exception on the promise, which should change the behaviour. But as the future is not read (just waited on), the behaviour is still suspect. The current failure detected in wait_until_started() should go away and the problem will be ignored by the Compositor. However, as mir::terminate_with_current_exception() is called, the server should close down and we'll end up with a different failure mode. Maybe that will give us a little more information on the underlying error condition?
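
The promise/future behaviour described here can be illustrated with a minimal sketch in standard C++. It is not Mir's code: worker, start_and_observe and the forward_exception flag are hypothetical names chosen for the illustration, and the error text is invented. It shows that a catch block which does not set an exception leaves the waiter with only a broken promise, that set_exception() preserves the real error, and that wait() alone never surfaces either.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical thread body. If fail_on_startup is true, it throws before
// the "started" promise is fulfilled.
void worker(std::promise<void> started, bool fail_on_startup, bool forward_exception)
{
    try
    {
        if (fail_on_startup)
            throw std::runtime_error("display setup failed");
        started.set_value();
    }
    catch (...)
    {
        // Forwarding stores the real error in the shared state.
        if (forward_exception)
            started.set_exception(std::current_exception());
        // Otherwise the promise is destroyed unfulfilled and the future
        // only ever reports std::future_errc::broken_promise.
    }
}

// Returns a description of what the waiting thread observes.
std::string start_and_observe(bool fail_on_startup, bool forward_exception)
{
    std::promise<void> started;
    std::future<void> ready = started.get_future();
    assert(ready.valid());

    std::thread thread{worker, std::move(started), fail_on_startup, forward_exception};

    ready.wait();  // Returns once the state is ready; never rethrows.

    std::string observed;
    try
    {
        ready.get();  // Only get() rethrows a stored exception.
        observed = "started";
    }
    catch (std::future_error const&)
    {
        observed = "broken promise";
    }
    catch (std::runtime_error const& error)
    {
        observed = error.what();
    }

    thread.join();
    return observed;
}
```

Calling start_and_observe(true, false) models a catch block that swallows the error, while start_and_observe(true, true) models forwarding it to the waiter.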

In my travels I also noticed some hard-to-validate logic in MultiThreadedCompositor::start()/stop() - which, while it does successfully use atomic for synchronization, has the questionable behaviour of simply ignoring both start() and stop() requests while in state CompositorState::starting or CompositorState::stopping.
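
As an illustration of that behaviour, here is a minimal sketch of a start/stop state machine built on an atomic compare/exchange. It is not MultiThreadedCompositor's implementation: SketchCompositor, the started and stopped state names, the boolean return values and force_state() are assumptions added so the ignored requests become visible.

```cpp
#include <atomic>
#include <cassert>

// `starting` and `stopping` come from the discussion; the other two
// states are assumed for the sketch.
enum class CompositorState { stopped, starting, started, stopping };

class SketchCompositor
{
public:
    // Returns true if the request was acted on, false if it was ignored.
    bool start()
    {
        auto expected = CompositorState::stopped;
        if (!state.compare_exchange_strong(expected, CompositorState::starting))
            return false;  // Not stopped (maybe still stopping): request dropped.

        // ... threads would be created here ...
        assert(state.load() == CompositorState::starting);
        state.store(CompositorState::started);
        return true;
    }

    bool stop()
    {
        auto expected = CompositorState::started;
        if (!state.compare_exchange_strong(expected, CompositorState::stopping))
            return false;  // Not started (maybe still starting): request dropped.

        // ... threads would be joined here ...
        assert(state.load() == CompositorState::stopping);
        state.store(CompositorState::stopped);
        return true;
    }

    CompositorState current() const { return state.load(); }

    // Illustration-only hook to simulate another thread being mid-transition.
    void force_state(CompositorState forced) { state.store(forced); }

private:
    std::atomic<CompositorState> state{CompositorState::stopped};
};
```

If a second thread calls start() while the state is stopping, the request is dropped and nothing retries it, which is the kind of hard-to-validate behaviour the comment refers to.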

Likewise in usc::MirScreen the locking conventions around "_l" functions are not followed consistently.

Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
Changed in unity-system-compositor:
assignee: nobody → Alan Griffiths (alan-griffiths)
Changed in mir:
status: New → In Progress
Changed in unity-system-compositor:
status: New → In Progress
Alan Griffiths (alan-griffiths) wrote :

Note:

   lp:~alan-griffiths/mir/better-error-reporting-in-MultiThreadedCompositor-start
   lp:~alan-griffiths/unity-system-compositor/report-warnings-instead-of-crashing

These linked branches may not be a full solution but:

1. they log the underlying problem (which is "eaten" by bad error handling at present); and,
2. they avoid crashing USC when the issue is encountered and allow a second attempt to turn the screen on.
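
The two points above can be sketched roughly as follows. This is not the content of the linked branches: try_turn_screen_on, the start_compositor callback and the warning text are hypothetical, and the sketch only assumes that the failure reaches the caller as a std::exception.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>

// Returns true if the screen came on, false if the attempt failed.
// A failure is logged as a warning rather than left to escape and
// terminate the process, so the caller is free to try again.
bool try_turn_screen_on(std::function<void()> const& start_compositor)
{
    assert(start_compositor);  // An empty callback is a programming error.
    try
    {
        start_compositor();
        return true;
    }
    catch (std::exception const& error)
    {
        // Point 1: report the underlying problem instead of hiding it.
        std::cerr << "Warning: failed to turn screen on: " << error.what() << '\n';
        // Point 2: do not crash; allow a second attempt.
        return false;
    }
}
```

A caller could invoke this again on the next power-on request, so a transient failure no longer leaves the process dead.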

Changed in canonical-devices-system-image:
milestone: ww08-2016 → ww04-2016
importance: High → Critical
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.20.0

Changed in mir:
status: In Progress → Fix Committed
Daniel van Vugt (vanvugt) wrote :

I don't think that was a fix that landed. Just some helpful progress.

Changed in mir:
status: Fix Committed → In Progress
kevin gunn (kgunn72) wrote :

We need more data on this bug, and we're going to eventually land the MPs to improve some reporting.

But this bug is critical, and we'd like to get information as soon as possible.
_Only_ if you are willing to risk not getting autoupdates (and having to flash your phone in future), you can try the improved reporting for mir & unity-system-compositor by installing
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/landing-014
You'd need to be on OTA9.5 or the latest rc-proposed image.

If you install this ci-train silo and experience this bug, we are highly interested in the /var/log/lightdm/unity-system-compositor.log

I'll follow up once we land these reporting improvements in the image and the silo is no longer needed.

Changed in mir:
importance: High → Critical
Changed in unity-system-compositor:
importance: High → Critical
Changed in mir (Ubuntu):
importance: High → Critical
Changed in unity-system-compositor (Ubuntu):
importance: High → Critical
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:unity-system-compositor at revision 274, scheduled for release in unity-system-compositor, milestone Unknown

Changed in unity-system-compositor:
status: In Progress → Fix Committed
Changed in unity-system-compositor:
status: Fix Committed → In Progress
kevin gunn (kgunn72) wrote :

OK, we've made a point release of mir 0.19.1 and unity-system-compositor 0.4.1 which are now in the stable overlay.
This means they will be built into the next image and available on the rc-proposed channel.
Also, this should make it into the OTA9.5 release.
If you update and know you are on these point releases and are someone who is seeing this bug occur, please attach your /var/log/lightdm/unity-system-compositor.log

Changed in unity-system-compositor:
milestone: none → 0.4.1
milestone: 0.4.1 → 0.4.2
Alan Griffiths (alan-griffiths) wrote :

Fix landed in 0.19.2

Changed in mir:
status: In Progress → Fix Committed
Changed in canonical-devices-system-image:
status: Confirmed → Fix Committed
milestone: ww04-2016 → 9.1
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.20.0+16.04.20160219-0ubuntu1

---------------
mir (0.20.0+16.04.20160219-0ubuntu1) xenial; urgency=medium

  [ Alan Griffiths ]
  * New upstream release 0.20.0 (https://launchpad.net/mir/+milestone/0.20.0)
    - ABI summary: Only servers need rebuilding;
      . mirclient ABI unchanged at 9
      . mirserver ABI bumped to 38
      . mircommon ABI unchanged at 5
      . mirplatform ABI unchanged at 11
      . mirprotobuf ABI unchanged at 3
      . mirplatformgraphics ABI bumped to 8
      . mirclientplatform ABI unchanged at 4
      . mirinputplatform ABI unchanged at 5
    - Enhancements:
      . Allow screencasting to create a virtual output (for Miracast)
      . Separate the protocol version number from the client API version macros.
        They're not meant to be related concepts.
      . Add UBSanitizer to the list of build types.
      . logging: Human readable timestamps in DumbConsoleLogger.
      . examples: AdorningDisplayBufferCompositor::composite() no longer ignores
        output boundaries and occlusions.
      . examples: Add -a <app name> option to eglapps.
      . common, client: a more flexible way to probe modules: once we've found
        a good current platform we don't even try to load an older one.
      . Fix build and test run with CMAKE_BUILD_TYPE=ThreadSanitizer (missing
        locks).
      . Add MIR_USE_LD_GOLD build option.
    - Bug fixes:
      . unity-system-compositor crashed with std::runtime_error in
        mir::compositor::CompositingFunctor::wait_until_started() from
        usc::MirScreen::set_screen_power_mode (mir_power_mode_on)
        (LP: #1528384)
      . Phone not usable while a call comes in - followed by "restart"
        (LP: #1532607)
      . ui freezes when simultaneously moving mouse & plug/unplug hdmi
        (LP: #1538632)
      . Mir fails to build on xenial today: android_graphic_buffer_allocator.h
        fatal error: hardware/hardware.h: No such file or directory
        (LP: #1539338)
      . [mali] egl_demo_client_flicker has graphics corruption on android
        (LP: #1517205)
      . [testsfail] Intermittent failure in
        TestClientCursorAPI.cursor_passed_through_nested_server (LP: #1525003)
      . [android] External monitor slows rendering (LP: #1532202)
      . Display::create_gl_context may create context with incorrect attributes
        (LP: #1539268)
      . unity-system-compositor locked up in __libc_do_syscall() (LP: #1543594)
      . NestedServer.client_sees_set_scaling_factor intermittent failure
        (LP: #1537798)
      . [android] External monitor slows rendering - part 2 (LP: #1535894)
      . scene: make sure not to set the swapinterval to 0 when an independent
        stream is created. The default should be 1 (like the stream created as
        part of surface creation).
      . Track the displays plugged state to avoid reporting configurations in
        case they are unplugged (LP #1531503). [Cherrypicked from 0.21]
      . mouse pointer support on emulator is broken (LP: #1517597).
        [Cherrypicked from 0.21]
      . move an android-only test that ended up in tests/unit-tests/graphics.
        (LP: ##154...


Changed in mir (Ubuntu):
status: Confirmed → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
Changed in unity-system-compositor (Ubuntu):
status: Confirmed → Fix Released
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
Changed in unity-system-compositor:
status: In Progress → Fix Released