[qtcomp] Random crash in Mir input when running AP tests: [terminate called after throwing an instance of '...' what(): assign: File exists] when constructing a mir::AsioMainLoop::FDHandler

Bug #1346952 reported by Gerry Boland
This bug affects 2 people
Affects         Status        Importance  Assigned to      Milestone
Mir             Fix Released  Critical    Andreas Pokorny
0.5             Fix Released  Critical    Andreas Pokorny
mir (Ubuntu)    Fix Released  Critical    Unassigned
qtmir (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

Steps to repro:
1. Install QtCompositor - right now from silo6, see https://wiki.ubuntu.com/Unity8/QtComp
2. Use https://wiki.ubuntu.com/Touch/Testing to get your device set up for running an autopilot test. Roughly, that comes down to running these commands on your PC:

    phablet-config edges-intro --disable
    adb shell powerd-cli display on bright & #keeps the display on

3. Run an AP test. This one tends to expose the bug easily:
    phablet-test-run -p ubuntu-html5-ui-toolkit-autopilot ubuntu_html5_ui_toolkit

The crash we get has this stack trace:
http://paste.ubuntu.com/7836063/

"terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): assign: File exists"

This points to the constructor of boost::asio::posix::basic_stream_descriptor, which, with the default epoll reactor, calls epoll_ctl to add the given file descriptor. The error number EEXIST indicates that the file descriptor is already in the epoll set.
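
The EEXIST behaviour can be reproduced outside Mir. The following is a minimal sketch, not Mir's or Boost.Asio's code: it uses Python's select.epoll (Linux only) on a throwaway pipe, and the helper name register_twice is made up for illustration. Adding the same descriptor to an epoll set a second time fails with EEXIST, which Python reports as an OSError ("File exists").

```python
import errno
import os
import select


def register_twice():
    """Add one fd to an epoll set twice; return the errno of the second add.

    Hypothetical helper for illustration only. Returns None if the second
    registration unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        ep.register(read_fd, select.EPOLLIN)      # first add succeeds
        try:
            ep.register(read_fd, select.EPOLLIN)  # fd is already in the set
        except OSError as exc:
            return exc.errno
        return None
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    err = register_twice()
    print(errno.errorcode.get(err), os.strerror(err) if err else "")
```

This only shows the kernel-level rule the exception message comes from. It says nothing about why Mir ended up adding the descriptor twice.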

Tags: qtcomp

Gerry Boland (gerboland)
summary: - Random crash in Mir input when running AP tests
+ [qtcomp] Random crash in Mir input when running AP tests
tags: added: qtcomp
description: updated
summary: - [qtcomp] Random crash in Mir input when running AP tests
+ [qtcomp] Random crash in Mir input when running AP tests: [terminate
+ called after throwing an instance of '...' what(): assign: File exists]
Changed in mir:
importance: Undecided → Critical
milestone: none → 0.6.0
summary: [qtcomp] Random crash in Mir input when running AP tests: [terminate
called after throwing an instance of '...' what(): assign: File exists]
+ when constructing a mir::AsioMainLoop::FDHandler
Daniel d'Andrada (dandrader) wrote :

The crash happens because the autopilot test sends key events very fast, one right after the other. If we change it so that it waits a bit before generating the next key event, there's no crash:

--- /usr/lib/python2.7/dist-packages/ubuntu_html5_ui_toolkit/tests/__init__.py 2014-07-23 13:16:36.000000000 +0000
+++ /usr/lib/python2.7/dist-packages/ubuntu_html5_ui_toolkit/tests/__init__modified.py 2014-07-23 13:17:28.000000000 +0000
@@ -187,7 +187,7 @@
         addressbar = self.get_addressbar()
         self.assertThat(addressbar.activeFocus, Eventually(Equals(True)))

-        self.keyboard.type(url, 0.001)
+        self.keyboard.type(url, 0.2)

         self.pointer.click_object(self.get_webview())
         time.sleep(1)

With that patch there's no crash, but the typing is too slow (so the tests take way longer to complete). Some in-between value should still work. I haven't tried any others, though.
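
To get a feel for the cost of each candidate value, here is a rough estimate. It is a sketch under two assumptions: that the second argument to keyboard.type is a per-keystroke delay in seconds, and that the delay dominates the typing time. The URL and the helper typing_time are hypothetical, and none of the intermediate delays were tested against the crash.

```python
def typing_time(text, delay):
    """Seconds spent waiting between keystrokes when typing `text`."""
    return len(text) * delay


if __name__ == "__main__":
    url = "http://www.example.com/index.html"  # hypothetical test URL
    for delay in (0.001, 0.01, 0.05, 0.2):
        print(f"delay={delay}: about {typing_time(url, delay):.2f} s per URL")
```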

Andreas Pokorny (andreas-pokorny) wrote :

This crash is caused by interleaved register and unregister calls for fds. mirserver unregisters the FD as soon as the surface disappears or when all outstanding input responses have been received. The exception is thrown when the registration of an FD happens before the old FDHandler for it has been removed.

For this bug to occur, the following conditions are necessary:
 * user input is handled and dispatched in multiple threads - i.e. input responses are currently handled in the main loop (we discussed moving that into a separate thread, but it would make no difference since input is read in a dedicated thread)
 * user input and scene reconfigurations happen in multiple threads - imagine a surface being removed while a user input event is sent
 * fd registration happens out of order in the caller's thread, while the removals happen in the main thread.

I am tempted to make fd registration work like the removal, i.e. run it in the main thread. That would solve this issue, but it might not guarantee the right order and could have other side effects.
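
The interleaving can be modelled in a few lines. This is a deterministic, single-threaded sketch, not Mir's code and not the fix that was eventually committed: FakeReactor, FakeMainLoop and reuse_fd are hypothetical names, and the fd number 7 is arbitrary. The removal is queued for the main loop. The registration either runs immediately (as from a caller's thread that gets there first) or is queued behind the removal.

```python
from collections import deque


class FakeReactor:
    """Stand-in for an epoll-style fd set: adding a duplicate fd is an error."""

    def __init__(self):
        self.fds = set()

    def add(self, fd):
        if fd in self.fds:
            raise FileExistsError(f"fd {fd} already registered")
        self.fds.add(fd)

    def remove(self, fd):
        self.fds.discard(fd)


class FakeMainLoop:
    """Queues actions and runs them in FIFO order, like a single-threaded loop."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, action):
        self.pending.append(action)

    def run_pending(self):
        while self.pending:
            self.pending.popleft()()


def reuse_fd(defer_registration):
    """Queue the removal of fd 7, then register fd 7 again.

    Returns "EEXIST" if the re-registration collided, otherwise "ok".
    """
    reactor, loop = FakeReactor(), FakeMainLoop()
    reactor.add(7)                                # old handler is registered
    loop.enqueue(lambda: reactor.remove(7))       # removal deferred to main loop
    try:
        if defer_registration:
            loop.enqueue(lambda: reactor.add(7))  # queued behind the removal
        else:
            reactor.add(7)                        # runs before the removal
        loop.run_pending()
    except FileExistsError:
        return "EEXIST"
    return "ok"


if __name__ == "__main__":
    print("immediate registration:", reuse_fd(defer_registration=False))
    print("deferred registration: ", reuse_fd(defer_registration=True))
```

The deferred variant only works here because the model has a single FIFO queue and the removal is always enqueued first. With real threads that ordering is exactly what is not guaranteed, which is the concern raised above.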

Changed in mir:
assignee: nobody → Andreas Pokorny (andreas-pokorny)
status: New → In Progress
Changed in qtmir:
status: New → Invalid
Changed in mir (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.5.1+14.10.20140728-0ubuntu1

---------------
mir (0.5.1+14.10.20140728-0ubuntu1) utopic; urgency=medium

  [ Kevin Gunn ]
  * Fixed: Crash due to racing input registration & surface removal
    (LP: #1346952)

  [ Ubuntu daily release ]
  * New rebuild forced
 -- Ubuntu daily release <email address hidden> Mon, 28 Jul 2014 02:49:50 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
Michał Sawicz (saviq)
affects: qtmir → qtmir (Ubuntu)