Mir

mir-ubuntu-vivid-armhf-ci fails consistently

Bug #1407863 reported by Cemil Azizoglu
This bug affects 1 person
Affects        Status         Importance   Assigned to           Milestone
Mir            Fix Released   Medium       Alexandros Frantzis
mir (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

GLibMainLoopTest.propagates_exception_from_signal_handler [ FAILED ]
GLibMainLoopTest.propagates_exception_from_fd_handler [ FAILED ]
GLibMainLoopTest.propagates_exception_from_server_action [ FAILED ]
GLibMainLoopTest.can_be_rerun_after_exception [ FAILED ]
GLibMainLoopAlarmTest.propagates_exception_from_alarm [ FAILED ]
GLibMainLoopForkTest.handles_signals_when_created_in_forked_process [ FAILED ]

Tags: testsfail

Cemil Azizoglu (cemil-azizoglu) wrote :

Happens consistently in mir 0.10 MP : https://code.launchpad.net/~mir-team/mir/development-branch/+merge/245589

Doesn't/didn't happen in the silo.

Changed in mir:
importance: Undecided → High
tags: added: testsfail
Changed in mir:
milestone: none → 0.10.0
Alan Griffiths (alan-griffiths) wrote :

The problem is also seen with a "null" changeset: https://code.launchpad.net/~alan-griffiths/mir/test/+merge/245639

This would appear to indicate this is a consequence of CI changes since the last release.

Daniel van Vugt (vanvugt) wrote :

The bug appears to have been around a while. The same happened in November, threatening to block the release of 0.9.0. So we just merged it manually:
   https://code.launchpad.net/~mir-team/mir/0.9/+merge/242146

Changed in mir:
importance: High → Medium
Daniel van Vugt (vanvugt) wrote :

Reproduced the bug locally, I think. Just edit the cross-compile script to force -DCMAKE_BUILD_TYPE=Coverage and then try running the resulting tests on another host (e.g. a phone). Then I get failures that are simple gcov path lookup failures (those paths don't exist on the test host, only the build host):

[ RUN ] GLibMainLoopTest.propagates_exception_from_signal_handler
profiling:/home/dan:Cannot create directory
profiling:/home/dan/bzr/mir/cov/build-android-arm/3rd_party/xcursor/CMakeFiles/xcursorloader.dir/xcursor.c.gcda:Skip
profiling:/home/dan:Cannot create directory
...

It's not just GLibMainLoopTest though. If I disable that, the same issue appears in other tests.
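
For reference, GCC documents two environment variables for this cross-profiling situation: GCOV_PREFIX, which is prepended to the absolute .gcda path baked into the binary, and GCOV_PREFIX_STRIP, which drops that many leading directory components first. The sketch below only mimics that documented path rewriting so the effect on the paths in the log above is easy to see. The function name `remap_gcda_path` is hypothetical and this is not libgcov's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustration only: mimics the documented effect of GCOV_PREFIX and
// GCOV_PREFIX_STRIP on the absolute .gcda path recorded at build time.
// Drops the first `strip` directory components, then prepends `prefix`.
// `remap_gcda_path` is a hypothetical name, not part of GCC or Mir.
std::string remap_gcda_path(std::string const& built_path,
                            std::string const& prefix,
                            unsigned strip)
{
    std::size_t pos = 0;  // built_path is assumed absolute: it starts with '/'
    for (unsigned i = 0; i < strip; ++i)
    {
        auto const next = built_path.find('/', pos + 1);
        if (next == std::string::npos)
            break;  // no further directory to strip; keep the file name
        pos = next;
    }
    return prefix + built_path.substr(pos);
}
```

With a hypothetical writable directory /tmp/cov on the test host, running the tests with GCOV_PREFIX=/tmp/cov and GCOV_PREFIX_STRIP=2 would be expected to redirect profiling output from /home/dan/... to /tmp/cov/..., which should avoid the "Cannot create directory" messages. That has not been verified against this job.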

summary: - GLibMainLoopTest fails
+ Tests fail on armhf with CMAKE_BUILD_TYPE=Coverage
Daniel van Vugt (vanvugt) wrote : Re: mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)

Oh, of course. The failing job "mir-ubuntu-vivid-armhf-ci" is one we don't run on merge proposals to lp:mir, only on proposals to lp:mir/ubuntu.

:P

summary: - Tests fail on armhf with CMAKE_BUILD_TYPE=Coverage
+ mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)
Alan Griffiths (alan-griffiths) wrote :

I'm not convinced that you're seeing the same problem:

Looking at the console output (https://jenkins.qa.ubuntu.com/job/mir-ubuntu-vivid-armhf-ci/14/consoleFull) it doesn't appear that the tests are being run on a different host to the build. (And it looks as though gcovr is installed and detected correctly.)

Daniel van Vugt (vanvugt) wrote :

Maybe so, but it is suspicious that the technique in comment #4 makes the same tests fail.

Alan Griffiths (alan-griffiths) wrote :

Sure, we've seen various problems with the toolchain integration on armhf. If dropping gcovr from this job works, I don't think we need to be too concerned in the short term.

Daniel van Vugt (vanvugt) wrote :

It's also a little bit suspicious that we're hitting the Valgrind unhandled instruction problem at exactly the same time:

[ RUN ] GLibMainLoopTest.propagates_exception_from_signal_handler
==15334==
==15334== HEAP SUMMARY:
==15334== in use at exit: 32,289 bytes in 522 blocks
==15334== total heap usage: 31,911 allocs, 31,389 frees, 1,658,903 bytes allocated
==15334==
==15334== LEAK SUMMARY:
==15334== definitely lost: 0 bytes in 0 blocks
==15334== indirectly lost: 0 bytes in 0 blocks
==15334== possibly lost: 5,836 bytes in 159 blocks
==15334== still reachable: 26,453 bytes in 363 blocks
==15334== suppressed: 0 bytes in 0 blocks
==15334== Reachable blocks (those to which a pointer was found) are not shown.
==15334== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15334==
==15334== For counts of detected and suppressed errors, rerun with: -v
==15334== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
unknown file: Failure
C++ exception with description "Timeout while waiting for child to change state" thrown in the test body.
disInstr(thumb): unhandled instruction: 0xDEFF 0xF893
[ FAILED ] GLibMainLoopTest.propagates_exception_from_signal_handler (5953 ms)

[https://jenkins.qa.ubuntu.com/job/mir-ubuntu-vivid-armhf-ci/21/consoleFull]

Unhandled instructions will (and should) of course lead to test failures.

Changed in mir:
milestone: 0.10.0 → 0.11.0
Alexandros Frantzis (afrantzis) wrote :

It's also interesting that a bit before the GLibMainLoop test failures we get some memory errors in other tests. Perhaps the stack has been corrupted?

8: [ RUN ] GMock.return_by_move
8: ==10949== Invalid write of size 4
8: ==10949== at 0x4D7AC06: ??? (in /lib/arm-linux-gnueabihf/libgcc_s.so.1)
8: ==10949== Address 0xbdb8c740 is on thread 1's stack
8: ==10949== 16 bytes below stack pointer
8: ==10949==
8: [ OK ] GMock.return_by_move (431 ms)
8: [----------] 1 test from GMock (496 ms total)

8: [ RUN ] RecursiveReadWriteMutex.can_be_read_locked_on_multiple_threads
8: ==10949== Thread 2:
8: ==10949== Invalid write of size 4

Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)
Alexandros Frantzis (afrantzis) wrote :

> It's also interesting that a bit before the GLibMainLoop test failures we get some memory errors in other tests

Note I am referring to the latest instances of this bug:

http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/980/console
http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/978/console
http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/977/console

Alexandros Frantzis (afrantzis) wrote :

Another interesting data point is that both gcc and libglib were upgraded when the failure started to occur:

build 976 (the last build that succeeds): gcc 4.9.2-10ubuntu1 , libglib2.0 2.43.2-1ubuntu1
build 977 (the first build that fails): gcc 4.9.2-10ubuntu2 , libglib2.0 2.43.3-1

Changed in mir:
status: New → In Progress
Alexandros Frantzis (afrantzis) wrote :

I can reproduce the issue locally. It seems to be a problem of valgrind being very slow when dealing with forks. Increasing the timeout fixes this problem locally.
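
The actual change is not shown in this report, so the following is only a minimal sketch of the general idea: wait for a forked child with a deadline, and let slow environments such as valgrind stretch that deadline. The names `scaled_timeout`, `wait_for_child` and the environment variable TEST_TIMEOUT_SCALE are all hypothetical and are not taken from the Mir test framework. Only the error message is copied from the log above.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <chrono>
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper: multiply a base timeout by the integer found in the
// made-up environment variable TEST_TIMEOUT_SCALE (for example when a CI
// job runs the tests under valgrind). Unset or invalid values mean "x1".
std::chrono::milliseconds scaled_timeout(std::chrono::milliseconds base)
{
    char const* const scale = std::getenv("TEST_TIMEOUT_SCALE");
    long const factor = scale ? std::strtol(scale, nullptr, 10) : 1;
    return base * (factor > 0 ? factor : 1);
}

// Polls waitpid() until the child changes state or the deadline passes.
// Returns the raw wait status; throws std::runtime_error on timeout.
int wait_for_child(pid_t child, std::chrono::milliseconds timeout)
{
    auto const deadline = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        int status = 0;
        pid_t const result = waitpid(child, &status, WNOHANG);
        if (result == child)
            return status;  // child exited or was signalled
        if (result < 0)
            throw std::runtime_error("waitpid failed");
        if (std::chrono::steady_clock::now() >= deadline)
            throw std::runtime_error(
                "Timeout while waiting for child to change state");
        usleep(10 * 1000);  // back off for 10 ms before polling again
    }
}
```

A test would then call something like `wait_for_child(pid, scaled_timeout(std::chrono::milliseconds{5000}))`, so that only the slow jobs pay for the longer wait while normal runs still fail fast.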

summary: - mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)
+ mir-ubuntu-vivid-armhf-ci fails consistently
Changed in mir:
milestone: 0.11.0 → 0.12.0
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.12.0

Changed in mir:
status: In Progress → Fix Committed
Daniel van Vugt (vanvugt) wrote :

Merged into lp:mir/0.11 at revision 2283.

Changed in mir:
milestone: 0.12.0 → 0.11.0
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.11.0+15.04.20150209.1-0ubuntu1

---------------
mir (0.11.0+15.04.20150209.1-0ubuntu1) vivid; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.11.0 (https://launchpad.net/mir/+milestone/0.11.0)
    - Enhancements:
      . Lots more major plumbing in the Android code, on the path to
        supporting external displays.
      . Add support for clang 3.6.
      . Major redesign of server classes in mir::shell,scene and friends
        (still in progress).
      . Added client API for creating dialogs and tooltips.
      . Added new surface states: mir_surface_state_hidden and
        mir_surface_state_horizmaximized.
      . Performance: Use optimally efficient fragment shading when possible.
      . Performance: (Desktop) Composite using double buffering instead of
        triple to reduce visible lag.
      . mir_proving_server: Can now resize windows from any edge or corner
        using the existing Alt+middlebuttondrag.
      . mir_proving_server: Added some demo custom shaders (negative and
        high contrast modes: Super+N/C).
      . mir_proving_server: Can now close clients politely via Alt+F4.
      . Added MirPointerInputEvent (part of the new input API, the old
        MirMotionEvent is still supported also for now).
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 3
      . Mirplatform ABI bumped to 6
      . Mirserver ABI bumped to 29
    - Bug fixes:
      . [regression] mir_demo_server exits immediately with boost
        bad_any_cast exception (LP: #1414630)
      . need way to position menus and tooltips (relative positioning to
        parent) (LP: #1324101)
      . GLibMainLoopTest failure seen in CI (LP: #1413748)
      . Clang builds fail in CI (LP: #1416317)
      . segfault in mir::compositor::GLProgramFamily::Shader::init()
        (LP: #1416482)
      . GLRenderer: The default fragment shader is sub-optimal for alpha=1.0
        (LP: #1350674)
      . mesa::DisplayBuffer::post_update is triple buffered - more laggy than
        it needs to be (LP: #1350725)
      . Cannot connect to nested server when started from a different vt
        (LP: #1379266)
      . [testfail] AsioMainLoopAlarmTest fails in CI (LP: #1392256)
      . Compositor report inconsistently reports frame time during bypass,
        and render time otherwise (LP: #1408906)
      . [regression] mir_demo_client_fingerpaint doesn't paint anything any
        more (with the mouse) (LP: #1413139)
      . Hardware cursor is always slightly ahead of the composited image
        (LP: #1274408)
      . integration tests are outputting (too many) DisplayServer log
        messages (LP: #1408231)
      . [regression] deploy-and-test.sh doesn't work any more (unless you
        have umockdev installed already) (LP: #1413479)
      . Color Inverse on display. Toggle Negative Image (LP: #1400580)
      . mir-ubuntu-vivid-armhf-ci fails consistently (LP: #1407863)
      . Double-buffered surfaces may lag or freeze if event driven and not
        constantly redrawing (LP: #1395581)
      . Pointer motion and crossing events...

Changed in mir (Ubuntu):
status: New → Fix Released
Changed in mir:
status: Fix Committed → Fix Released