Mir

Intermittent test failure in ClientSurfaceEvents.client_can_query_current_orientation

Bug #1335741 reported by Alexandros Frantzis
This bug affects 1 person
Affects         Status         Importance   Assigned to      Milestone
Mir             Fix Released   High         Alan Griffiths
mir (Ubuntu)    Fix Released   High         Unassigned

Bug Description

ClientSurfaceEvents.client_can_query_current_orientation fails intermittently in CI (e.g., [1]).

I can reproduce locally (although not easily) with:

 valgrind bin/mir_acceptance_tests --gtest_filter=ClientSurfaceEvents.client_can_query_current_orientation --gtest_repeat=1000 --gtest_break_on_failure

or without valgrind if the system is under sufficient load.

[1] https://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-utopic-amd64-ci/542/console

Tags: testsfail

Alan Griffiths (alan-griffiths) wrote :

When trying to reproduce with the above command line, I don't see the above failure mode, but I do see an "Invalid write". I won't file a separate bug until we've investigated further:

Repeating all tests (iteration 12) . . .

Note: Google Test filter = ClientSurfaceEvents.client_can_query_current_orientation
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ClientSurfaceEvents
[ RUN ] ClientSurfaceEvents.client_can_query_current_orientation
/home/alan/display_server/mir1/tests/acceptance-tests/test_client_surface_events.cpp:254: Failure
Value of: wait_for_event(mir_event_type_orientation, std::chrono::seconds(1))
  Actual: false
Expected: true
==3769== Invalid write of size 4
==3769== at 0x77D288: testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::string const&, std::string const&) (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x77212C: testing::internal::AssertHelper::operator=(testing::Message const&) const (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x601A77: ClientSurfaceEvents_client_can_query_current_orientation_Test::TestBody() (test_client_surface_events.cpp:254)
==3769== by 0x794335: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x78F58B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x7772E2: testing::Test::Run() (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x777ABD: testing::TestInfo::Run() (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x77814D: testing::TestCase::Run() (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x77E797: testing::internal::UnitTestImpl::RunAllTests() (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x795713: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x7903E5: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== by 0x77D456: testing::UnitTest::Run() (in /home/alan/display_server/mir1/build/bin/mir_acceptance_tests)
==3769== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==3769==
==3769==
==3769== HEAP SUMMARY:
==3769== in use at exit: 296,994 bytes in 4,389 blocks
==3769== total heap usage: 154,055 allocs, 149,666 frees, 8,123,620 bytes allocated
==3769==
==3769== LEAK SUMMARY:
==3769== ...

Alan Griffiths (alan-griffiths) wrote :

The failure I see (which comes from the same expectation that fails in the CI job) is "fixed" by adding the following to the start of the test:

    EXPECT_FALSE(wait_for_event(mir_event_type_orientation, std::chrono::milliseconds(1)));

I'm not yet sure why that affects behaviour but it strongly suggests some race during setup.

Changed in mir:
assignee: Alexandros Frantzis (afrantzis) → Alan Griffiths (alan-griffiths)
Alan Griffiths (alan-griffiths) wrote :

The problem is that a surface receives mir_event_type_surface events following its creation, but there is no ordering constraint ensuring that all of these have arrived before the first mir_event_type_orientation event.

As a consequence, the mir_event_type_orientation event can arrive, followed by a mir_event_type_surface event, before we call wait_for_event(). This leads to a timeout and a test failure.
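The race described above can be modelled outside Mir with a minimal C++ sketch. It assumes, without confirmation from this report, that the test fixture remembers only the most recently received event type. Under that assumption, a late mir_event_type_surface event overwrites an orientation event that has already arrived. EventType, LastEventRecorder, AllEventsRecorder and replay_race are hypothetical names, not Mir's API. The sketch uses only the standard library.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>

// Hypothetical stand-ins for mir_event_type_surface / mir_event_type_orientation.
enum class EventType { surface, orientation };

// Assumed buggy pattern: only the last event type is remembered, so a later
// surface event hides an orientation event that already arrived.
class LastEventRecorder {
public:
    void on_event(EventType type) {
        std::lock_guard<std::mutex> lock{mutex};
        last = type;
        have_event = true;
        cv.notify_all();
    }
    // Returns false if no matching event is visible before the timeout expires.
    bool wait_for_event(EventType type, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock{mutex};
        return cv.wait_for(lock, timeout, [&] { return have_event && last == type; });
    }
private:
    std::mutex mutex;
    std::condition_variable cv;
    EventType last{};
    bool have_event{false};
};

// One possible fix: remember every event type seen, so arrival order no longer matters.
class AllEventsRecorder {
public:
    void on_event(EventType type) {
        std::lock_guard<std::mutex> lock{mutex};
        seen.insert(type);
        cv.notify_all();
    }
    bool wait_for_event(EventType type, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock{mutex};
        return cv.wait_for(lock, timeout, [&] { return seen.count(type) != 0; });
    }
private:
    std::mutex mutex;
    std::condition_variable cv;
    std::set<EventType> seen;
};

// Replays the problematic ordering: orientation arrives, then a late surface
// event, both before the test starts waiting for the orientation event.
template <typename Recorder>
bool replay_race(Recorder& recorder) {
    recorder.on_event(EventType::orientation);
    recorder.on_event(EventType::surface);
    return recorder.wait_for_event(EventType::orientation, std::chrono::milliseconds{50});
}
```

Calling replay_race on a LastEventRecorder returns false after the 50 ms timeout, which mirrors the failing expectation. An AllEventsRecorder returns true immediately. Recording every event type seen is only one way to remove the ordering dependency, and is not necessarily the change that was committed to lp:mir/devel.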

Changed in mir:
status: New → In Progress
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
milestone: none → 0.4.0
Changed in mir:
milestone: 0.4.0 → 0.5.0
Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.5.0+14.10.20140717-0ubuntu1

---------------
mir (0.5.0+14.10.20140717-0ubuntu1) utopic; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.5.0 (https://launchpad.net/mir/+milestone/0.5.0)
    - mirclient ABI unchanged at 8. Clients do not need rebuilding.
    - mirserver ABI bumped to 23. Servers need rebuilding, but probably don't
      need modification:
      . DefaultServerConfiguration/Cursor API: Cursor interfaces changed, most
        notably CursorImages moved from ::mir::graphics to ::mir::input.
      . DefaultServerConfiguration: New "prompt" API.
      . DefaultServerConfiguration: "clock" member is now static.
      . SessionAuthorizer: New functions.
      . ServerConfiguration: New function added: the_prompt_connector().
    - Enhancements:
      . Add AddressSanitizer cmake build type.
      . frontend, client API, tests: add support for prompt session
        permissions and for client detecting errors.
      . server: Ensure our emergency cleanup handling infrastructure is
        signal-safe.
      . Implement and enable an xcursor based image loader for cursors.
      . Fix warnings raised by the new g++-4.9.
      . shared, scene: Introduce a generic listener collection.
      . MirMotionEvent: Define a struct typedef to allow for
        pointer_coordinates to be used individually.
    - Bugs fixed:
      . Nexus 10 leaks during overlay operations (LP: #1331769)
      . MultiThreadedCompositor deadlocks (LP: #1335311)
      . Intermittent test failure in ClientSurfaceEvents can client query
        orientation (LP: #1335741)
      . Intermittent test failure in ClientSurfaceEvents/OrientationEvents
        (LP: #1335752)
      . Intermittent memory error in ClientSurfaceEvents on
        orientation query (LP: #1335819)
      . mir_unit_tests.EventDistributorTest.* SEGFAULT (LP: #1338902)
      . [regression] Device locks randomly on welcome screen (LP: #1339700)
      . Intermittent deadlock when switching to session with custom display
        config & closing other session (LP: #1340669)
      . Mir cursor has no hotspot setting, assumes (0, 0) (LP: #1189775)
      . clang built mir_unit_tests.ProtobufSocketCommunicatorFD crashes
        intermittently (LP: #1300653)
      . g++-4.9 binary incompatibilities with libraries built with g++-4.8
        (LP: #1329089)
      . [test regression] SurfaceLoop fails sporadically on deleting surfaces
        for a disconnecting client (LP: #1335747)
      . Intermittent test failure ServerShutdown when clients are blocked
        (LP: #1335873)
      . [regression] mir_demo_client_multiwin is displayed with obviously
        wrong colours (LP: #1339471)
      . Partially onscreen surfaces not occluded when covered by another
        surface (LP: #1340078)
      . SurfaceConfigurator::attribute_set always say "unfocused" for focus
        property changes (LP: #1336548)
 -- Ubuntu daily release <email address hidden> Thu, 17 Jul 2014 07:58:53 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
status: Fix Committed → Fix Released