Mir

TestClientInput/DemoPrivateProtobuf memory leak is causing regular CI test failures

Bug #1295231 reported by Kevin DuBois
This bug affects 2 people
Affects        Status         Importance   Assigned to       Milestone
Mir            Fix Released   High         Daniel van Vugt
mir (Ubuntu)   Fix Released   High         Unassigned

Bug Description

Twice now in CI I've had branches (that are far away from the input stack) fail with a memory leak reported in TestClientInput in the acceptance tests.

https://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-trusty-amd64-ci/848/console
https://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-trusty-amd64-ci/851/console

I have not been able to reproduce this problem locally with valgrind and this test.

update: more failures on different branches
https://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-trusty-amd64-ci/862/console
https://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-trusty-amd64-ci/860/console

This seems to be racy... some branches are passing and some are not

Tags: testsfail

Kevin DuBois (kdub)
description: updated
Daniel van Vugt (vanvugt) wrote :

Other sporadic CI test failures in the past have hinted at races in the input system or its tests. They should be easy to find with: valgrind --tool=helgrind
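For readers unfamiliar with what Helgrind looks for, here is a minimal sketch of the kind of unsynchronized access it reports, next to a mutex-guarded version. This is not Mir code: EventCounter and count_events_safely are hypothetical names invented for the illustration, and nothing in this thread says the input stack contains this exact pattern.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for state shared between threads (not a Mir type).
struct EventCounter
{
    // Unsynchronized update. Calling this from two threads at once is a
    // data race; it is kept here only to show the pattern to avoid.
    void record_unsafe() { ++count; }

    // The same update, serialized by a mutex.
    void record_safe()
    {
        std::lock_guard<std::mutex> lock{guard};
        ++count;
    }

    long count = 0;
    std::mutex guard;
};

// Starts `threads` workers that each record `per_thread` events through the
// guarded path, waits for all of them, and returns the total.
long count_events_safely(int threads, int per_thread)
{
    assert(threads > 0 && per_thread >= 0);

    EventCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&counter, per_thread]
        {
            for (int i = 0; i < per_thread; ++i)
                counter.record_safe();
        });
    }
    for (auto& worker : workers)
        worker.join();

    // Safe to read without the lock: every writer has been joined.
    return counter.count;
}
```

Built with -pthread and called from a small main(), this should run cleanly under valgrind --tool=helgrind. Swapping record_safe() for record_unsafe() in the worker is the sort of change Helgrind is expected to flag as a possible data race.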

Changed in mir:
status: New → Confirmed
Alan Griffiths (alan-griffiths) wrote :

I've had a look through the helgrind issues presented by one of these tests: InProcessTestClientInput.clients_receive_key_input

There are the usual reports about std::atomic<bool> and about the asio code, but none points to an obvious cause of these problems. (It would be so much easier to be sure if these tests ran in a single process.)

I've also tried switching on use_debflags and running 500 iterations of the test under valgrind without reproducing the problem.
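As background on the std::atomic<bool> reports mentioned above: as far as I understand, Helgrind follows synchronization mainly through the pthread primitives it intercepts, so a hand-off that relies only on a C++11 atomic flag can be reported even when it is correct under the language's memory model. The sketch below shows such a hand-off. The function name publish_and_read is hypothetical and the code is not taken from Mir.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Writes `value` on a second thread and reads it back on the calling thread,
// using only an atomic flag (no mutex) to order the two accesses.
int publish_and_read(int value)
{
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload

    std::thread writer{[&]
    {
        payload = value;                               // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    }};

    // 3. Wait until the writer has published.
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();

    // 4. The acquire load that observed `true` synchronizes with the release
    //    store, so this read of `payload` is ordered after the write.
    int const seen = payload;
    assert(seen == value);

    writer.join();
    return seen;
}
```

A tool that does not model the release/acquire pairing sees steps 1 and 4 as unordered accesses to payload from two threads, which would explain why such reports are treated as routine noise rather than as the cause of the failures.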

Daniel van Vugt (vanvugt) wrote :

I did the same and immediately found bug 1296544, although I'm not 100% sure that the resulting deadlock (hung server) is what causes this bug.

Daniel van Vugt (vanvugt) wrote :

It appears to be an ordering problem. DemoPrivateProtobuf is leaking state into TestClientInput, which runs soon after it:

[----------] 4 tests from DemoPrivateProtobuf
[ RUN ] DemoPrivateProtobuf.client_calls_server
[ OK ] DemoPrivateProtobuf.client_calls_server (469 ms)
[ RUN ] DemoPrivateProtobuf.wrapping_message_processor
[ OK ] DemoPrivateProtobuf.wrapping_message_processor (53 ms)
[ RUN ] DemoPrivateProtobuf.server_receives_function_call
[ OK ] DemoPrivateProtobuf.server_receives_function_call (108 ms)
[ RUN ] DemoPrivateProtobuf.client_receives_result
[ OK ] DemoPrivateProtobuf.client_receives_result (58 ms)
[----------] 4 tests from DemoPrivateProtobuf (688 ms total)

...

[----------] 9 tests from TestClientInput
[ RUN ] TestClientInput.clients_receive_key_input
==28645==
==28645== HEAP SUMMARY:
==28645== in use at exit: 19,006 bytes in 279 blocks
==28645== total heap usage: 47,756 allocs, 47,477 frees, 3,006,909 bytes allocated
==28645==
==28645== 3,442 (1,008 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 246 of 246
==28645== at 0x4C2A980: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28645== by 0x5649B3F: mir_default_connect(char const*, char const*, void (*)(MirConnection*, void*), void*) (mir_client_library.cpp:104)
==28645== by 0x5648A94: mir_connect (mir_client_library.cpp:146)
==28645== by 0x56493C5: mir_connect_sync (mir_client_library.cpp:172)
==28645== by 0x61C80B: DemoPrivateProtobuf_client_calls_server_Test::TestBody() (test_protobuf.cpp:186)
...
[ FAILED ] TestClientInput.clients_receive_key_input (4229 ms)
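The log above blames TestClientInput, yet the allocation stack belongs to DemoPrivateProtobuf. How the leaked connection ends up in the later report is not spelled out in this thread, so the following is only a toy model of the attribution problem: a bookkeeping class that remembers which test allocated each live object, loosely the way memcheck remembers the allocation stack. LiveObjects and run_suite are hypothetical names, and the test names are used purely as labels.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a leak checker (not valgrind, not Mir).
class LiveObjects
{
public:
    // Records a new live object and the test that created it.
    int allocate(std::string const& allocating_test)
    {
        int const id = next_id++;
        owner[id] = allocating_test;
        return id;
    }

    // Forgets an object; releasing an unknown id would be a bug.
    void release(int id)
    {
        assert(owner.count(id) == 1);
        owner.erase(id);
    }

    // Names of the tests whose allocations are still alive.
    std::vector<std::string> leaked_from() const
    {
        std::vector<std::string> result;
        for (auto const& entry : owner)
            result.push_back(entry.second);
        return result;
    }

private:
    int next_id = 0;
    std::map<int, std::string> owner;
};

// Simulates two tests sharing one process. The leak check only happens
// after the second test, so that is where any failure gets reported.
std::vector<std::string> run_suite(bool first_test_releases)
{
    LiveObjects heap;

    // "Test 1": creates a connection and may fail to release it.
    int const connection =
        heap.allocate("DemoPrivateProtobuf.client_calls_server");
    if (first_test_releases)
        heap.release(connection);

    // "Test 2": cleans up after itself.
    int const surface =
        heap.allocate("TestClientInput.clients_receive_key_input");
    heap.release(surface);

    return heap.leaked_from();
}
```

In this model run_suite(false) returns a single entry naming the first test even though the check runs at the end of the second one, which mirrors the shape of the valgrind output: the failing test is the messenger and the allocation stack names the culprit.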

Changed in mir:
assignee: nobody → Daniel van Vugt (vanvugt)
status: Confirmed → In Progress
Daniel van Vugt (vanvugt) wrote :

Method to reproduce the leak:

1. Edit TEST_F(DemoPrivateProtobuf, client_calls_server) and insert some output/inspection of "connection" immediately before
"mir_connection_release(connection);". This seems to be enough to trigger some kind of race.
2. Run: valgrind --leak-check=full bin/mir_acceptance_tests --gtest_filter="DemoPrivateProtobuf.client_calls_server"

Changed in mir:
milestone: none → 0.1.8
summary: - TestClientInput memory leak in CI
+ TestClientInput memory leak is causing regular CI test failures
summary: - TestClientInput memory leak is causing regular CI test failures
+ TestClientInput/DemoPrivateProtobuf memory leak is causing regular CI
+ test failures
Changed in mir:
importance: Medium → High
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Changed in mir (Ubuntu):
status: New → Triaged
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.1.8+14.04.20140411-0ubuntu1

---------------
mir (0.1.8+14.04.20140411-0ubuntu1) trusty; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.1.8 (https://launchpad.net/mir/+milestone/0.1.8)
    - mirclient ABI unchanged, still at 7. Clients do not need rebuilding.
    - mirserver ABI bumped to 18. Shells need rebuilding.
    - Server API changes affecting shells:
      . GLRenderer::tessellate() changed syntax.
      . graphics::Platform::create_display() has a new parameter allowing you
        to customize the compositor's (E)GL configuration.
      . Renderable::buffer(unsigned long frameno) is now:
        Renderable::buffer(void const* user_id). See below.
      . Renderable::should_be_rendered_in() is replaced by a more natural:
        Renderable::visible()
      . input::Surface::name() returns by value instead of reference now,
        to ensure future thread safety.
    - Switched EventHub device enumeration and hotplug to Udev. NOTE! This
      means mir_test_* can't run natively on touch devices any more without
      some setup first:
        sudo mount -o remount,rw /
        sudo apt-get update
        sudo apt-get install -y umockdev
        umockdev-run -- bin/mir_unit_tests
    - Added logging for HWC events.
    - Continued consolidation of Surface classes toward a simpler architecture.
    - Introduced "RenderableList" as the way to sample the Scene contents,
      and started using that in the default compositor.
    - Introduced physical length units and conversion (geometry::Length) in
      preparation for arbitrary DPI rendering.
    - Added some decorations to demo-shell; shadows and basic title bars, all
      anti-aliased and high-DPI scalable.
    - Multi-monitor frame sync has been redesigned to eliminate the need for
      frame number tracking.
    - Bugs (and enhancements) resolved:
      . [enhancement] Please move input detection to libudev (LP: #1237784)
      . [enhancement] Add a clamping resize mode to GLRenderer (LP: #1259887)
      . [regression] Intermittent loss of multimonitor frame sync
        (LP: #1290306)
      . [enhancement] Make GL config options configurable (LP: #1290780)
      . memcheck-test doesn't test anything when DISABLED_GTEST_DISCOVERY is
        enabled (LP: #1291876)
      . "Error opening DRM device" is always followed by "Unknown error -(some
        negative number)" (LP: #1292384)
      . Rendering/composition gets stopped early (LP: #1293896)
      . Ubuntu Touch Settings and terminal apps are not rendering correctly on
        rotate. (LP: #1294048)
      . [regression] Apps are much slower to open (LP: #1294051)
      . Settings app opens to a blank screen unless given enough time to render
        or the app is touched (LP: #1294053)
      . TestClientInput/DemoPrivateProtobuf memory leak is causing regular CI
        test failures (LP: #1295231)
      . OSK touch events "fall through" and hit surface behind them
        (LP: #1297878)
      . [enhancement] add a test for composite of last client post
        (LP: #1298596)
      . [regression] Surfaces vanish as soon as their edges touch the edge of
        screen (L...

Changed in mir (Ubuntu):
status: Triaged → Fix Released