Mir

Merge lp:~kdub/mir/fix-1560909 into lp:mir

Proposed by Kevin DuBois on 2017-01-18
Status: Rejected
Rejected by: Kevin DuBois on 2017-01-23
Proposed branch: lp:~kdub/mir/fix-1560909
Merge into: lp:mir
Diff against target: 30 lines (+10/-1)
1 file modified
examples/server_example_test_client.cpp (+10/-1)
To merge this branch: bzr merge lp:~kdub/mir/fix-1560909
Reviewer Review Type Date Requested Status
Alan Griffiths 2017-01-18 Disapprove on 2017-01-23
Daniel van Vugt Needs Information on 2017-01-23
Alberto Aguirre (community) Abstain on 2017-01-21
Mir CI Bot continuous-integration Needs Fixing on 2017-01-21
Cemil Azizoglu (community) Approve on 2017-01-19
Review via email: mp+315034@code.launchpad.net

Commit message

fix LP: #1560909 by increasing the time we give clients to shut down. Turns out that some clients like multistream can have a long tail end of stuff to do when shutting down, and on slow devices (eg krillin), this can cause the long shutdown to reported as failure.

Description of the change

fix LP: #1560909 by increasing the time we give clients to shut down. Turns out that some clients like multistream can have a long tail end of stuff to do when shutting down, and on slow devices (eg krillin), this can cause the long shutdown to reported as failure.

To post a comment you must log in.
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Continuous integration, rev:3957
https://mir-jenkins.ubuntu.com/job/mir-ci/2759/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3613/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3687
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3640/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3640/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3640/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/2759/rebuild

review: Needs Fixing (continuous-integration)
Alan Griffiths (alan-griffiths) wrote :

Worth a try. (But we need to re-enable smoking mir_demo_client_multistream to evaluate.)

review: Needs Information
Daniel van Vugt (vanvugt) wrote :

I think +3 seconds is not significantly enough for a really slow loaded device. It should be 30 seconds or more.

review: Needs Fixing
Daniel van Vugt (vanvugt) wrote :

This is a recurring problem... some CI machines are occasionally so heavily loaded that waiting a few seconds for simple tests to complete is not enough. Although using 60s or 30s should be.

Kevin DuBois (kdub) wrote :

> Worth a try. (But we need to re-enable smoking mir_demo_client_multistream to
> evaluate.)

I could reproduce reliably with on device by using gdb/valgrind, and the increased timeout works there. Sure, will have to figure out a way to run CI on this branch with the test on.

Kevin DuBois (kdub) wrote :

I don't mind 30s for the test runner, but it does make the test server shutdown painful when running manually. Let me see if I can come up with a compromise.

Mir CI Bot (mir-ci-bot) wrote :

PASSED: Continuous integration, rev:3960
https://mir-jenkins.ubuntu.com/job/mir-ci/2775/
Executed test runs:
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-mir/3637
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3712
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3664/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/2775/rebuild

review: Approve (continuous-integration)
Cemil Azizoglu (cemil-azizoglu) wrote :

Sounds good to me.

review: Approve
Daniel van Vugt (vanvugt) wrote :

Still needs fixing. If there's one thing we've learnt from all these CI failures it's that 3-5 second timeouts are not enough for heavily loaded machines. We need 30-60 seconds instead.

If the test is properly written then a higher timeout costs us nothing - the timeout is never reached and the test completes sooner.

review: Needs Fixing
Alberto Aguirre (albaguirre) wrote :

In this case, the timeout does cost, because it's a one-shot alarm that fires to check for the exit code.

So I'm good with this option, we can set the shutdown-timeout in the mir-smoke-test-runner.sh script.

review: Approve
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Autolanding.
More details in the following jenkins job:
https://mir-jenkins.ubuntu.com/job/mir-autolanding/1009/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3686/console
    None: https://mir-jenkins.ubuntu.com/job/generic-land-mp/1064/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3765
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3713/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3713/console
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3713/artifact/output/*zip*/output.zip

review: Needs Fixing (continuous-integration)
Alberto Aguirre (albaguirre) wrote :

Bah... it looks like the test has an actual issue.

17:44:19 running client mir_demo_client_multistream
17:44:19 [timestamp] Start : mir_demo_client_multistream 2017-01-21T17:44:19+0000
17:44:19 mir_demo_server --test-timeout=3 --shutdown-timeout=10 --test-client mir_demo_client_multistream

...

17:44:22 Quitting; have a nice day.
17:44:32 [2017-01-21 17:44:32.552497] server_example_test_client.cpp: Terminating client
17:44:32 [2017-01-21 17:44:32.553048] mirserver: Stopping
17:44:32 I: [FAILED] mir_demo_client_multistream
17:44:32 [timestamp] End : mir_demo_client_multistream 2017-01-21T17:44:32+0000
17:44:32 running client mir_demo_client_prompt_session

The test clearly received the signal, but even waiting 10s wasn't enough (and the test is the only thing running on the device so no actual super heavy load issues like in CI build machines).

There's at least 3 places where the test could hang after printing "Quitting; have a nice day."

  mir_buffer_stream_release_sync
  mir_window_release_sync
  mir_connection_release

review: Abstain
Daniel van Vugt (vanvugt) wrote :

Hmm, OK. It's a bit inaccurate to call this 'fix-1560909' when the actual fix will come later from CI tweaks. But at least this is half of it.

review: Abstain
Alberto Aguirre (albaguirre) wrote :

Actually after digging into the issue (see https://bugs.launchpad.net/mir/+bug/1560909/comments/17) extending the shutdown timeout will not help.

See https://code.launchpad.net/~albaguirre/mir/fix-1560909/+merge/315322 for a fix.

Daniel van Vugt (vanvugt) wrote :

So do we need this branch? Alberto has commandeered bug 1560909

review: Needs Information
Alan Griffiths (alan-griffiths) wrote :

Alberto's analysis in lp:1560909 implies this won't help.

review: Disapprove

Unmerged revisions

3960. By Kevin DuBois on 2017-01-19

merge in mir

3959. By Kevin DuBois on 2017-01-19

typo

3958. By Kevin DuBois on 2017-01-19

make shutdown timeout configurable

3957. By Kevin DuBois on 2017-01-18

increase test client shutdown alarm to allow some clients to take longer to shutdown

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'examples/server_example_test_client.cpp'
2--- examples/server_example_test_client.cpp 2017-01-18 02:29:37 +0000
3+++ examples/server_example_test_client.cpp 2017-01-19 13:42:06 +0000
4@@ -95,8 +95,13 @@
5 static const char* const test_timeout_opt = "test-timeout";
6 static const char* const test_timeout_descr = "Seconds to run before sending SIGTERM to client";
7
8+ static const char* const test_shutdown_timeout_opt = "shutdown-timeout";
9+ static const char* const test_shutdown_timeout_descr =
10+ "Seconds to wait after sending SIGTERM to the client before considering the client hung";
11+
12 server.add_configuration_option(test_client_opt, test_client_descr, mir::OptionType::string);
13 server.add_configuration_option(test_timeout_opt, test_timeout_descr, 10);
14+ server.add_configuration_option(test_shutdown_timeout_opt, test_shutdown_timeout_descr, 1);
15
16 server.add_init_callback([&server, &context]
17 {
18@@ -142,7 +147,11 @@
19 });
20
21 context.client_kill_action->reschedule_in(std::chrono::seconds(options->get<int>(test_timeout_opt)));
22- context.server_stop_action->reschedule_in(std::chrono::seconds(options->get<int>(test_timeout_opt)+1));
23+ //give a generous 3s for clients to finish their shutdown to avoid burdened
24+ //test machines from reporting failure when the clients are just taking a long time
25+ //to shutdown (LP: #1560909)
26+ context.server_stop_action->reschedule_in(std::chrono::seconds(
27+ options->get<int>(test_timeout_opt)+ options->get<int>(test_shutdown_timeout_opt)));
28 }
29 else
30 {

Subscribers

People subscribed via source and target branches