Mir

Merge lp:~kdub/mir/fix-1560909 into lp:mir

Proposed by Kevin DuBois
Status: Rejected
Rejected by: Kevin DuBois
Proposed branch: lp:~kdub/mir/fix-1560909
Merge into: lp:mir
Diff against target: 30 lines (+10/-1)
1 file modified
examples/server_example_test_client.cpp (+10/-1)
To merge this branch: bzr merge lp:~kdub/mir/fix-1560909
Reviewer Review Type Date Requested Status
Alan Griffiths Disapprove
Daniel van Vugt Needs Information
Alberto Aguirre (community) Abstain
Mir CI Bot continuous-integration Needs Fixing
Cemil Azizoglu (community) Approve
Review via email: mp+315034@code.launchpad.net

Commit message

fix LP: #1560909 by increasing the time we give clients to shut down. Turns out that some clients like multistream can have a long tail end of stuff to do when shutting down, and on slow devices (eg krillin), this can cause the long shutdown to reported as failure.

Description of the change

fix LP: #1560909 by increasing the time we give clients to shut down. Turns out that some clients like multistream can have a long tail end of stuff to do when shutting down, and on slow devices (eg krillin), this can cause the long shutdown to reported as failure.

To post a comment you must log in.
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Continuous integration, rev:3957
https://mir-jenkins.ubuntu.com/job/mir-ci/2759/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3613/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3687
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3677
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3640/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3640/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3640/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3640
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3640/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/2759/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Worth a try. (But we need to re-enable smoking mir_demo_client_multistream to evaluate.)

review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I think +3 seconds is not significantly enough for a really slow loaded device. It should be 30 seconds or more.

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This is a recurring problem... some CI machines are occasionally so heavily loaded that waiting a few seconds for simple tests to complete is not enough. Although using 60s or 30s should be.

Revision history for this message
Kevin DuBois (kdub) wrote :

> Worth a try. (But we need to re-enable smoking mir_demo_client_multistream to
> evaluate.)

I could reproduce reliably with on device by using gdb/valgrind, and the increased timeout works there. Sure, will have to figure out a way to run CI on this branch with the test on.

Revision history for this message
Kevin DuBois (kdub) wrote :

I don't mind 30s for the test runner, but it does make the test server shutdown painful when running manually. Let me see if I can come up with a compromise.

Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

PASSED: Continuous integration, rev:3960
https://mir-jenkins.ubuntu.com/job/mir-ci/2775/
Executed test runs:
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-mir/3637
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3712
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3702
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3664/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3664
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3664/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/2775/rebuild

review: Approve (continuous-integration)
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

Sounds good to me.

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Still needs fixing. If there's one thing we've learnt from all these CI failures it's that 3-5 second timeouts are not enough for heavily loaded machines. We need 30-60 seconds instead.

If the test is properly written then a higher timeout costs us nothing - the timeout is never reached and the test completes sooner.

review: Needs Fixing
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

In this case, the timeout does cost, because it's a one-shot alarm that fires to check for the exit code.

So I'm good with this option, we can set the shutdown-timeout in the mir-smoke-test-runner.sh script.

review: Approve
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Autolanding.
More details in the following jenkins job:
https://mir-jenkins.ubuntu.com/job/mir-autolanding/1009/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3686/console
    None: https://mir-jenkins.ubuntu.com/job/generic-land-mp/1064/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3765
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3755
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3713/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3713/console
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3713/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3713
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3713/artifact/output/*zip*/output.zip

review: Needs Fixing (continuous-integration)
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

Bah... it looks like the test has an actual issue.

17:44:19 running client mir_demo_client_multistream
17:44:19 [timestamp] Start : mir_demo_client_multistream 2017-01-21T17:44:19+0000
17:44:19 mir_demo_server --test-timeout=3 --shutdown-timeout=10 --test-client mir_demo_client_multistream

...

17:44:22 Quitting; have a nice day.
17:44:32 [2017-01-21 17:44:32.552497] server_example_test_client.cpp: Terminating client
17:44:32 [2017-01-21 17:44:32.553048] mirserver: Stopping
17:44:32 I: [FAILED] mir_demo_client_multistream
17:44:32 [timestamp] End : mir_demo_client_multistream 2017-01-21T17:44:32+0000
17:44:32 running client mir_demo_client_prompt_session

The test clearly received the signal, but even waiting 10s wasn't enough (and the test is the only thing running on the device so no actual super heavy load issues like in CI build machines).

There's at least 3 places where the test could hang after printing "Quitting; have a nice day."

  mir_buffer_stream_release_sync
  mir_window_release_sync
  mir_connection_release

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Hmm, OK. It's a bit inaccurate to call this 'fix-1560909' when the actual fix will come later from CI tweaks. But at least this is half of it.

review: Abstain
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

Actually after digging into the issue (see https://bugs.launchpad.net/mir/+bug/1560909/comments/17) extending the shutdown timeout will not help.

See https://code.launchpad.net/~albaguirre/mir/fix-1560909/+merge/315322 for a fix.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

So do we need this branch? Alberto has commandeered bug 1560909

review: Needs Information
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Alberto's analysis in lp:1560909 implies this won't help.

review: Disapprove

Unmerged revisions

3960. By Kevin DuBois

merge in mir

3959. By Kevin DuBois

typo

3958. By Kevin DuBois

make shutdown timeout configurable

3957. By Kevin DuBois

increase test client shutdown alarm to allow some clients to take longer to shutdown

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'examples/server_example_test_client.cpp'
--- examples/server_example_test_client.cpp 2017-01-18 02:29:37 +0000
+++ examples/server_example_test_client.cpp 2017-01-19 13:42:06 +0000
@@ -95,8 +95,13 @@
95 static const char* const test_timeout_opt = "test-timeout";95 static const char* const test_timeout_opt = "test-timeout";
96 static const char* const test_timeout_descr = "Seconds to run before sending SIGTERM to client";96 static const char* const test_timeout_descr = "Seconds to run before sending SIGTERM to client";
9797
98 static const char* const test_shutdown_timeout_opt = "shutdown-timeout";
99 static const char* const test_shutdown_timeout_descr =
100 "Seconds to wait after sending SIGTERM to the client before considering the client hung";
101
98 server.add_configuration_option(test_client_opt, test_client_descr, mir::OptionType::string);102 server.add_configuration_option(test_client_opt, test_client_descr, mir::OptionType::string);
99 server.add_configuration_option(test_timeout_opt, test_timeout_descr, 10);103 server.add_configuration_option(test_timeout_opt, test_timeout_descr, 10);
104 server.add_configuration_option(test_shutdown_timeout_opt, test_shutdown_timeout_descr, 1);
100105
101 server.add_init_callback([&server, &context]106 server.add_init_callback([&server, &context]
102 {107 {
@@ -142,7 +147,11 @@
142 });147 });
143148
144 context.client_kill_action->reschedule_in(std::chrono::seconds(options->get<int>(test_timeout_opt)));149 context.client_kill_action->reschedule_in(std::chrono::seconds(options->get<int>(test_timeout_opt)));
145 context.server_stop_action->reschedule_in(std::chrono::seconds(options->get<int>(test_timeout_opt)+1));150 //give a generous 3s for clients to finish their shutdown to avoid burdened
151 //test machines from reporting failure when the clients are just taking a long time
152 //to shutdown (LP: #1560909)
153 context.server_stop_action->reschedule_in(std::chrono::seconds(
154 options->get<int>(test_timeout_opt)+ options->get<int>(test_shutdown_timeout_opt)));
146 }155 }
147 else156 else
148 {157 {

Subscribers

People subscribed via source and target branches