Merge lp:~afrantzis/mir/fix-1201436-more into lp:mir

Proposed by Alexandros Frantzis on 2013-10-18

Status: Merged
Approved by: Daniel van Vugt on 2013-10-22
Approved revision: no longer in the source branch.
Merged at revision: 1154
Proposed branch: lp:~afrantzis/mir/fix-1201436-more
Merge into: lp:mir
Diff against target: 351 lines (+165/-49)

4 files modified

src/client/mir_client_library.cpp (+76/-24)
src/client/mir_connection.cpp (+13/-24)
src/client/rpc/mir_socket_rpc_channel.cpp (+1/-1)
tests/acceptance-tests/test_server_disconnect.cpp (+75/-0)

To merge this branch:

bzr merge lp:~afrantzis/mir/fix-1201436-more

Related bugs:

Bug #1201436: Intermittent hang in ClientPidTestFixture.authorizer_may_prevent_connection_of_clients test

High

Fix Released

Reviewer                    Review Type             Date Requested  Status
Daniel van Vugt                                                     Abstain on 2013-10-22
Robert Carr (community)                                             Approve on 2013-10-21
Alan Griffiths                                      2013-10-18      Approve on 2013-10-18
PS Jenkins bot (community)  continuous-integration                  Approve on 2013-10-18
Review via email: mp+191784@code.launchpad.net

Commit message

client: Allow clients to call API functions after a connection break has been detected

When a client tries to call an API function after a connection break has
been detected in a previous API call, the client blocks in the new call.
This happens because in MirSocketRpcChannel::notify_disconnected() the
pending RPC calls are not forced to complete, since the channel has already
been marked as 'disconnected' by the failure in the previous call.

Note that if the break is first detected while calling an API function,
then that call doesn't block, since this is the first time we call
MirSocketRpcChannel::notify_disconnected() and the pending RPC calls are
forced to complete.

This commit solves this problem by always forcing requests to complete
when a communication failure occurs, even if a disconnection has
already been handled. This is preferred over the alternative of
manually calling the completion callback in a try-catch block when
calling an RPC method because of:

  1. Correctness: In case the communication problem first occurs in that
     call, the callback will be called twice, once by notify_disconnected()
     and once manually.

  2. Consistency: The callback is called from one place regardless of
     whether the communication problem is first detected during that
     call or not.
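
The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the branch: `PendingCalls`, `SketchChannel`, `break_connection()` and `lifecycle_notifications()` are hypothetical names, and only `notify_disconnected()` is taken from the commit message. One-time disconnection handling stays guarded by the 'disconnected' flag, while forced completion of pending calls runs on every communication failure.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the channel's table of in-flight RPC calls.
class PendingCalls
{
public:
    void add(std::function<void()> on_complete)
    {
        std::lock_guard<std::mutex> lock{mutex};
        callbacks.push_back(std::move(on_complete));
    }

    // Run and discard every pending completion callback. Swapping the list
    // out under the lock ensures each callback runs at most once.
    void force_completion()
    {
        std::vector<std::function<void()>> to_run;
        {
            std::lock_guard<std::mutex> lock{mutex};
            to_run.swap(callbacks);
        }
        for (auto& callback : to_run)
            callback();
    }

private:
    std::mutex mutex;
    std::vector<std::function<void()>> callbacks;
};

// Hypothetical channel; the real MirSocketRpcChannel is more involved.
class SketchChannel
{
public:
    // Simulates issuing an RPC call; on a broken connection the send fails.
    void call(std::function<void()> on_complete)
    {
        pending.add(std::move(on_complete));
        if (broken)
            notify_disconnected();
    }

    void break_connection() { broken = true; }

    int lifecycle_notifications() const { return notifications; }

private:
    void notify_disconnected()
    {
        // One-time work: performed only when the break is first detected.
        if (!disconnected.exchange(true))
            ++notifications;

        // Performed on every failure, so that calls made after the first
        // detection are completed instead of being left pending forever.
        pending.force_completion();
    }

    PendingCalls pending;
    bool broken{false};
    std::atomic<bool> disconnected{false};
    int notifications{0};
};
```

In this sketch, two successive calls to `call()` after `break_connection()` both have their callbacks invoked, and the one-time notification counter stays at 1. If `force_completion()` were moved inside the `if`, the second callback would never run, which corresponds to the hang described above.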

Description of the change

I took this opportunity to do some other (somewhat related) cleanup in the client API implementation.

PS Jenkins bot (ps-jenkins) wrote on 2013-10-18:

PASSED: Continuous integration, rev:1150
http://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-ci/195/
Executed test runs:
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-android-saucy-i386-build/2417
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-clang-saucy-amd64-build/2302
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-saucy-amd64-ci/192
        deb: http://jenkins.qa.ubuntu.com/job/mir-team-mir-development-branch-saucy-amd64-ci/192/artifact/work/output/*zip*/output.zip

review: Approve (continuous-integration)

PS Jenkins bot (ps-jenkins) wrote on 2013-10-18:

PASSED: Continuous integration, rev:1150
http://jenkins.qa.ubuntu.com/job/mir-ci/1596/
Executed test runs:
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-android-saucy-i386-build/2416
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-clang-saucy-amd64-build/2301
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-saucy-amd64-ci/840
        deb: http://jenkins.qa.ubuntu.com/job/mir-saucy-amd64-ci/840/artifact/work/output/*zip*/output.zip
    SUCCESS: http://jenkins.qa.ubuntu.com/job/mir-saucy-armhf-ci/97
        deb: http://jenkins.qa.ubuntu.com/job/mir-saucy-armhf-ci/97/artifact/work/output/*zip*/output.zip

review: Approve (continuous-integration)

Alan Griffiths (alan-griffiths) wrote on 2013-10-18:

 {
-    return mir_connect_impl(socket_file, name, callback, context);
+    try
+    {
+        return mir_connect_impl(socket_file, name, callback, context);
+    }
+    catch (std::exception const&)
+    {
+        return nullptr;
+    }
 }

Two things - this could be a function try block and save a couple of lines. And, surely mir_connect_impl() should not be throwing.
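
For readers unfamiliar with the first suggestion, here is a minimal sketch of a function try block at a C API boundary. All names (`SketchConnection`, `sketch_connect_impl`, `sketch_connect`, `sketch_connection_release`) are hypothetical and only mirror the shape of the quoted diff; the real Mir functions take different parameters.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical connection type, used only for this illustration.
struct SketchConnection
{
    std::string name;
};

// Hypothetical implementation function that may throw.
SketchConnection* sketch_connect_impl(char const* name)
{
    if (name == nullptr)
        throw std::invalid_argument{"name must not be null"};
    return new SketchConnection{name};
}

// Function try block: the try covers the entire function body, so the
// extra braces and indentation of a nested try statement are not needed.
extern "C" SketchConnection* sketch_connect(char const* name)
try
{
    return sketch_connect_impl(name);
}
catch (std::exception const&)
{
    // Exceptions must not propagate into C callers; report failure instead.
    return nullptr;
}

// The same form works for a void function; reaching the end of the
// handler simply returns.
extern "C" void sketch_connection_release(SketchConnection* connection)
try
{
    delete connection;
}
catch (std::exception const&)
{
}
```

Behaviour is the same as with a try statement nested in the function body; the difference is purely in line count and indentation.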

 void mir_connection_release(MirConnection *connection)
 {
-    return mir_connection_release_impl(connection);
+    try
+    {
+        return mir_connection_release_impl(connection);
+    }
+    catch (std::exception const&)
+    {
+    }
 }

Ditto

Alan Griffiths (alan-griffiths) wrote on 2013-10-18:

+    surf->configure(mir_surface_attrib_state,
+                    mir_surface_state_unknown)->wait_for_all();

I guess it is pre-existing - but this is a blocking call in a function that isn't flagged "_sync".
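
To illustrate the naming concern, here is a rough sketch of the convention being referred to: the plain function returns a wait handle without blocking, and only a function flagged "_sync" waits on it. `SketchWaitHandle`, `SketchSurface`, `configure_async` and `sketch_surface_configure_sync` are hypothetical, and the server reply is simulated as arriving immediately; only `wait_for_all()` is a name taken from the quoted diff.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical wait handle, loosely modelled on the one used in the diff.
class SketchWaitHandle
{
public:
    // Called before a request is issued.
    void expect_result()
    {
        std::lock_guard<std::mutex> lock{mutex};
        done = false;
    }

    // Called when the reply (or a forced completion) arrives.
    void result_received()
    {
        {
            std::lock_guard<std::mutex> lock{mutex};
            done = true;
        }
        cv.notify_all();
    }

    // Blocks the caller until the outstanding result has been received.
    void wait_for_all()
    {
        std::unique_lock<std::mutex> lock{mutex};
        cv.wait(lock, [this] { return done; });
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    bool done{true};
};

struct SketchSurface
{
    SketchWaitHandle handle;
    int state{0};

    // Non-blocking form: issues the request and returns the handle.
    // The reply is simulated as arriving immediately (an assumption made
    // to keep the sketch single-threaded).
    SketchWaitHandle* configure_async(int new_state)
    {
        handle.expect_result();
        state = new_state;
        handle.result_received();
        return &handle;
    }
};

// Blocking form: the "_sync" suffix tells the caller that it waits.
void sketch_surface_configure_sync(SketchSurface& surface, int new_state)
{
    surface.configure_async(new_state)->wait_for_all();
}
```

Under this convention, a function that calls `wait_for_all()` internally but lacks the "_sync" suffix would give callers no hint that it can block.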

Alan Griffiths (alan-griffiths) wrote on 2013-10-18:

There's more work to do around here, but this doesn't create any new problems.

review: Approve

Daniel van Vugt (vanvugt) wrote on 2013-10-21:

I suspect this conflicts with potential resolutions of bug 1239978, but don't have a strong opinion (yet).

review: Abstain

Robert Carr (robertcarr) wrote on 2013-10-21:

Good to unravel this. One of these days the stress test is going to pass...(I thought this might be related to the hanging client issue which remains there, but can't rationalize any path).

I do feel like we should be validating arguments better, but probably to return error codes, not to crash/throw as in 1239978. Let's save this as a topic for later though.

review: Approve

Daniel van Vugt (vanvugt) wrote on 2013-10-22:

I don't think it's a good idea to allow buggy clients to keep executing and silently skipping subsequent operations. Then Mir gets blamed for bugs in clients and we waste precious time tracking them down. See bug 1239978.

Daniel van Vugt (vanvugt) wrote on 2013-10-22:

Although, losing a connection for reasons beyond the client's control is not a bug in the client. I agree we should unblock it in that case. And now I look again, I don't think this is fundamentally incompatible with bug 1239978.

I'd only note that this being a C API, we should return a C error value (NULL) rather than nullptr.

review: Abstain
