Mir

Merge lp:~afrantzis/mir/fix-1201436-more into lp:mir

Proposed by Alexandros Frantzis
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 1154
Proposed branch: lp:~afrantzis/mir/fix-1201436-more
Merge into: lp:mir
Diff against target: 351 lines (+165/-49)
4 files modified
src/client/mir_client_library.cpp (+76/-24)
src/client/mir_connection.cpp (+13/-24)
src/client/rpc/mir_socket_rpc_channel.cpp (+1/-1)
tests/acceptance-tests/test_server_disconnect.cpp (+75/-0)
To merge this branch: bzr merge lp:~afrantzis/mir/fix-1201436-more
Reviewer Review Type Date Requested Status
Daniel van Vugt Abstain
Robert Carr (community) Approve
Alan Griffiths Approve
PS Jenkins bot (community) continuous-integration Approve
Review via email: mp+191784@code.launchpad.net

Commit message

client: Allow clients to call API functions after a connection break has been detected

When a client tries to call an API function after a connection break has
been detected in a previous API call, the client blocks in the new call.
This happens because in MirSocketRpcChannel::notify_disconnected() the
pending RPC calls are not forced to complete, since the channel has already
been marked as 'disconnected' by the failure in the previous call.

Note that if the break is first detected while calling an API function,
then that call doesn't block, since this is the first time we call
MirSocketRpcChannel::notify_disconnected() and the pending RPC calls are
forced to complete.

This commit solves this problem by always forcing requests to complete
when a communication failure occurs, even if a disconnection has
already been handled. This is preferred over the alternative of
manually calling the completion callback in a try-catch block when
calling an RPC method because of:

  1. Correctness: In case the communication problem first occurs in that
     call, the callback will be called twice, once by notify_disconnected()
     and once manually.

  2. Consistency: The callback is called from one place regardless of
     whether the communication problem is first detected during that
     call or not.

Description of the change

client: Allow clients to call API functions after a connection break has been detected

When a client tries to call an API function after a connection break has
been detected in a previous API call, the client blocks in the new call.
This happens because in MirSocketRpcChannel::notify_disconnected() the
pending RPC calls are not forced to complete, since the channel has already
been marked as 'disconnected' by the failure in the previous call.

Note that if the break is first detected while calling an API function,
then that call doesn't block, since this is the first time we call
MirSocketRpcChannel::notify_disconnected() and the pending RPC calls are
forced to complete.

This commit solves this problem by always forcing requests to complete
when a communication failure occurs, even if a disconnection has
already been handled. This is preferred over the alternative of
manually calling the completion callback in a try-catch block when
calling an RPC method because of:

  1. Correctness: In case the communication problem first occurs in that
     call, the callback will be called twice, once by notify_disconnected()
     and once manually.

  2. Consistency: The callback is called from one place regardless of
     whether the communication problem is first detected during that
     call or not.

I took this opportunity to do some other (somewhat related) cleanup in the client API implemenation.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

16 {
17 - return mir_connect_impl(socket_file, name, callback, context);
18 + try
19 + {
20 + return mir_connect_impl(socket_file, name, callback, context);
21 + }
22 + catch (std::exception const&)
23 + {
24 + return nullptr;
25 + }
26 }

Two things - this could be a function try block and save a couple of lines. And, surely mir_connect_impl() should not be throwing.

28 void mir_connection_release(MirConnection *connection)
29 {
30 - return mir_connection_release_impl(connection);
31 + try
32 + {
33 + return mir_connection_release_impl(connection);
34 + }
35 + catch (std::exception const&)
36 + {
37 + }
38 }

Ditto

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

128 + surf->configure(mir_surface_attrib_state,
129 + mir_surface_state_unknown)->wait_for_all();

I guess it is pre-existing - but this is a blocking call in a function that isn't flagged "_sync".

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

There's more work to do around here, but this doesn't create any new problems

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I suspect this conflicts with potential resolutions of bug 1239978, but don't have a strong opinion (yet).

review: Abstain
Revision history for this message
Robert Carr (robertcarr) wrote :

Good to unravel this. One of these days the stress test is going to pass...(I thought this might be related to the hanging client issue which remains there, but can't rationalize any path).

I do feel like we should be validating arguments better, but probably to return error codes, not to crash/throw as in 1239978. Let's save this as a topic for later though.

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I don't think it's a good idea to allow buggy clients to keep executing and silently skipping subsequent operations. Then Mir gets blamed for bugs in clients and we waste precious time tracking them down. See bug 1239978.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Although, losing a connection for reasons beyond the client's control is not a bug in the client. I agree we should unblock it in that case. And now I look again, I don't think this is fundamentally incompatible with bug 1239978.

I'd only note that this being a C API, we should return a C error value (NULL) rather than nullptr.

review: Abstain

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/client/mir_client_library.cpp'
2--- src/client/mir_client_library.cpp 2013-10-17 05:51:24 +0000
3+++ src/client/mir_client_library.cpp 2013-10-18 11:29:15 +0000
4@@ -105,7 +105,7 @@
5 error_connections.insert(error_connection);
6 error_connection->set_error_message(x.what());
7 callback(error_connection, context);
8- return 0;
9+ return nullptr;
10 }
11 }
12
13@@ -137,12 +137,25 @@
14
15 MirWaitHandle* mir_connect(char const* socket_file, char const* name, mir_connected_callback callback, void * context)
16 {
17- return mir_connect_impl(socket_file, name, callback, context);
18+ try
19+ {
20+ return mir_connect_impl(socket_file, name, callback, context);
21+ }
22+ catch (std::exception const&)
23+ {
24+ return nullptr;
25+ }
26 }
27
28 void mir_connection_release(MirConnection *connection)
29 {
30- return mir_connection_release_impl(connection);
31+ try
32+ {
33+ return mir_connection_release_impl(connection);
34+ }
35+ catch (std::exception const&)
36+ {
37+ }
38 }
39
40 MirConnection *mir_connect_sync(char const *server,
41@@ -194,7 +207,7 @@
42 catch (std::exception const&)
43 {
44 // TODO callback with an error surface
45- return 0; // TODO
46+ return nullptr;
47 }
48
49 }
50@@ -222,7 +235,14 @@
51 MirSurface * surface,
52 mir_surface_callback callback, void * context)
53 {
54- return surface->release_surface(callback, context);
55+ try
56+ {
57+ return surface->release_surface(callback, context);
58+ }
59+ catch (std::exception const&)
60+ {
61+ return nullptr;
62+ }
63 }
64
65 void mir_surface_release_sync(MirSurface *surface)
66@@ -351,7 +371,7 @@
67 }
68 catch (std::exception const&)
69 {
70- return 0;
71+ return nullptr;
72 }
73
74 void mir_surface_swap_buffers_sync(MirSurface *surface)
75@@ -387,9 +407,16 @@
76 }
77
78 MirWaitHandle* mir_surface_set_type(MirSurface *surf,
79- MirSurfaceType type)
80+ MirSurfaceType type)
81 {
82- return surf ? surf->configure(mir_surface_attrib_type, type) : NULL;
83+ try
84+ {
85+ return surf ? surf->configure(mir_surface_attrib_type, type) : nullptr;
86+ }
87+ catch (std::exception const&)
88+ {
89+ return nullptr;
90+ }
91 }
92
93 MirSurfaceType mir_surface_get_type(MirSurface *surf)
94@@ -410,25 +437,38 @@
95
96 MirWaitHandle* mir_surface_set_state(MirSurface *surf, MirSurfaceState state)
97 {
98- return surf ? surf->configure(mir_surface_attrib_state, state) : NULL;
99+ try
100+ {
101+ return surf ? surf->configure(mir_surface_attrib_state, state) : nullptr;
102+ }
103+ catch (std::exception const&)
104+ {
105+ return nullptr;
106+ }
107 }
108
109 MirSurfaceState mir_surface_get_state(MirSurface *surf)
110 {
111 MirSurfaceState state = mir_surface_state_unknown;
112
113- if (surf)
114+ try
115 {
116- int s = surf->attrib(mir_surface_attrib_state);
117-
118- if (s == mir_surface_state_unknown)
119+ if (surf)
120 {
121- surf->configure(mir_surface_attrib_state,
122- mir_surface_state_unknown)->wait_for_all();
123- s = surf->attrib(mir_surface_attrib_state);
124+ int s = surf->attrib(mir_surface_attrib_state);
125+
126+ if (s == mir_surface_state_unknown)
127+ {
128+ surf->configure(mir_surface_attrib_state,
129+ mir_surface_state_unknown)->wait_for_all();
130+ s = surf->attrib(mir_surface_attrib_state);
131+ }
132+
133+ state = static_cast<MirSurfaceState>(s);
134 }
135-
136- state = static_cast<MirSurfaceState>(s);
137+ }
138+ catch (std::exception const&)
139+ {
140 }
141
142 return state;
143@@ -437,8 +477,16 @@
144 MirWaitHandle* mir_surface_set_swapinterval(MirSurface* surf, int interval)
145 {
146 if ((interval < 0) || (interval > 1))
147- return NULL;
148- return surf ? surf->configure(mir_surface_attrib_swapinterval, interval) : NULL;
149+ return nullptr;
150+
151+ try
152+ {
153+ return surf ? surf->configure(mir_surface_attrib_swapinterval, interval) : nullptr;
154+ }
155+ catch (std::exception const&)
156+ {
157+ return nullptr;
158+ }
159 }
160
161 int mir_surface_get_swapinterval(MirSurface* surf)
162@@ -462,8 +510,12 @@
163
164 MirWaitHandle* mir_connection_apply_display_config(MirConnection *connection, MirDisplayConfiguration* display_configuration)
165 {
166- if (!connection)
167- return NULL;
168-
169- return connection->configure_display(display_configuration);
170+ try
171+ {
172+ return connection ? connection->configure_display(display_configuration) : nullptr;
173+ }
174+ catch (std::exception const&)
175+ {
176+ return nullptr;
177+ }
178 }
179
180=== modified file 'src/client/mir_connection.cpp'
181--- src/client/mir_connection.cpp 2013-10-15 08:53:10 +0000
182+++ src/client/mir_connection.cpp 2013-10-18 11:29:15 +0000
183@@ -185,21 +185,17 @@
184
185 SurfaceRelease surf_release{surface, new_wait_handle, callback, context};
186
187- try
188- {
189- mir::protobuf::SurfaceId message;
190- message.set_value(surface->id());
191- server.release_surface(0, &message, &void_response,
192- gp::NewCallback(this, &MirConnection::released, surf_release));
193- }
194- catch (std::exception const& x)
195- {
196- set_error_message(std::string("release_surface: ") + x.what());
197- released(surf_release);
198- }
199-
200- std::lock_guard<std::mutex> rel_lock(release_wait_handle_guard);
201- release_wait_handles.push_back(new_wait_handle);
202+ mir::protobuf::SurfaceId message;
203+ message.set_value(surface->id());
204+
205+ {
206+ std::lock_guard<std::mutex> rel_lock(release_wait_handle_guard);
207+ release_wait_handles.push_back(new_wait_handle);
208+ }
209+
210+ server.release_surface(0, &message, &void_response,
211+ gp::NewCallback(this, &MirConnection::released, surf_release));
212+
213
214 return new_wait_handle;
215 }
216@@ -282,15 +278,8 @@
217 {
218 std::lock_guard<std::recursive_mutex> lock(mutex);
219
220- try
221- {
222- server.disconnect(0, &ignored, &ignored,
223- google::protobuf::NewCallback(this, &MirConnection::done_disconnect));
224- }
225- catch (std::exception const& x)
226- {
227- set_error_message(std::string("disconnect: ") + x.what());
228- }
229+ server.disconnect(0, &ignored, &ignored,
230+ google::protobuf::NewCallback(this, &MirConnection::done_disconnect));
231
232 return &disconnect_wait_handle;
233 }
234
235=== modified file 'src/client/rpc/mir_socket_rpc_channel.cpp'
236--- src/client/rpc/mir_socket_rpc_channel.cpp 2013-10-17 05:51:24 +0000
237+++ src/client/rpc/mir_socket_rpc_channel.cpp 2013-10-18 11:29:15 +0000
238@@ -92,8 +92,8 @@
239 io_service.stop();
240 socket.close();
241 lifecycle_control->call_lifecycle_event_handler(mir_lifecycle_connection_lost);
242- pending_calls.force_completion();
243 }
244+ pending_calls.force_completion();
245 }
246
247 void mclr::MirSocketRpcChannel::init()
248
249=== modified file 'tests/acceptance-tests/test_server_disconnect.cpp'
250--- tests/acceptance-tests/test_server_disconnect.cpp 2013-10-04 10:51:37 +0000
251+++ tests/acceptance-tests/test_server_disconnect.cpp 2013-10-18 11:29:15 +0000
252@@ -20,6 +20,7 @@
253
254 #include "mir_test_framework/display_server_test_fixture.h"
255 #include "mir_test_framework/cross_process_sync.h"
256+#include "mir_test/cross_process_action.h"
257
258 #include <gtest/gtest.h>
259 #include <gmock/gmock.h>
260@@ -27,6 +28,7 @@
261 #include <atomic>
262
263 namespace mtf = mir_test_framework;
264+namespace mt = mir::test;
265
266 using ServerDisconnect = mtf::BespokeDisplayServerTestFixture;
267
268@@ -75,6 +77,57 @@
269
270 mtf::CrossProcessSync sync;
271 };
272+
273+struct DisconnectingTestingClientConfiguration : mtf::TestingClientConfiguration
274+{
275+ void exec() override
276+ {
277+ MirConnection* connection{nullptr};
278+
279+ connect.exec([&] {
280+ connection = mir_connect_sync(mtf::test_socket_file().c_str() , __PRETTY_FUNCTION__);
281+ EXPECT_TRUE(mir_connection_is_valid(connection));
282+ /*
283+ * Set a null callback to avoid killing the process
284+ * (default callback raises SIGTERM).
285+ */
286+ mir_connection_set_lifecycle_event_callback(connection,
287+ null_lifecycle_callback,
288+ nullptr);
289+ });
290+
291+ create_surface.exec([&] {
292+ MirSurfaceParameters const parameters =
293+ {
294+ __PRETTY_FUNCTION__,
295+ 1, 1,
296+ mir_pixel_format_abgr_8888,
297+ mir_buffer_usage_hardware,
298+ mir_display_output_id_invalid
299+ };
300+ mir_connection_create_surface_sync(connection, &parameters);
301+ });
302+
303+ configure_display.exec([&] {
304+ auto config = mir_connection_create_display_config(connection);
305+ mir_wait_for(mir_connection_apply_display_config(connection, config));
306+ mir_display_config_destroy(config);
307+ });
308+
309+ disconnect.exec([&] {
310+ mir_connection_release(connection);
311+ });
312+ }
313+
314+ static void null_lifecycle_callback(MirConnection*, MirLifecycleState, void*)
315+ {
316+ }
317+
318+ mt::CrossProcessAction connect;
319+ mt::CrossProcessAction create_surface;
320+ mt::CrossProcessAction configure_display;
321+ mt::CrossProcessAction disconnect;
322+};
323 }
324
325 TEST_F(ServerDisconnect, client_detects_server_shutdown)
326@@ -91,3 +144,25 @@
327 shutdown_server_process();
328 });
329 }
330+
331+TEST_F(ServerDisconnect, client_can_call_connection_functions_after_connection_break_is_detected)
332+{
333+ TestingServerConfiguration server_config;
334+ launch_server_process(server_config);
335+
336+ DisconnectingTestingClientConfiguration client_config;
337+ launch_client_process(client_config);
338+
339+ run_in_test_process([this, &client_config]
340+ {
341+ client_config.connect();
342+ shutdown_server_process();
343+ /* While trying to create a surface the connection break will be detected */
344+ client_config.create_surface();
345+
346+ /* Trying to configure the display shouldn't block */
347+ client_config.configure_display();
348+ /* Trying to disconnect at this point shouldn't block */
349+ client_config.disconnect();
350+ });
351+}

Subscribers

People subscribed via source and target branches