Merge lp:~vanvugt/mir/fix-1653658 into lp:mir
- fix-1653658
- Merge into development-branch
Status: | Superseded |
---|---|
Proposed branch: | lp:~vanvugt/mir/fix-1653658 |
Merge into: | lp:mir |
Diff against target: |
312 lines (+110/-11) 11 files modified
src/client/mir_connection.cpp (+8/-3) src/client/rpc/mir_basic_rpc_channel.cpp (+23/-8) src/client/rpc/mir_basic_rpc_channel.h (+8/-0) src/client/rpc/mir_protobuf_rpc_channel.cpp (+24/-0) src/client/rpc/mir_protobuf_rpc_channel.h (+3/-0) tests/acceptance-tests/test_client_library.cpp (+14/-0) tests/include/mir/test/doubles/stub_connection_configuration.h (+6/-0) tests/unit-tests/client/test_client_mir_surface.cpp (+6/-0) tests/unit-tests/client/test_mir_connection.cpp (+9/-0) tests/unit-tests/client/test_mir_render_surface.cpp (+3/-0) tests/unit-tests/platforms/android/client/test_android_client_platform.cpp (+6/-0) |
To merge this branch: | bzr merge lp:~vanvugt/mir/fix-1653658 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Mir CI Bot | continuous-integration | Approve | |
Daniel van Vugt | Abstain | ||
Chris Halse Rogers | Approve | ||
Review via email: mp+314055@code.launchpad.net |
Commit message
Fix race between MirConnection's destructor and its RPC callbacks.
(LP: #1653658)
This ensures RPC callbacks never occur (so never touch the MirConnection) after destruction of the MirConnection has commenced.
Description of the change
Mir CI Bot (mir-ci-bot) wrote : | # |
Mir CI Bot (mir-ci-bot) wrote : | # |
FAILED: Continuous integration, rev:3905
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild:
https:/
Daniel van Vugt (vanvugt) wrote : | # |
Weird. I just ran those tests on my own krillin and they pass just fine.
Then again those above crashes on krillin are not new. I had seen them several times before this branch ever existed.
Chris Halse Rogers (raof) wrote : | # |
Looks good.
Nonblocking nits:
We should fix it so that MirConnection:
It's a bit weird that complete_response searches through pending_calls twice, but I can see why you've done it that way. Doing it better probably requires deeper changes.
Mir CI Bot (mir-ci-bot) wrote : | # |
FAILED: Continuous integration, rev:3906
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild:
https:/
Daniel van Vugt (vanvugt) wrote : | # |
Hmm the only unexpected failure is still krillin:
05:52:44 [2017-01-05 05:52:44.153385] server_
05:52:44 [2017-01-05 05:52:44.153647] mirserver: Stopping
05:52:44 I: [FAILED] mir_demo_
05:52:44 [timestamp] End : mir_demo_
But it works on my krillin :/
Mir CI Bot (mir-ci-bot) wrote : | # |
FAILED: Continuous integration, rev:3907
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
Click here to trigger a rebuild:
https:/
Daniel van Vugt (vanvugt) wrote : | # |
Points for consistency. All tests pass on krillin but some are leaving behind core files, or something, which makes it a failure...
Daniel van Vugt (vanvugt) wrote : | # |
OK, two issues found so far:
(1) My krillin can't generate core files even with ulimit set. Had to also set /proc/sys/
(2) Bug 1654478 might be related.
Mir CI Bot (mir-ci-bot) wrote : | # |
FAILED: Continuous integration, rev:3908
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
FAILURE: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild:
https:/
Daniel van Vugt (vanvugt) wrote : | # |
Finally. I think the clang amd64 run is showing the bug here.
Daniel van Vugt (vanvugt) : | # |
Mir CI Bot (mir-ci-bot) wrote : | # |
FAILED: Continuous integration, rev:3909
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
FAILURE: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild:
https:/
Daniel van Vugt (vanvugt) : | # |
Mir CI Bot (mir-ci-bot) wrote : | # |
PASSED: Continuous integration, rev:3911
https:/
Executed test runs:
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild:
https:/
Preview Diff
1 | === modified file 'src/client/mir_connection.cpp' |
2 | --- src/client/mir_connection.cpp 2017-01-04 14:53:35 +0000 |
3 | +++ src/client/mir_connection.cpp 2017-01-06 05:34:08 +0000 |
4 | @@ -309,9 +309,11 @@ |
5 | |
6 | MirConnection::~MirConnection() noexcept |
7 | { |
8 | - // We don't die while if are pending callbacks (as they touch this). |
9 | - // But, if after 500ms we don't get a call, assume it won't happen. |
10 | - connect_wait_handle.wait_for_pending(std::chrono::milliseconds(500)); |
11 | + if (channel) // some tests don't have one |
12 | + { |
13 | + channel->discard_future_calls(); |
14 | + channel->wait_for_outstanding_calls(); |
15 | + } |
16 | |
17 | std::lock_guard<decltype(mutex)> lock(mutex); |
18 | surface_map.reset(); |
19 | @@ -643,6 +645,9 @@ |
20 | server.disconnect(ignored.get(), ignored.get(), |
21 | google::protobuf::NewCallback(this, &MirConnection::done_disconnect)); |
22 | |
23 | + if (channel) |
24 | + channel->discard_future_calls(); |
25 | + |
26 | return &disconnect_wait_handle; |
27 | } |
28 | |
29 | |
30 | === modified file 'src/client/rpc/mir_basic_rpc_channel.cpp' |
31 | --- src/client/rpc/mir_basic_rpc_channel.cpp 2016-07-18 07:38:38 +0000 |
32 | +++ src/client/rpc/mir_basic_rpc_channel.cpp 2017-01-06 05:34:08 +0000 |
33 | @@ -83,9 +83,10 @@ |
34 | auto call = pending_calls.find(result.id()); |
35 | if (call != pending_calls.end()) |
36 | { |
37 | - completion = call->second; |
38 | + completion = std::move(call->second); |
39 | pending_calls.erase(call); |
40 | } |
41 | + ++running_callbacks; |
42 | } |
43 | |
44 | if (!completion.complete) |
45 | @@ -97,29 +98,43 @@ |
46 | rpc_report->complete_response(result); |
47 | completion.complete->Run(); |
48 | } |
49 | + |
50 | + { |
51 | + std::unique_lock<std::mutex> lock(mutex); |
52 | + --running_callbacks; |
53 | + pending_calls_shrank.notify_all(); |
54 | + } |
55 | } |
56 | |
57 | void mclrd::PendingCallCache::force_completion() |
58 | { |
59 | std::unique_lock<std::mutex> lock(mutex); |
60 | - for (auto& call : pending_calls) |
61 | + ++running_callbacks; |
62 | + while (!pending_calls.empty()) |
63 | { |
64 | - auto& completion = call.second; |
65 | + auto i = pending_calls.begin(); |
66 | + auto completion = std::move(i->second); |
67 | + pending_calls.erase(i); |
68 | + lock.unlock(); |
69 | completion.complete->Run(); |
70 | + lock.lock(); |
71 | } |
72 | - |
73 | - pending_calls.erase(pending_calls.begin(), pending_calls.end()); |
74 | + --running_callbacks; |
75 | + pending_calls_shrank.notify_all(); |
76 | } |
77 | |
78 | - |
79 | - |
80 | bool mclrd::PendingCallCache::empty() const |
81 | { |
82 | std::unique_lock<std::mutex> lock(mutex); |
83 | return pending_calls.empty(); |
84 | } |
85 | |
86 | - |
87 | +void mclrd::PendingCallCache::wait_till_complete() const |
88 | +{ |
89 | + std::unique_lock<std::mutex> lock(mutex); |
90 | + while (running_callbacks || !pending_calls.empty()) |
91 | + pending_calls_shrank.wait(lock); |
92 | +} |
93 | |
94 | mclr::MirBasicRpcChannel::MirBasicRpcChannel() : |
95 | next_message_id(0), |
96 | |
97 | === modified file 'src/client/rpc/mir_basic_rpc_channel.h' |
98 | --- src/client/rpc/mir_basic_rpc_channel.h 2016-01-29 08:18:22 +0000 |
99 | +++ src/client/rpc/mir_basic_rpc_channel.h 2017-01-06 05:34:08 +0000 |
100 | @@ -24,6 +24,7 @@ |
101 | #include <mutex> |
102 | #include <atomic> |
103 | #include <vector> |
104 | +#include <condition_variable> |
105 | |
106 | namespace google |
107 | { |
108 | @@ -77,6 +78,8 @@ |
109 | |
110 | bool empty() const; |
111 | |
112 | + void wait_till_complete() const; |
113 | + |
114 | private: |
115 | |
116 | struct PendingCall |
117 | @@ -94,6 +97,8 @@ |
118 | }; |
119 | |
120 | std::mutex mutable mutex; |
121 | + std::condition_variable mutable pending_calls_shrank; |
122 | + int running_callbacks = 0; |
123 | std::map<int, PendingCall> pending_calls; |
124 | std::shared_ptr<RpcReport> const rpc_report; |
125 | }; |
126 | @@ -110,6 +115,9 @@ |
127 | google::protobuf::MessageLite* response, |
128 | google::protobuf::Closure* complete) = 0; |
129 | |
130 | + virtual void discard_future_calls() = 0; |
131 | + virtual void wait_for_outstanding_calls() = 0; |
132 | + |
133 | protected: |
134 | MirBasicRpcChannel(); |
135 | mir::protobuf::wire::Invocation invocation_for( |
136 | |
137 | === modified file 'src/client/rpc/mir_protobuf_rpc_channel.cpp' |
138 | --- src/client/rpc/mir_protobuf_rpc_channel.cpp 2016-12-23 07:58:44 +0000 |
139 | +++ src/client/rpc/mir_protobuf_rpc_channel.cpp 2017-01-06 05:34:08 +0000 |
140 | @@ -197,6 +197,20 @@ |
141 | google::protobuf::MessageLite* response, |
142 | google::protobuf::Closure* complete) |
143 | { |
144 | + if (discard) |
145 | + { |
146 | + /* |
147 | + * Until recently we had no explicit plan for what to do in this case. |
148 | + * Callbacks would race with destruction of the MirConnection and either |
149 | + * succeed, deadlock, crash or corrupt memory. However the one apparent |
150 | + * intent in the old plan was that we close all closures so that the |
151 | + * user doesn't leak any memory. So do that... |
152 | + */ |
153 | + if (complete) |
154 | + complete->Run(); |
155 | + return; |
156 | + } |
157 | + |
158 | // Only send message when details saved for handling response |
159 | std::vector<mir::Fd> fds; |
160 | if (parameters->GetTypeName() == "mir.protobuf.BufferRequest") |
161 | @@ -228,6 +242,16 @@ |
162 | send_message(invocation, invocation, fds); |
163 | } |
164 | |
165 | +void mclr::MirProtobufRpcChannel::discard_future_calls() |
166 | +{ |
167 | + discard = true; |
168 | +} |
169 | + |
170 | +void mclr::MirProtobufRpcChannel::wait_for_outstanding_calls() |
171 | +{ |
172 | + pending_calls.wait_till_complete(); |
173 | +} |
174 | + |
175 | void mclr::MirProtobufRpcChannel::send_message( |
176 | mir::protobuf::wire::Invocation const& body, |
177 | mir::protobuf::wire::Invocation const& invocation, |
178 | |
179 | === modified file 'src/client/rpc/mir_protobuf_rpc_channel.h' |
180 | --- src/client/rpc/mir_protobuf_rpc_channel.h 2016-12-13 06:00:32 +0000 |
181 | +++ src/client/rpc/mir_protobuf_rpc_channel.h 2017-01-06 05:34:08 +0000 |
182 | @@ -98,10 +98,13 @@ |
183 | google::protobuf::MessageLite const* parameters, |
184 | google::protobuf::MessageLite* response, |
185 | google::protobuf::Closure* complete) override; |
186 | + void discard_future_calls() override; |
187 | + void wait_for_outstanding_calls() override; |
188 | |
189 | private: |
190 | std::shared_ptr<RpcReport> const rpc_report; |
191 | detail::PendingCallCache pending_calls; |
192 | + std::atomic_bool discard{false}; |
193 | |
194 | static constexpr size_t size_of_header = 2; |
195 | detail::SendBuffer header_bytes; |
196 | |
197 | === modified file 'tests/acceptance-tests/test_client_library.cpp' |
198 | --- tests/acceptance-tests/test_client_library.cpp 2016-12-17 00:52:45 +0000 |
199 | +++ tests/acceptance-tests/test_client_library.cpp 2017-01-06 05:34:08 +0000 |
200 | @@ -246,6 +246,20 @@ |
201 | mir_connection_release(connection); |
202 | } |
203 | |
204 | +TEST_F(ClientLibrary, shutdown_race_is_resolved_safely) |
205 | +{ // An attempt at a regression test for race LP: #1653658 |
206 | + connection = mir_connect_sync(new_connection().c_str(), __PRETTY_FUNCTION__); |
207 | + auto const spec = mir_connection_create_spec_for_normal_surface( |
208 | + connection, 123, 456, mir_pixel_format_abgr_8888); |
209 | + surface = mir_surface_create_sync(spec); |
210 | + mir_surface_spec_release(spec); |
211 | + |
212 | + EXPECT_THAT(surface, IsValid()); |
213 | + |
214 | + mir_surface_release(surface, [](MirSurface*, void*){ sleep(1); }, NULL); |
215 | + mir_connection_release(connection); |
216 | +} |
217 | + |
218 | TEST_F(ClientLibrary, can_set_surface_state) |
219 | { |
220 | connection = mir_connect_sync(new_connection().c_str(), __PRETTY_FUNCTION__); |
221 | |
222 | === modified file 'tests/include/mir/test/doubles/stub_connection_configuration.h' |
223 | --- tests/include/mir/test/doubles/stub_connection_configuration.h 2016-12-23 05:15:02 +0000 |
224 | +++ tests/include/mir/test/doubles/stub_connection_configuration.h 2017-01-06 05:34:08 +0000 |
225 | @@ -42,6 +42,12 @@ |
226 | channel_call_count++; |
227 | c->Run(); |
228 | } |
229 | + void discard_future_calls() override |
230 | + { |
231 | + } |
232 | + void wait_for_outstanding_calls() override |
233 | + { |
234 | + } |
235 | mir::Fd watch_fd() const |
236 | { |
237 | int fd[2]; |
238 | |
239 | === modified file 'tests/unit-tests/client/test_client_mir_surface.cpp' |
240 | --- tests/unit-tests/client/test_client_mir_surface.cpp 2016-12-23 07:58:44 +0000 |
241 | +++ tests/unit-tests/client/test_client_mir_surface.cpp 2017-01-06 05:34:08 +0000 |
242 | @@ -261,6 +261,12 @@ |
243 | { |
244 | delete closure; |
245 | } |
246 | + void discard_future_calls() override |
247 | + { |
248 | + } |
249 | + void wait_for_outstanding_calls() override |
250 | + { |
251 | + } |
252 | }; |
253 | |
254 | void null_connected_callback(MirConnection* /*connection*/, void * /*client_context*/) |
255 | |
256 | === modified file 'tests/unit-tests/client/test_mir_connection.cpp' |
257 | --- tests/unit-tests/client/test_mir_connection.cpp 2016-12-23 07:58:44 +0000 |
258 | +++ tests/unit-tests/client/test_mir_connection.cpp 2017-01-06 05:34:08 +0000 |
259 | @@ -156,6 +156,9 @@ |
260 | complete->Run(); |
261 | } |
262 | |
263 | + MOCK_METHOD0(discard_future_calls, void()); |
264 | + MOCK_METHOD0(wait_for_outstanding_calls, void()); |
265 | + |
266 | MOCK_METHOD2(on_buffer_stream_create, void(mp::BufferStream&, google::protobuf::Closure* complete)); |
267 | MOCK_METHOD2(connect, void(mp::ConnectParameters const*,mp::Connection*)); |
268 | MOCK_METHOD1(configure_display_sent, void(mp::DisplayConfiguration const*)); |
269 | @@ -716,6 +719,12 @@ |
270 | { |
271 | delete closure; |
272 | } |
273 | + void discard_future_calls() override |
274 | + { |
275 | + } |
276 | + void wait_for_outstanding_calls() override |
277 | + { |
278 | + } |
279 | }; |
280 | TestConnectionConfiguration conf{ |
281 | mock_platform, std::make_shared<NiceMock<FakeRpcChannel>>(), mock_buffer_allocator }; |
282 | |
283 | === modified file 'tests/unit-tests/client/test_mir_render_surface.cpp' |
284 | --- tests/unit-tests/client/test_mir_render_surface.cpp 2016-12-13 06:00:32 +0000 |
285 | +++ tests/unit-tests/client/test_mir_render_surface.cpp 2017-01-06 05:34:08 +0000 |
286 | @@ -87,6 +87,9 @@ |
287 | complete->Run(); |
288 | } |
289 | |
290 | + MOCK_METHOD0(discard_future_calls, void()); |
291 | + MOCK_METHOD0(wait_for_outstanding_calls, void()); |
292 | + |
293 | MOCK_METHOD2(on_buffer_stream_create, void(mp::BufferStream&, google::protobuf::Closure* complete)); |
294 | |
295 | MOCK_CONST_METHOD0(watch_fd, mir::Fd()); |
296 | |
297 | === modified file 'tests/unit-tests/platforms/android/client/test_android_client_platform.cpp' |
298 | --- tests/unit-tests/platforms/android/client/test_android_client_platform.cpp 2016-12-23 07:58:44 +0000 |
299 | +++ tests/unit-tests/platforms/android/client/test_android_client_platform.cpp 2017-01-06 05:34:08 +0000 |
300 | @@ -204,6 +204,12 @@ |
301 | channel_call_count++; |
302 | c->Run(); |
303 | } |
304 | + void discard_future_calls() override |
305 | + { |
306 | + } |
307 | + void wait_for_outstanding_calls() override |
308 | + { |
309 | + } |
310 | mir::Fd watch_fd() const |
311 | { |
312 | int fd[2]; |
FAILED: Continuous integration, rev:3904 /mir-jenkins. ubuntu. com/job/ mir-ci/ 2514/ /mir-jenkins. ubuntu. com/job/ build-mir/ 3274/console /mir-jenkins. ubuntu. com/job/ build-0- fetch/3341 /mir-jenkins. ubuntu. com/job/ build-1- sourcepkg/ release= vivid+overlay/ 3333 /mir-jenkins. ubuntu. com/job/ build-1- sourcepkg/ release= xenial+ overlay/ 3333 /mir-jenkins. ubuntu. com/job/ build-1- sourcepkg/ release= yakkety/ 3333 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= clang,platform= mesa,release= yakkety/ 3303 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= clang,platform= mesa,release= yakkety/ 3303/artifact/ output/ *zip*/output. zip /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= gcc,platform= mesa,release= xenial+ overlay/ 3303 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= gcc,platform= mesa,release= xenial+ overlay/ 3303/artifact/ output/ *zip*/output. zip /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= gcc,platform= mesa,release= yakkety/ 3303 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= amd64,compiler= gcc,platform= mesa,release= yakkety/ 3303/artifact/ output/ *zip*/output. zip /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= cross-armhf, compiler= gcc,platform= android, release= vivid+overlay/ 3303/console /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= cross-armhf, compiler= gcc,platform= android, release= vivid+overlay/ 3303/artifact/ output/ *zip*/output. zip /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= i386,compiler= gcc,platform= android, release= vivid+overlay/ 3303 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= i386,compiler= gcc,platform= android, release= vivid+overlay/ 3303/artifact/ output/ *zip*/output. zip /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= i386,compiler= gcc,platform= mesa,release= xenial+ overlay/ 3303 /mir-jenkins. ubuntu. com/job/ build-2- binpkg- mir/arch= i386,compiler= gcc,platform= mesa,release= xenial+ overlay/ 3303/artifact/ output/ *zip*/output. zip
https:/
Executed test runs:
FAILURE: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
FAILURE: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
SUCCESS: https:/
deb: https:/
Click here to trigger a rebuild: /mir-jenkins. ubuntu. com/job/ mir-ci/ 2514/rebuild
https:/