Mir

Merge lp:~alan-griffiths/mir/fix-client-hang-on-server-death into lp:mir

Proposed by Alan Griffiths
Status: Merged
Approved by: kevin gunn
Approved revision: no longer in the source branch.
Merged at revision: 1141
Proposed branch: lp:~alan-griffiths/mir/fix-client-hang-on-server-death
Merge into: lp:mir
Diff against target: 251 lines (+92/-66)
3 files modified
src/client/logging/rpc_report.cpp (+1/-1)
src/client/rpc/mir_socket_rpc_channel.cpp (+90/-65)
src/client/rpc/mir_socket_rpc_channel.h (+1/-0)
To merge this branch: bzr merge lp:~alan-griffiths/mir/fix-client-hang-on-server-death
Reviewer Review Type Date Requested Status
Alan Griffiths Abstain
PS Jenkins bot (community) continuous-integration Approve
Daniel van Vugt Needs Information
Kevin DuBois (community) Approve
Review via email: mp+190935@code.launchpad.net

Commit message

client: fix hang(s) in client API when server dies

Description of the change

client: fix hang(s) in client API when server dies

Depending upon what calls are in progress when the server dies the client API could fail to return/callback

To post a comment you must log in.
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

There's a strange interaction between TestClientInput.* and the tests that follow (ServerShutdown/OnSignal.*).

With the original timeout and running under valgrind I sometimes see the latter fail after valgrind reports errors in the former.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote :

lgtm, or at least can't see anything wrong. :)

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

1. Slightly confused but it's probably nothing:
Bug 1227743 says it's the server that hangs, while this MP seems to say it fixes client hangs. Is it both?

2. Your description seems to suggest this might fix bug 1237710 too. Will have to test...

Revision history for this message
Daniel van Vugt (vanvugt) :
review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Correction:

2. Your comment seems to suggest bug 1227683 and/or bug 1237710 might benefit from this. Will have to test...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Sorry, I got excited. Neither bug 1237710 or 1227683 are solved here.

Still needs information;
Bug 1227743 says it's the server that hangs, while this MP seems to say it fixes client hangs. Is it both?

review: Needs Information
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I didn't find time to get to the bottom of the test failures

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Alan,

No need to block on that test failure. You didn't cause it. See bug 1227683.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Alan,
>
> No need to block on that test failure. You didn't cause it. See bug 1227683.

I think it is bug 1237710

review: Abstain
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Still needs information;
> Bug 1227743 says it's the server that hangs, while this MP seems to say it
> fixes client hangs. Is it both?

The server hangs because the clients are holding some (unidentified) resource. The clients are stalled by the Mir API - this fix allows the clients to release resources (and optionally exit).

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/client/logging/rpc_report.cpp'
2--- src/client/logging/rpc_report.cpp 2013-09-26 13:50:11 +0000
3+++ src/client/logging/rpc_report.cpp 2013-10-15 10:44:56 +0000
4@@ -104,7 +104,7 @@
5 /* TODO: Log more information about event */
6 ss << "Event parsed";
7
8- logger->log<ml::Logger::error>(ss.str(), component);
9+ logger->log<ml::Logger::debug>(ss.str(), component);
10 }
11
12 void mcll::RpcReport::event_parsing_failed(
13
14=== modified file 'src/client/rpc/mir_socket_rpc_channel.cpp'
15--- src/client/rpc/mir_socket_rpc_channel.cpp 2013-10-14 06:33:42 +0000
16+++ src/client/rpc/mir_socket_rpc_channel.cpp 2013-10-15 10:44:56 +0000
17@@ -39,6 +39,12 @@
18 namespace mcl = mir::client;
19 namespace mclr = mir::client::rpc;
20
21+namespace
22+{
23+std::chrono::milliseconds const timeout(200);
24+boost::posix_time::milliseconds const boost_timeout(2000);
25+}
26+
27 mclr::MirSocketRpcChannel::MirSocketRpcChannel(
28 std::string const& endpoint,
29 std::shared_ptr<mcl::SurfaceMap> const& surface_map,
30@@ -49,6 +55,7 @@
31 pending_calls(rpc_report),
32 work(io_service),
33 socket(io_service),
34+ timer(io_service),
35 surface_map(surface_map),
36 display_configuration(disp_config),
37 lifecycle_control(lifecycle_control),
38@@ -68,6 +75,7 @@
39 pending_calls(rpc_report),
40 work(io_service),
41 socket(io_service),
42+ timer(io_service),
43 surface_map(surface_map),
44 display_configuration(disp_config),
45 lifecycle_control(lifecycle_control),
46@@ -82,6 +90,7 @@
47 if (!disconnected.exchange(true))
48 {
49 io_service.stop();
50+ socket.close();
51 lifecycle_control->call_lifecycle_event_handler(mir_lifecycle_connection_lost);
52 pending_calls.force_completion();
53 }
54@@ -89,7 +98,7 @@
55
56 void mclr::MirSocketRpcChannel::init()
57 {
58- io_service_thread = std::thread([&]
59+ io_service_thread = std::thread([this]
60 {
61 // Our IO threads must not receive any signals
62 sigset_t all_signals;
63@@ -122,6 +131,7 @@
64
65 mclr::MirSocketRpcChannel::~MirSocketRpcChannel()
66 {
67+ timer.cancel();
68 io_service.stop();
69
70 if (io_service_thread.joinable())
71@@ -134,67 +144,69 @@
72 void mclr::MirSocketRpcChannel::receive_file_descriptors(google::protobuf::Message* response,
73 google::protobuf::Closure* complete)
74 {
75- auto surface = dynamic_cast<mir::protobuf::Surface*>(response);
76- if (surface)
77- {
78- surface->clear_fd();
79-
80- if (surface->fds_on_side_channel() > 0)
81- {
82- std::vector<int32_t> fds(surface->fds_on_side_channel());
83- receive_file_descriptors(fds);
84- for (auto &fd: fds)
85- surface->add_fd(fd);
86-
87- rpc_report->file_descriptors_received(*response, fds);
88- }
89- }
90-
91- auto buffer = dynamic_cast<mir::protobuf::Buffer*>(response);
92- if (!buffer)
93- {
94- if (surface && surface->has_buffer())
95- buffer = surface->mutable_buffer();
96- }
97-
98- if (buffer)
99- {
100- buffer->clear_fd();
101-
102- if (buffer->fds_on_side_channel() > 0)
103- {
104- std::vector<int32_t> fds(buffer->fds_on_side_channel());
105- receive_file_descriptors(fds);
106- for (auto &fd: fds)
107- buffer->add_fd(fd);
108-
109- rpc_report->file_descriptors_received(*response, fds);
110- }
111- }
112-
113- auto platform = dynamic_cast<mir::protobuf::Platform*>(response);
114- if (!platform)
115- {
116- auto connection = dynamic_cast<mir::protobuf::Connection*>(response);
117- if (connection && connection->has_platform())
118- platform = connection->mutable_platform();
119- }
120-
121- if (platform)
122- {
123- platform->clear_fd();
124-
125- if (platform->fds_on_side_channel() > 0)
126- {
127- std::vector<int32_t> fds(platform->fds_on_side_channel());
128- receive_file_descriptors(fds);
129- for (auto &fd: fds)
130- platform->add_fd(fd);
131-
132- rpc_report->file_descriptors_received(*response, fds);
133- }
134- }
135-
136+ if (!disconnected.load())
137+ {
138+ auto surface = dynamic_cast<mir::protobuf::Surface*>(response);
139+ if (surface)
140+ {
141+ surface->clear_fd();
142+
143+ if (surface->fds_on_side_channel() > 0)
144+ {
145+ std::vector<int32_t> fds(surface->fds_on_side_channel());
146+ receive_file_descriptors(fds);
147+ for (auto &fd: fds)
148+ surface->add_fd(fd);
149+
150+ rpc_report->file_descriptors_received(*response, fds);
151+ }
152+ }
153+
154+ auto buffer = dynamic_cast<mir::protobuf::Buffer*>(response);
155+ if (!buffer)
156+ {
157+ if (surface && surface->has_buffer())
158+ buffer = surface->mutable_buffer();
159+ }
160+
161+ if (buffer)
162+ {
163+ buffer->clear_fd();
164+
165+ if (buffer->fds_on_side_channel() > 0)
166+ {
167+ std::vector<int32_t> fds(buffer->fds_on_side_channel());
168+ receive_file_descriptors(fds);
169+ for (auto &fd: fds)
170+ buffer->add_fd(fd);
171+
172+ rpc_report->file_descriptors_received(*response, fds);
173+ }
174+ }
175+
176+ auto platform = dynamic_cast<mir::protobuf::Platform*>(response);
177+ if (!platform)
178+ {
179+ auto connection = dynamic_cast<mir::protobuf::Connection*>(response);
180+ if (connection && connection->has_platform())
181+ platform = connection->mutable_platform();
182+ }
183+
184+ if (platform)
185+ {
186+ platform->clear_fd();
187+
188+ if (platform->fds_on_side_channel() > 0)
189+ {
190+ std::vector<int32_t> fds(platform->fds_on_side_channel());
191+ receive_file_descriptors(fds);
192+ for (auto &fd: fds)
193+ platform->add_fd(fd);
194+
195+ rpc_report->file_descriptors_received(*response, fds);
196+ }
197+ }
198+ }
199
200 complete->Run();
201 }
202@@ -227,8 +239,14 @@
203 message->cmsg_level = SOL_SOCKET;
204 message->cmsg_type = SCM_RIGHTS;
205
206+ using std::chrono::steady_clock;
207+ auto time_limit = steady_clock::now() + timeout;
208+
209 while (recvmsg(socket.native_handle(), &header, 0) < 0)
210- ; // FIXME: Avoid spinning forever
211+ {
212+ if (steady_clock::now() > time_limit)
213+ return;
214+ }
215
216 // Copy file descriptors back to caller
217 n_fds = (message->cmsg_len - sizeof(struct cmsghdr)) / sizeof(int);
218@@ -288,12 +306,19 @@
219 notify_disconnected();
220 BOOST_THROW_EXCEPTION(std::runtime_error("Failed to send message to server: " + error.message()));
221 }
222- else
223- rpc_report->invocation_succeeded(invocation);
224+
225+ rpc_report->invocation_succeeded(invocation);
226+ timer.expires_from_now(boost_timeout);
227+ timer.async_wait([this](const boost::system::error_code& error)
228+ {
229+ // Timed out waiting for server response: kill the connection
230+ if (!error) socket.cancel();
231+ });
232 }
233
234 void mclr::MirSocketRpcChannel::on_header_read(const boost::system::error_code& error)
235 {
236+ timer.cancel();
237 if (error)
238 {
239 // If we've not got a response to a call pending
240
241=== modified file 'src/client/rpc/mir_socket_rpc_channel.h'
242--- src/client/rpc/mir_socket_rpc_channel.h 2013-10-07 09:27:32 +0000
243+++ src/client/rpc/mir_socket_rpc_channel.h 2013-10-15 10:44:56 +0000
244@@ -77,6 +77,7 @@
245 boost::asio::io_service io_service;
246 boost::asio::io_service::work work;
247 boost::asio::local::stream_protocol::socket socket;
248+ boost::asio::deadline_timer timer;
249
250 static size_t const size_of_header = 2;
251 unsigned char header_bytes[size_of_header];

Subscribers

People subscribed via source and target branches