Merge lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down into lp:mir
- remove-endpoint-first-when-shutting-down
- Merge into development-branch
Status: | Merged | ||||
---|---|---|---|---|---|
Approved by: | Alexandros Frantzis | ||||
Approved revision: | no longer in the source branch. | ||||
Merged at revision: | 1119 | ||||
Proposed branch: | lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down | ||||
Merge into: | lp:mir | ||||
Diff against target: |
286 lines (+146/-3) 8 files modified
examples/render_surfaces.cpp (+1/-0) include/server/mir/frontend/connector.h (+1/-0) src/server/frontend/published_socket_connector.cpp (+10/-1) src/server/frontend/published_socket_connector.h (+4/-0) src/server/run_mir.cpp (+36/-0) tests/acceptance-tests/test_server_shutdown.cpp (+92/-2) tests/integration-tests/shell/test_session.cpp (+1/-0) tests/integration-tests/test_display_server_main_loop_events.cpp (+1/-0) |
||||
To merge this branch: | bzr merge lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Alexandros Frantzis (community) | Approve | ||
Daniel van Vugt | Approve | ||
PS Jenkins bot (community) | continuous-integration | Approve | |
Kevin DuBois (community) | Approve | ||
Eleni Maria Stea (community) | Approve | ||
Review via email: mp+189376@code.launchpad.net |
Commit message
frontend: Remove the endpoint first when shutting down
Description of the change
frontend: Remove the endpoint first when shutting down
Ensures that the endpoint is removed from the filesystem at the start of the shutdown process. (So that, if there are problems later we are not left with a stale endpoint.)
PS Jenkins bot (ps-jenkins) wrote : | # |
Daniel van Vugt (vanvugt) wrote : | # |
For a moment I thought this was a fix for bug 1235159. Now I realize it's not really. The main case (sadly) we need to deal with on that one is a stale socket file left by a crashing server.
Daniel van Vugt (vanvugt) wrote : | # |
Hmm, actually, is this a good idea at all?
The proposal here only solves a minority of cases (if any) leading to bug 1235159. Is there any real value in doing this or should we aim for better handling of crashes (which is particularly important for other open bugs too).
Alan Griffiths (alan-griffiths) wrote : | # |
> Hmm, actually, is this a good idea at all?
>
> The proposal here only solves a minority of cases (if any) leading to bug
> 1235159. Is there any real value in doing this or should we aim for better
> handling of crashes (which is particularly important for other open bugs too).
I agree it isn't a complete fix but I don't think Mir alone can handle the "kill -9" cases.
OTOH It does address the cases where Mir fails to complete shutdown.
Alan Griffiths (alan-griffiths) wrote : | # |
Added handling of remaining cases where we know we are exiting.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1111
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
Click here to trigger a rebuild:
http://
Daniel van Vugt (vanvugt) wrote : | # |
"kill -9" is irrelevant. All Mir server "crashes" result in catchable signals, upon which we could still do a little recovery.
Daniel van Vugt (vanvugt) wrote : | # |
This is dangerous:
+ if (sig == SIGSEGV) exit(EXIT_FAILURE);
You shouldn't turn a crash into a clean exit. Or else your segfaults will never generate any crash reports and core files. You should instead re-raise as a fatal signal (like SIGSEGV again, or SIGABRT).
Also, other fatal signals we care about: SIGBUS, SIGPIPE.
Eleni Maria Stea (hikiko) wrote : | # |
It looks good. But yes, you have to avoid the clean exit (I don't know if raising another signal is the right approach though so: abstain :))
Daniel van Vugt (vanvugt) wrote : | # |
Sorry, I'm out of time today to give this an adequate review.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1114
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
Click here to trigger a rebuild:
http://
Eleni Maria Stea (hikiko) wrote : | # |
I sent the signals one by one and it works fine, I see no /tmp/mir_socket anymore (only in case of kill -9 where we cant really do anything). I'll try to cause a segmentation fault now, it will be interesting to see what happens. Approved :)
Alexandros Frantzis (afrantzis) wrote : | # |
I don't think we shouldn't be handling non-fatal signals like SIGPIPE, SIGALRM. If other code in the same process as us depends on these (which is not unreasonable), we will remove the endpoint prematurely. I would also argue against including SIGILL, since I know it is used by libcrypto to check for available special instruction on ARM, and libcrypto is used by the unity8 process on the phone.
Alan Griffiths (alan-griffiths) wrote : | # |
After discussion with Alexandros on #ubuntu-mir we agreed on handling: SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS
The mechanism introduced is easily extensible. If someone has an argument for handling additional signals they can propose that in a separate MP.
Alexandros Frantzis (afrantzis) wrote : | # |
Looks good.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1116
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
Click here to trigger a rebuild:
http://
Daniel van Vugt (vanvugt) wrote : | # |
Turning signals without handlers into an exit is a serious mistake:
117 + if (!old_handler[sig])
118 + {
119 + _Exit(EXIT_
120 + }
Your debugger (or whatever) won't ever know a crash occurred. No crash report generated. The process just vanishes without a (stack)trace. I just tested this by inserting a SIGSEGV and no core was generated, just:
[1]+ Exit 1 sudo bin/mir_
Instead you should re-raise, and let the system default handler take over:
raise(sig);
So the system can actually detect the crash, get a core dump and send us crash reports.
Secondly, this "handler" needs a more descriptive name like "handle_
113 +extern "C" void handler(int sig)
Daniel van Vugt (vanvugt) wrote : | # |
P.S. SIGSPIPE is a fatal signal and should be handled (see "man 7 signal").
Also, when testing core dumping remember to "ulimit -c unlimited" first.
Alan Griffiths (alan-griffiths) wrote : | # |
> P.S. SIGSPIPE is a fatal signal and should be handled (see "man 7 signal").
SIGPIPE is "Term", not "Core" and, I think, correctly handled elsewhere.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1117
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
Click here to trigger a rebuild:
http://
Daniel van Vugt (vanvugt) wrote : | # |
OK, manual testing by inserting real crashes into the server works. And I get usable core files with usable stack traces. Interestingly, the stack ends at the error location and fatal_signal_
Testing with fatal exceptions also works, but the resulting core dump has no usable stack trace. I suspect that's a pre-existing issue with C++ exceptions, or at least how we use them?...
Alan Griffiths (alan-griffiths) wrote : | # |
> Testing with fatal exceptions also works, but the resulting core dump has no
> usable stack trace. I suspect that's a pre-existing issue with C++ exceptions,
> or at least how we use them?...
The standard doesn't specify whether, with an unhandled exception, the stack is unwound before calling terminate(). g++ does the unwinding (which destroys the stack trace).
The real problem is folks starting threads without exception handlers where needed.
Which thread(s) are you creating these fatal exceptions on?
Daniel van Vugt (vanvugt) wrote : | # |
Nevermind. Confirmed the lack of stack trace on fatal exceptions is indeed a pre-existing problem.
Daniel van Vugt (vanvugt) wrote : | # |
Approved. Although a parting thought...
delete_endpoint should simply be called "cleanup" or something generic, so we can add more logic like DRM cleanup (to resolve an open bug or two). It should also have a comment mentioning it might be called as the result of a crash, and so _must_not_ use any heap operations.
Daniel van Vugt (vanvugt) wrote : | # |
My concerns about exceptions are logged here --> bug 1237332
Alexandros Frantzis (afrantzis) wrote : | # |
From POSIX.1-2008:
"The ISO C standard places a restriction on applications relating to the use of raise() from signal handlers. This restriction does not apply to POSIX applications, as POSIX.1-2008 requires raise() to be async-signal-safe"
(referring to nested raise() calls)
glibc follows POSIX.1-2008, so everything should be fine.
Preview Diff
1 | === modified file 'examples/render_surfaces.cpp' |
2 | --- examples/render_surfaces.cpp 2013-09-24 16:22:46 +0000 |
3 | +++ examples/render_surfaces.cpp 2013-10-09 08:48:59 +0000 |
4 | @@ -273,6 +273,7 @@ |
5 | void start() {} |
6 | void stop() {} |
7 | int client_socket_fd() const override { return 0; } |
8 | + void remove_endpoint() const override { } |
9 | }; |
10 | |
11 | return std::make_shared<NullConnector>(); |
12 | |
13 | === modified file 'include/server/mir/frontend/connector.h' |
14 | --- include/server/mir/frontend/connector.h 2013-09-24 14:02:58 +0000 |
15 | +++ include/server/mir/frontend/connector.h 2013-10-09 08:48:59 +0000 |
16 | @@ -31,6 +31,7 @@ |
17 | virtual void stop() = 0; |
18 | |
19 | virtual int client_socket_fd() const = 0; |
20 | + virtual void remove_endpoint() const = 0; |
21 | |
22 | protected: |
23 | Connector() = default; |
24 | |
25 | === modified file 'src/server/frontend/published_socket_connector.cpp' |
26 | --- src/server/frontend/published_socket_connector.cpp 2013-09-27 16:04:11 +0000 |
27 | +++ src/server/frontend/published_socket_connector.cpp 2013-10-09 08:48:59 +0000 |
28 | @@ -46,7 +46,7 @@ |
29 | |
30 | mf::PublishedSocketConnector::~PublishedSocketConnector() noexcept |
31 | { |
32 | - std::remove(socket_file.c_str()); |
33 | + remove_endpoint(); |
34 | } |
35 | |
36 | void mf::PublishedSocketConnector::start_accept() |
37 | @@ -64,6 +64,11 @@ |
38 | ba::placeholders::error)); |
39 | } |
40 | |
41 | +void mf::PublishedSocketConnector::remove_endpoint() const |
42 | +{ |
43 | + std::remove(socket_file.c_str()); |
44 | +} |
45 | + |
46 | void mf::PublishedSocketConnector::on_new_connection( |
47 | std::shared_ptr<boost::asio::local::stream_protocol::socket> const& socket, |
48 | boost::system::error_code const& ec) |
49 | @@ -166,6 +171,10 @@ |
50 | return socket_fd[client]; |
51 | } |
52 | |
53 | +void mf::BasicConnector::remove_endpoint() const |
54 | +{ |
55 | +} |
56 | + |
57 | mf::BasicConnector::~BasicConnector() noexcept |
58 | { |
59 | stop(); |
60 | |
61 | === modified file 'src/server/frontend/published_socket_connector.h' |
62 | --- src/server/frontend/published_socket_connector.h 2013-09-27 16:04:11 +0000 |
63 | +++ src/server/frontend/published_socket_connector.h 2013-10-09 08:48:59 +0000 |
64 | @@ -56,6 +56,8 @@ |
65 | void start() override; |
66 | void stop() override; |
67 | int client_socket_fd() const override; |
68 | + void remove_endpoint() const override; |
69 | + |
70 | |
71 | protected: |
72 | void create_session_for(std::shared_ptr<boost::asio::local::stream_protocol::socket> const& server_socket) const; |
73 | @@ -80,6 +82,8 @@ |
74 | std::shared_ptr<ConnectorReport> const& report); |
75 | ~PublishedSocketConnector() noexcept; |
76 | |
77 | + void remove_endpoint() const override; |
78 | + |
79 | private: |
80 | void start_accept(); |
81 | void on_new_connection(std::shared_ptr<boost::asio::local::stream_protocol::socket> const& socket, |
82 | |
83 | === modified file 'src/server/run_mir.cpp' |
84 | --- src/server/run_mir.cpp 2013-04-24 05:22:20 +0000 |
85 | +++ src/server/run_mir.cpp 2013-10-09 08:48:59 +0000 |
86 | @@ -20,10 +20,38 @@ |
87 | #include "mir/display_server.h" |
88 | #include "mir/main_loop.h" |
89 | #include "mir/server_configuration.h" |
90 | +#include "mir/frontend/connector.h" |
91 | |
92 | #include <csignal> |
93 | +#include <cstdlib> |
94 | #include <cassert> |
95 | |
96 | +namespace |
97 | +{ |
98 | +std::weak_ptr<mir::frontend::Connector> weak_connector; |
99 | + |
100 | +extern "C" void delete_endpoint() |
101 | +{ |
102 | + if (auto connector = weak_connector.lock()) |
103 | + { |
104 | + weak_connector.reset(); |
105 | + connector->remove_endpoint(); |
106 | + } |
107 | +} |
108 | + |
109 | +extern "C" { typedef void (*sig_handler)(int); } |
110 | + |
111 | +volatile sig_handler old_handler[SIGUNUSED] = { nullptr }; |
112 | + |
113 | +extern "C" void fatal_signal_cleanup(int sig) |
114 | +{ |
115 | + delete_endpoint(); |
116 | + |
117 | + signal(sig, old_handler[sig]); |
118 | + raise(sig); |
119 | +} |
120 | +} |
121 | + |
122 | void mir::run_mir(ServerConfiguration& config, std::function<void(DisplayServer&)> init) |
123 | { |
124 | DisplayServer* server_ptr{nullptr}; |
125 | @@ -33,6 +61,7 @@ |
126 | {SIGINT, SIGTERM}, |
127 | [&server_ptr](int) |
128 | { |
129 | + delete_endpoint(); |
130 | assert(server_ptr); |
131 | server_ptr->stop(); |
132 | }); |
133 | @@ -40,6 +69,13 @@ |
134 | DisplayServer server(config); |
135 | server_ptr = &server; |
136 | |
137 | + weak_connector = config.the_connector(); |
138 | + |
139 | + for (auto sig : { SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS }) |
140 | + old_handler[sig] = signal(sig, fatal_signal_cleanup); |
141 | + |
142 | + std::atexit(&delete_endpoint); |
143 | + |
144 | init(server); |
145 | server.run(); |
146 | } |
147 | |
148 | === modified file 'tests/acceptance-tests/test_server_shutdown.cpp' |
149 | --- tests/acceptance-tests/test_server_shutdown.cpp 2013-09-27 16:59:05 +0000 |
150 | +++ tests/acceptance-tests/test_server_shutdown.cpp 2013-10-09 08:48:59 +0000 |
151 | @@ -112,7 +112,9 @@ |
152 | |
153 | } |
154 | |
155 | -TEST_F(BespokeDisplayServerTestFixture, server_can_shut_down_when_clients_are_blocked) |
156 | +using ServerShutdown = BespokeDisplayServerTestFixture; |
157 | + |
158 | +TEST_F(ServerShutdown, server_can_shut_down_when_clients_are_blocked) |
159 | { |
160 | Flag next_buffer_done1{"next_buffer_done1_c5d49978.tmp"}; |
161 | Flag next_buffer_done2{"next_buffer_done2_c5d49978.tmp"}; |
162 | @@ -199,7 +201,7 @@ |
163 | }); |
164 | } |
165 | |
166 | -TEST_F(BespokeDisplayServerTestFixture, server_releases_resources_on_shutdown_with_connected_clients) |
167 | +TEST_F(ServerShutdown, server_releases_resources_on_shutdown_with_connected_clients) |
168 | { |
169 | Flag surface_created1{"surface_created1_7e9c69fc.tmp"}; |
170 | Flag surface_created2{"surface_created2_7e9c69fc.tmp"}; |
171 | @@ -341,3 +343,91 @@ |
172 | resources_freed_success.set(); |
173 | } |
174 | } |
175 | + |
176 | +namespace |
177 | +{ |
178 | +bool file_exists(std::string const& filename) |
179 | +{ |
180 | + struct stat statbuf; |
181 | + return 0 == stat(filename.c_str(), &statbuf); |
182 | +} |
183 | +} |
184 | + |
185 | +TEST_F(ServerShutdown, server_removes_endpoint_on_normal_exit) |
186 | +{ |
187 | + using ServerConfig = TestingServerConfiguration; |
188 | + |
189 | + ServerConfig server_config; |
190 | + launch_server_process(server_config); |
191 | + |
192 | + run_in_test_process([&] |
193 | + { |
194 | + ASSERT_TRUE(file_exists(server_config.the_socket_file())); |
195 | + |
196 | + shutdown_server_process(); |
197 | + EXPECT_FALSE(file_exists(server_config.the_socket_file())); |
198 | + }); |
199 | +} |
200 | + |
201 | +TEST_F(ServerShutdown, server_removes_endpoint_on_abort) |
202 | +{ |
203 | + struct ServerConfig : TestingServerConfiguration |
204 | + { |
205 | + void exec() override |
206 | + { |
207 | + abort(); |
208 | + } |
209 | + }; |
210 | + |
211 | + ServerConfig server_config; |
212 | + launch_server_process(server_config); |
213 | + |
214 | + run_in_test_process([&] |
215 | + { |
216 | + using std::chrono::steady_clock; |
217 | + std::chrono::seconds const limit(2); |
218 | + |
219 | + ASSERT_TRUE(file_exists(server_config.the_socket_file())); |
220 | + |
221 | + // shutdown should fail as the server aborted |
222 | + ASSERT_FALSE(shutdown_server_process()); |
223 | + |
224 | + EXPECT_FALSE(file_exists(server_config.the_socket_file())); |
225 | + }); |
226 | +} |
227 | + |
228 | +struct OnSignal : ServerShutdown, ::testing::WithParamInterface<int> {}; |
229 | + |
230 | +TEST_P(OnSignal, removes_endpoint_on_signal) |
231 | +{ |
232 | + struct ServerConfig : TestingServerConfiguration |
233 | + { |
234 | + void exec() override |
235 | + { |
236 | + raise(sig); |
237 | + } |
238 | + |
239 | + ServerConfig(int sig) : sig(sig) {} |
240 | + int const sig; |
241 | + }; |
242 | + |
243 | + ServerConfig server_config(GetParam()); |
244 | + launch_server_process(server_config); |
245 | + |
246 | + run_in_test_process([&] |
247 | + { |
248 | + using std::chrono::steady_clock; |
249 | + std::chrono::seconds const limit(2); |
250 | + |
251 | + ASSERT_TRUE(file_exists(server_config.the_socket_file())); |
252 | + |
253 | + // shutdown should fail as the server faulted |
254 | + ASSERT_FALSE(shutdown_server_process()); |
255 | + |
256 | + EXPECT_FALSE(file_exists(server_config.the_socket_file())); |
257 | + }); |
258 | +} |
259 | + |
260 | +INSTANTIATE_TEST_CASE_P(ServerShutdown, |
261 | + OnSignal, |
262 | + ::testing::Values(SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS)); |
263 | |
264 | === modified file 'tests/integration-tests/shell/test_session.cpp' |
265 | --- tests/integration-tests/shell/test_session.cpp 2013-09-24 16:22:46 +0000 |
266 | +++ tests/integration-tests/shell/test_session.cpp 2013-10-09 08:48:59 +0000 |
267 | @@ -68,6 +68,7 @@ |
268 | void start() {} |
269 | void stop() {} |
270 | int client_socket_fd() const override { return 0; } |
271 | + void remove_endpoint() const override { } |
272 | }; |
273 | |
274 | return std::make_shared<NullConnector>(); |
275 | |
276 | === modified file 'tests/integration-tests/test_display_server_main_loop_events.cpp' |
277 | --- tests/integration-tests/test_display_server_main_loop_events.cpp 2013-09-24 14:02:58 +0000 |
278 | +++ tests/integration-tests/test_display_server_main_loop_events.cpp 2013-10-09 08:48:59 +0000 |
279 | @@ -59,6 +59,7 @@ |
280 | MOCK_METHOD0(start, void()); |
281 | MOCK_METHOD0(stop, void()); |
282 | MOCK_CONST_METHOD0(client_socket_fd, int()); |
283 | + MOCK_CONST_METHOD0(remove_endpoint, void()); |
284 | }; |
285 | |
286 | class MockDisplayChanger : public mir::DisplayChanger |
PASSED: Continuous integration, rev:1109 jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- ci/91/ jenkins. qa.ubuntu. com/job/ mir-android- saucy-i386- build/2237 jenkins. qa.ubuntu. com/job/ mir-clang- saucy-amd64- build/2122 jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- saucy-amd64- ci/90 jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- saucy-amd64- ci/90/artifact/ work/output/ *zip*/output. zip
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
Click here to trigger a rebuild: s-jenkins: 8080/job/ mir-team- mir-development -branch- ci/91/rebuild
http://