Mir

Merge lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down into lp:mir

Proposed by Alan Griffiths
Status: Merged
Approved by: Alexandros Frantzis
Approved revision: no longer in the source branch.
Merged at revision: 1119
Proposed branch: lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down
Merge into: lp:mir
Diff against target: 286 lines (+146/-3)
8 files modified
examples/render_surfaces.cpp (+1/-0)
include/server/mir/frontend/connector.h (+1/-0)
src/server/frontend/published_socket_connector.cpp (+10/-1)
src/server/frontend/published_socket_connector.h (+4/-0)
src/server/run_mir.cpp (+36/-0)
tests/acceptance-tests/test_server_shutdown.cpp (+92/-2)
tests/integration-tests/shell/test_session.cpp (+1/-0)
tests/integration-tests/test_display_server_main_loop_events.cpp (+1/-0)
To merge this branch: bzr merge lp:~alan-griffiths/mir/remove-endpoint-first-when-shutting-down
Reviewer Review Type Date Requested Status
Alexandros Frantzis (community) Approve
Daniel van Vugt Approve
PS Jenkins bot (community) continuous-integration Approve
Kevin DuBois (community) Approve
Eleni Maria Stea (community) Approve
Review via email: mp+189376@code.launchpad.net

Commit message

frontend: Remove the endpoint first when shutting down

Description of the change

frontend: Remove the endpoint first when shutting down

Ensures that the endpoint is removed from the filesystem at the start of the shutdown process. (So that, if there are problems later we are not left with a stale endpoint.)

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

For a moment I thought this was a fix for bug 1235159. Now I realize it's not really. The main case (sadly) we need to deal with on that one is a stale socket file left by a crashing server.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Hmm, actually, is this a good idea at all?

The proposal here only solves a minority of cases (if any) leading to bug 1235159. Is there any real value in doing this or should we aim for better handling of crashes (which is particularly important for other open bugs too).

review: Needs Information
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Hmm, actually, is this a good idea at all?
>
> The proposal here only solves a minority of cases (if any) leading to bug
> 1235159. Is there any real value in doing this or should we aim for better
> handling of crashes (which is particularly important for other open bugs too).

I agree it isn't a complete fix but I don't think Mir alone can handle the "kill -9" cases.

OTOH It does address the cases where Mir fails to complete shutdown.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Added handling of remaining cases where we know we are exiting.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

"kill -9" is irrelevant. All Mir server "crashes" result in catchable signals, upon which we could still do a little recovery.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This is dangerous:
+ if (sig == SIGSEGV) exit(EXIT_FAILURE);

You shouldn't turn a crash into a clean exit. Or else your segfaults will never generate any crash reports and core files. You should instead re-raise as a fatal signal (like SIGSEGV again, or SIGABRT).

Also, other fatal signals we care about: SIGBUS, SIGPIPE.

review: Needs Fixing
Revision history for this message
Eleni Maria Stea (hikiko) wrote :

It looks good. But yes, you have to avoid the clean exit (I don't know if raising another signal is the right approach though so: abstain :))

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Sorry, I'm out of time today to give this an adequate review.

review: Abstain
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Eleni Maria Stea (hikiko) wrote :

I sent the signals one by one and it works fine, I see no /tmp/mir_socket anymore (only in case of kill -9 where we cant really do anything). I'll try to cause a segmentation fault now, it will be interesting to see what happens. Approved :)

review: Approve
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

I don't think we shouldn't be handling non-fatal signals like SIGPIPE, SIGALRM. If other code in the same process as us depends on these (which is not unreasonable), we will remove the endpoint prematurely. I would also argue against including SIGILL, since I know it is used by libcrypto to check for available special instruction on ARM, and libcrypto is used by the unity8 process on the phone.

review: Needs Fixing
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

After discussion with Alexandros on #ubuntu-mir we agreed on handling: SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS

The mechanism introduced is easily extensible. If someone has an argument for handling additional signals they can propose that in a separate MP.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Looks good.

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote :

good

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Turning signals without handlers into an exit is a serious mistake:

117 + if (!old_handler[sig])
118 + {
119 + _Exit(EXIT_FAILURE);
120 + }

Your debugger (or whatever) won't ever know a crash occurred. No crash report generated. The process just vanishes without a (stack)trace. I just tested this by inserting a SIGSEGV and no core was generated, just:
[1]+ Exit 1 sudo bin/mir_demo_server_shell

Instead you should re-raise, and let the system default handler take over:
    raise(sig);
So the system can actually detect the crash, get a core dump and send us crash reports.

Secondly, this "handler" needs a more descriptive name like "handle_fatal_signal". Because developers will see it in stack traces and need to better recognise and understand that's happening:

113 +extern "C" void handler(int sig)

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

P.S. SIGSPIPE is a fatal signal and should be handled (see "man 7 signal").

Also, when testing core dumping remember to "ulimit -c unlimited" first.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> P.S. SIGSPIPE is a fatal signal and should be handled (see "man 7 signal").

SIGPIPE is "Term", not "Core" and, I think, correctly handled elsewhere.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK, manual testing by inserting real crashes into the server works. And I get usable core files with usable stack traces. Interestingly, the stack ends at the error location and fatal_signal_cleanup() is never mentioned, despite definitely executing before dumping core. That's strange, but OK.

Testing with fatal exceptions also works, but the resulting core dump has no usable stack trace. I suspect that's a pre-existing issue with C++ exceptions, or at least how we use them?...

review: Needs Information
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Testing with fatal exceptions also works, but the resulting core dump has no
> usable stack trace. I suspect that's a pre-existing issue with C++ exceptions,
> or at least how we use them?...

The standard doesn't specify whether, with an unhandled exception, the stack is unwound before calling terminate(). g++ does the unwinding (which destroys the stack trace).

The real problem is folks starting threads without exception handlers where needed.

Which thread(s) are you creating these fatal exceptions on?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Nevermind. Confirmed the lack of stack trace on fatal exceptions is indeed a pre-existing problem.

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Approved. Although a parting thought...

delete_endpoint should simply be called "cleanup" or something generic, so we can add more logic like DRM cleanup (to resolve an open bug or two). It should also have a comment mentioning it might be called as the result of a crash, and so _must_not_ use any heap operations.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

My concerns about exceptions are logged here --> bug 1237332

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

From POSIX.1-2008:

"The ISO C standard places a restriction on applications relating to the use of raise() from signal handlers. This restriction does not apply to POSIX applications, as POSIX.1-2008 requires raise() to be async-signal-safe"

(referring to nested raise() calls)

glibc follows POSIX.1-2008, so everything should be fine.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'examples/render_surfaces.cpp'
2--- examples/render_surfaces.cpp 2013-09-24 16:22:46 +0000
3+++ examples/render_surfaces.cpp 2013-10-09 08:48:59 +0000
4@@ -273,6 +273,7 @@
5 void start() {}
6 void stop() {}
7 int client_socket_fd() const override { return 0; }
8+ void remove_endpoint() const override { }
9 };
10
11 return std::make_shared<NullConnector>();
12
13=== modified file 'include/server/mir/frontend/connector.h'
14--- include/server/mir/frontend/connector.h 2013-09-24 14:02:58 +0000
15+++ include/server/mir/frontend/connector.h 2013-10-09 08:48:59 +0000
16@@ -31,6 +31,7 @@
17 virtual void stop() = 0;
18
19 virtual int client_socket_fd() const = 0;
20+ virtual void remove_endpoint() const = 0;
21
22 protected:
23 Connector() = default;
24
25=== modified file 'src/server/frontend/published_socket_connector.cpp'
26--- src/server/frontend/published_socket_connector.cpp 2013-09-27 16:04:11 +0000
27+++ src/server/frontend/published_socket_connector.cpp 2013-10-09 08:48:59 +0000
28@@ -46,7 +46,7 @@
29
30 mf::PublishedSocketConnector::~PublishedSocketConnector() noexcept
31 {
32- std::remove(socket_file.c_str());
33+ remove_endpoint();
34 }
35
36 void mf::PublishedSocketConnector::start_accept()
37@@ -64,6 +64,11 @@
38 ba::placeholders::error));
39 }
40
41+void mf::PublishedSocketConnector::remove_endpoint() const
42+{
43+ std::remove(socket_file.c_str());
44+}
45+
46 void mf::PublishedSocketConnector::on_new_connection(
47 std::shared_ptr<boost::asio::local::stream_protocol::socket> const& socket,
48 boost::system::error_code const& ec)
49@@ -166,6 +171,10 @@
50 return socket_fd[client];
51 }
52
53+void mf::BasicConnector::remove_endpoint() const
54+{
55+}
56+
57 mf::BasicConnector::~BasicConnector() noexcept
58 {
59 stop();
60
61=== modified file 'src/server/frontend/published_socket_connector.h'
62--- src/server/frontend/published_socket_connector.h 2013-09-27 16:04:11 +0000
63+++ src/server/frontend/published_socket_connector.h 2013-10-09 08:48:59 +0000
64@@ -56,6 +56,8 @@
65 void start() override;
66 void stop() override;
67 int client_socket_fd() const override;
68+ void remove_endpoint() const override;
69+
70
71 protected:
72 void create_session_for(std::shared_ptr<boost::asio::local::stream_protocol::socket> const& server_socket) const;
73@@ -80,6 +82,8 @@
74 std::shared_ptr<ConnectorReport> const& report);
75 ~PublishedSocketConnector() noexcept;
76
77+ void remove_endpoint() const override;
78+
79 private:
80 void start_accept();
81 void on_new_connection(std::shared_ptr<boost::asio::local::stream_protocol::socket> const& socket,
82
83=== modified file 'src/server/run_mir.cpp'
84--- src/server/run_mir.cpp 2013-04-24 05:22:20 +0000
85+++ src/server/run_mir.cpp 2013-10-09 08:48:59 +0000
86@@ -20,10 +20,38 @@
87 #include "mir/display_server.h"
88 #include "mir/main_loop.h"
89 #include "mir/server_configuration.h"
90+#include "mir/frontend/connector.h"
91
92 #include <csignal>
93+#include <cstdlib>
94 #include <cassert>
95
96+namespace
97+{
98+std::weak_ptr<mir::frontend::Connector> weak_connector;
99+
100+extern "C" void delete_endpoint()
101+{
102+ if (auto connector = weak_connector.lock())
103+ {
104+ weak_connector.reset();
105+ connector->remove_endpoint();
106+ }
107+}
108+
109+extern "C" { typedef void (*sig_handler)(int); }
110+
111+volatile sig_handler old_handler[SIGUNUSED] = { nullptr };
112+
113+extern "C" void fatal_signal_cleanup(int sig)
114+{
115+ delete_endpoint();
116+
117+ signal(sig, old_handler[sig]);
118+ raise(sig);
119+}
120+}
121+
122 void mir::run_mir(ServerConfiguration& config, std::function<void(DisplayServer&)> init)
123 {
124 DisplayServer* server_ptr{nullptr};
125@@ -33,6 +61,7 @@
126 {SIGINT, SIGTERM},
127 [&server_ptr](int)
128 {
129+ delete_endpoint();
130 assert(server_ptr);
131 server_ptr->stop();
132 });
133@@ -40,6 +69,13 @@
134 DisplayServer server(config);
135 server_ptr = &server;
136
137+ weak_connector = config.the_connector();
138+
139+ for (auto sig : { SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS })
140+ old_handler[sig] = signal(sig, fatal_signal_cleanup);
141+
142+ std::atexit(&delete_endpoint);
143+
144 init(server);
145 server.run();
146 }
147
148=== modified file 'tests/acceptance-tests/test_server_shutdown.cpp'
149--- tests/acceptance-tests/test_server_shutdown.cpp 2013-09-27 16:59:05 +0000
150+++ tests/acceptance-tests/test_server_shutdown.cpp 2013-10-09 08:48:59 +0000
151@@ -112,7 +112,9 @@
152
153 }
154
155-TEST_F(BespokeDisplayServerTestFixture, server_can_shut_down_when_clients_are_blocked)
156+using ServerShutdown = BespokeDisplayServerTestFixture;
157+
158+TEST_F(ServerShutdown, server_can_shut_down_when_clients_are_blocked)
159 {
160 Flag next_buffer_done1{"next_buffer_done1_c5d49978.tmp"};
161 Flag next_buffer_done2{"next_buffer_done2_c5d49978.tmp"};
162@@ -199,7 +201,7 @@
163 });
164 }
165
166-TEST_F(BespokeDisplayServerTestFixture, server_releases_resources_on_shutdown_with_connected_clients)
167+TEST_F(ServerShutdown, server_releases_resources_on_shutdown_with_connected_clients)
168 {
169 Flag surface_created1{"surface_created1_7e9c69fc.tmp"};
170 Flag surface_created2{"surface_created2_7e9c69fc.tmp"};
171@@ -341,3 +343,91 @@
172 resources_freed_success.set();
173 }
174 }
175+
176+namespace
177+{
178+bool file_exists(std::string const& filename)
179+{
180+ struct stat statbuf;
181+ return 0 == stat(filename.c_str(), &statbuf);
182+}
183+}
184+
185+TEST_F(ServerShutdown, server_removes_endpoint_on_normal_exit)
186+{
187+ using ServerConfig = TestingServerConfiguration;
188+
189+ ServerConfig server_config;
190+ launch_server_process(server_config);
191+
192+ run_in_test_process([&]
193+ {
194+ ASSERT_TRUE(file_exists(server_config.the_socket_file()));
195+
196+ shutdown_server_process();
197+ EXPECT_FALSE(file_exists(server_config.the_socket_file()));
198+ });
199+}
200+
201+TEST_F(ServerShutdown, server_removes_endpoint_on_abort)
202+{
203+ struct ServerConfig : TestingServerConfiguration
204+ {
205+ void exec() override
206+ {
207+ abort();
208+ }
209+ };
210+
211+ ServerConfig server_config;
212+ launch_server_process(server_config);
213+
214+ run_in_test_process([&]
215+ {
216+ using std::chrono::steady_clock;
217+ std::chrono::seconds const limit(2);
218+
219+ ASSERT_TRUE(file_exists(server_config.the_socket_file()));
220+
221+ // shutdown should fail as the server aborted
222+ ASSERT_FALSE(shutdown_server_process());
223+
224+ EXPECT_FALSE(file_exists(server_config.the_socket_file()));
225+ });
226+}
227+
228+struct OnSignal : ServerShutdown, ::testing::WithParamInterface<int> {};
229+
230+TEST_P(OnSignal, removes_endpoint_on_signal)
231+{
232+ struct ServerConfig : TestingServerConfiguration
233+ {
234+ void exec() override
235+ {
236+ raise(sig);
237+ }
238+
239+ ServerConfig(int sig) : sig(sig) {}
240+ int const sig;
241+ };
242+
243+ ServerConfig server_config(GetParam());
244+ launch_server_process(server_config);
245+
246+ run_in_test_process([&]
247+ {
248+ using std::chrono::steady_clock;
249+ std::chrono::seconds const limit(2);
250+
251+ ASSERT_TRUE(file_exists(server_config.the_socket_file()));
252+
253+ // shutdown should fail as the server faulted
254+ ASSERT_FALSE(shutdown_server_process());
255+
256+ EXPECT_FALSE(file_exists(server_config.the_socket_file()));
257+ });
258+}
259+
260+INSTANTIATE_TEST_CASE_P(ServerShutdown,
261+ OnSignal,
262+ ::testing::Values(SIGQUIT, SIGABRT, SIGFPE, SIGSEGV, SIGBUS));
263
264=== modified file 'tests/integration-tests/shell/test_session.cpp'
265--- tests/integration-tests/shell/test_session.cpp 2013-09-24 16:22:46 +0000
266+++ tests/integration-tests/shell/test_session.cpp 2013-10-09 08:48:59 +0000
267@@ -68,6 +68,7 @@
268 void start() {}
269 void stop() {}
270 int client_socket_fd() const override { return 0; }
271+ void remove_endpoint() const override { }
272 };
273
274 return std::make_shared<NullConnector>();
275
276=== modified file 'tests/integration-tests/test_display_server_main_loop_events.cpp'
277--- tests/integration-tests/test_display_server_main_loop_events.cpp 2013-09-24 14:02:58 +0000
278+++ tests/integration-tests/test_display_server_main_loop_events.cpp 2013-10-09 08:48:59 +0000
279@@ -59,6 +59,7 @@
280 MOCK_METHOD0(start, void());
281 MOCK_METHOD0(stop, void());
282 MOCK_CONST_METHOD0(client_socket_fd, int());
283+ MOCK_CONST_METHOD0(remove_endpoint, void());
284 };
285
286 class MockDisplayChanger : public mir::DisplayChanger

Subscribers

People subscribed via source and target branches