Mir

Merge lp:~raof/mir/optimize-server-anr-wakeups into lp:mir

Proposed by Chris Halse Rogers
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 2912
Proposed branch: lp:~raof/mir/optimize-server-anr-wakeups
Merge into: lp:mir
Prerequisite: lp:~raof/mir/fix-deadlock-in-glib-alarm
Diff against target: 468 lines (+294/-34)
6 files modified
src/server/glib_main_loop.cpp (+2/-1)
src/server/scene/timeout_application_not_responding_detector.cpp (+60/-32)
src/server/scene/timeout_application_not_responding_detector.h (+3/-0)
tests/include/mir/test/doubles/fake_alarm_factory.h (+4/-0)
tests/mir_test_doubles/fake_alarm_factory.cpp (+33/-1)
tests/unit-tests/scene/test_timeout_application_not_responding_detector.cpp (+192/-0)
To merge this branch: bzr merge lp:~raof/mir/optimize-server-anr-wakeups
Reviewer Review Type Date Requested Status
PS Jenkins bot (community) continuous-integration Approve
Alexandros Frantzis (community) Approve
Kevin DuBois (community) Approve
Review via email: mp+266820@code.launchpad.net

This proposal supersedes a proposal from 2015-07-28.

Commit message

Remove unnecessary server wakeups from TimeoutApplicationNotRespondingDetector.

We don't need to wake up when either
a) There are no active sessions, or
b) All sessions we're aware of are unresponsive

Description of the change

Remove unnecessary server wakeups from TimeoutApplicationNotRespondingDetector.

We don't need to wake up when either
a) There are no active sessions (obviously!), or
b) All sessions we're aware of are unresponsive

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote : Posted in a previous version of this proposal
review: Needs Fixing (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote : Posted in a previous version of this proposal

lgtm

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote : Posted in a previous version of this proposal
review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote :

still lgtm

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Looks good.

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Chris Halse Rogers (raof) wrote :

So, it looks like CI is passable now. Let's give it a re-run...

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

If you apply the namespace locally each time, it actually requires more characters than the old syntax, which defeats the purpose of the literals enhancing readability...

228 + using namespace std::literals::chrono_literals;
229 + auto const step_by = 1ms;

vs
             auto const step_by = std::chrono::milliseconds(1);

The latter is also backward compatible :)

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Prize for the oldest Mir proposal not yet landed. Although 3 weeks is not really old by our standards...

This is not a review but I'll top approve.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Archiving artifacts
ERROR: Failed to archive artifacts: mir-medium-test-runner/**/*
hudson.util.IOException2: java.io.IOException: request to write '8192' bytes exceeds size in header of '393216' bytes for entry 'mir-medium-test-runner/results/crash/_usr_bin_unity8-dash.32011.crash'

Looks unrelated.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Also another failure; this one might be real...

7: [ RUN ] ServerShutdown.server_can_shut_down_when_clients_are_blocked
7: [1440610613.673790] mirplatform: Found graphics driver: mesa-x11
7: [1440610613.673844] mirplatform: Found graphics driver: android
7: [1440610613.673893] mirplatform: Found graphics driver: mesa-kms
7: [1440610613.673899] mirplatform: Found graphics driver: dummy
7: [1440610613.674656] mirserver: Mir version 0.16.0
7: [1440610613.694973] <ERROR> MirSurfaceAPI: Caught exception at client library boundary (in mir_surface_release): /mir/src/client/rpc/stream_socket_transport.cpp(168): Throw in function virtual void mir::client::rpc::StreamSocketTransport::send_message(const std::vector<uint8_t> &, const std::vector<mir::Fd> &)
7: Dynamic exception type: N5boost16exception_detail10clone_implINS0_19error_info_injectorIN3mir25socket_disconnected_errorEEEEE
7: std::exception::what: Failed to send message to server: Broken pipe
7: 32, "Broken pipe"
7: [1440610613.695132] <ERROR> MirConnectionAPI: Caught exception at client library boundary (in release): /mir/src/client/rpc/stream_socket_transport.cpp(168): Throw in function virtual void mir::client::rpc::StreamSocketTransport::send_message(const std::vector<uint8_t> &, const std::vector<mir::Fd> &)
7: Dynamic exception type: N5boost16exception_detail10clone_implINS0_19error_info_injectorIN3mir25socket_disconnected_errorEEEEE
7: std::exception::what: Failed to send message to server: Broken pipe
7: 32, "Broken pipe"
7: /mir/tools/detect_fd_leaks.bash: line 84: 1501 Segmentation fault (core dumped) $@ 2>&1

Revision history for this message
Chris Halse Rogers (raof) wrote :

Hm. Of course this isn't reproducibly locally. That would make things too easy...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Jenkins is back! Care to give it another try?

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/server/glib_main_loop.cpp'
2--- src/server/glib_main_loop.cpp 2015-08-20 20:11:21 +0000
3+++ src/server/glib_main_loop.cpp 2015-09-08 07:23:10 +0000
4@@ -38,7 +38,7 @@
5 std::shared_ptr<mir::time::Clock> const& clock,
6 std::shared_ptr<mir::LockableCallback> const& callback,
7 std::function<void()> const& exception_handler)
8- : main_context{main_context},
9+ : main_context{g_main_context_ref(main_context)},
10 clock{clock},
11 state_{State::cancelled},
12 exception_handler{exception_handler},
13@@ -50,6 +50,7 @@
14 ~AlarmImpl() override
15 {
16 gsource.ensure_no_further_dispatch();
17+ g_main_context_unref(main_context);
18 }
19
20 bool cancel() override
21
22=== modified file 'src/server/scene/timeout_application_not_responding_detector.cpp'
23--- src/server/scene/timeout_application_not_responding_detector.cpp 2015-07-06 01:14:14 +0000
24+++ src/server/scene/timeout_application_not_responding_detector.cpp 2015-09-08 07:23:10 +0000
25@@ -54,37 +54,9 @@
26 ms::TimeoutApplicationNotRespondingDetector::TimeoutApplicationNotRespondingDetector(
27 mt::AlarmFactory& alarms,
28 std::chrono::milliseconds period)
29- : alarm{alarms.create_alarm([this, period]()
30- {
31- {
32- std::lock_guard<std::mutex> lock{session_mutex};
33- for (auto const& session_pair : sessions)
34- {
35- if (!session_pair.second->replied_since_last_ping)
36- {
37- session_pair.second->flagged_as_unresponsive = true;
38- unresponsive_sessions_temporary.push_back(session_pair.first);
39- }
40- else
41- {
42- session_pair.second->pinger();
43- session_pair.second->replied_since_last_ping = false;
44- }
45- }
46- }
47-
48- // Dispatch notifications outside the lock.
49- for (auto const& unresponsive_session : unresponsive_sessions_temporary)
50- {
51- observers.session_unresponsive(unresponsive_session);
52- }
53-
54- unresponsive_sessions_temporary.clear();
55-
56- this->alarm->reschedule_in(period);
57- })}
58+ : period{period},
59+ alarm{alarms.create_alarm(std::bind(&TimeoutApplicationNotRespondingDetector::handle_ping_cycle, this))}
60 {
61- alarm->reschedule_in(period);
62 }
63
64 ms::TimeoutApplicationNotRespondingDetector::~TimeoutApplicationNotRespondingDetector()
65@@ -94,8 +66,16 @@
66 void ms::TimeoutApplicationNotRespondingDetector::register_session(
67 frontend::Session const* session, std::function<void()> const& pinger)
68 {
69- std::lock_guard<std::mutex> lock{session_mutex};
70- sessions[dynamic_cast<Session const*>(session)] = std::make_unique<ANRContext>(pinger);
71+ bool alarm_needs_schedule;
72+ {
73+ std::lock_guard<std::mutex> lock{session_mutex};
74+ sessions[dynamic_cast<Session const*>(session)] = std::make_unique<ANRContext>(pinger);
75+ alarm_needs_schedule = alarm->state() != mt::Alarm::State::pending;
76+ }
77+ if (alarm_needs_schedule)
78+ {
79+ alarm->reschedule_in(period);
80+ }
81 }
82
83 void ms::TimeoutApplicationNotRespondingDetector::unregister_session(
84@@ -109,6 +89,7 @@
85 frontend::Session const* received_for)
86 {
87 bool needs_now_responsive_notification{false};
88+ bool alarm_needs_rescheduling;
89 {
90 std::lock_guard<std::mutex> lock{session_mutex};
91
92@@ -119,11 +100,17 @@
93 needs_now_responsive_notification = true;
94 }
95 session_ctx->replied_since_last_ping = true;
96+
97+ alarm_needs_rescheduling = alarm->state() != mt::Alarm::State::pending;
98 }
99 if (needs_now_responsive_notification)
100 {
101 observers.session_now_responsive(dynamic_cast<Session const*>(received_for));
102 }
103+ if (alarm_needs_rescheduling)
104+ {
105+ alarm->reschedule_in(period);
106+ }
107 }
108
109 void ms::TimeoutApplicationNotRespondingDetector::register_observer(
110@@ -137,3 +124,44 @@
111 {
112 observers.remove(observer);
113 }
114+
115+void ms::TimeoutApplicationNotRespondingDetector::handle_ping_cycle()
116+{
117+ bool needs_rearm{false};
118+ {
119+ std::lock_guard<std::mutex> lock{session_mutex};
120+ for (auto const& session_pair : sessions)
121+ {
122+ bool const newly_unresponsive =
123+ !session_pair.second->replied_since_last_ping &&
124+ !session_pair.second->flagged_as_unresponsive;
125+ bool const needs_ping =
126+ session_pair.second->replied_since_last_ping;
127+
128+ if (newly_unresponsive)
129+ {
130+ session_pair.second->flagged_as_unresponsive = true;
131+ unresponsive_sessions_temporary.push_back(session_pair.first);
132+ }
133+ else if (needs_ping)
134+ {
135+ session_pair.second->pinger();
136+ session_pair.second->replied_since_last_ping = false;
137+ needs_rearm = true;
138+ }
139+ }
140+ }
141+
142+ // Dispatch notifications outside the lock.
143+ for (auto const& unresponsive_session : unresponsive_sessions_temporary)
144+ {
145+ observers.session_unresponsive(unresponsive_session);
146+ }
147+
148+ unresponsive_sessions_temporary.clear();
149+
150+ if (needs_rearm)
151+ {
152+ this->alarm->reschedule_in(this->period);
153+ }
154+}
155
156=== modified file 'src/server/scene/timeout_application_not_responding_detector.h'
157--- src/server/scene/timeout_application_not_responding_detector.h 2015-07-06 01:14:14 +0000
158+++ src/server/scene/timeout_application_not_responding_detector.h 2015-09-08 07:23:10 +0000
159@@ -64,6 +64,8 @@
160 void register_observer(std::shared_ptr<Observer> const& observer) override;
161 void unregister_observer(std::shared_ptr<Observer> const& observer) override;
162 private:
163+ void handle_ping_cycle();
164+
165 struct ANRContext;
166
167 class ANRObservers : public Observer, private BasicObservers<Observer>
168@@ -80,6 +82,7 @@
169 std::unordered_map<Session const*, std::unique_ptr<ANRContext>> sessions;
170 std::vector<Session const*> unresponsive_sessions_temporary;
171
172+ std::chrono::milliseconds const period;
173 std::unique_ptr<time::Alarm> const alarm;
174 };
175 }
176
177=== modified file 'tests/include/mir/test/doubles/fake_alarm_factory.h'
178--- tests/include/mir/test/doubles/fake_alarm_factory.h 2015-06-30 04:51:42 +0000
179+++ tests/include/mir/test/doubles/fake_alarm_factory.h 2015-09-08 07:23:10 +0000
180@@ -41,6 +41,7 @@
181 std::shared_ptr<mir::time::Clock> const& clock);
182
183 void time_updated();
184+ int wakeup_count() const;
185
186 bool cancel() override;
187 State state() const override;
188@@ -49,6 +50,7 @@
189 bool reschedule_for(mir::time::Timestamp timeout) override;
190
191 private:
192+ int triggered_count;
193 std::function<void()> const callback;
194 State alarm_state;
195 mir::time::Timestamp triggers_at;
196@@ -66,6 +68,8 @@
197 std::shared_ptr<mir::LockableCallback> const& callback) override;
198
199 void advance_by(mt::Duration step);
200+ void advance_smoothly_by(mt::Duration step);
201+ int wakeup_count() const;
202
203 private:
204 std::vector<FakeAlarm*> alarms;
205
206=== modified file 'tests/mir_test_doubles/fake_alarm_factory.cpp'
207--- tests/mir_test_doubles/fake_alarm_factory.cpp 2015-06-30 04:51:42 +0000
208+++ tests/mir_test_doubles/fake_alarm_factory.cpp 2015-09-08 07:23:10 +0000
209@@ -18,12 +18,15 @@
210
211 #include "mir/test/doubles/fake_alarm_factory.h"
212
213+#include <numeric>
214+
215 namespace mtd = mir::test::doubles;
216
217 mtd::FakeAlarm::FakeAlarm(
218 std::function<void()> const& callback,
219 std::shared_ptr<mir::time::Clock> const& clock)
220- : callback{callback},
221+ : triggered_count{0},
222+ callback{callback},
223 alarm_state{State::cancelled},
224 triggers_at{mir::time::Timestamp::max()},
225 clock{clock}
226@@ -37,9 +40,15 @@
227 triggers_at = mir::time::Timestamp::max();
228 alarm_state = State::triggered;
229 callback();
230+ ++triggered_count;
231 }
232 }
233
234+int mtd::FakeAlarm::wakeup_count() const
235+{
236+ return triggered_count;
237+}
238+
239 bool mtd::FakeAlarm::cancel()
240 {
241 if (alarm_state == State::pending)
242@@ -108,3 +117,26 @@
243 alarm->time_updated();
244 }
245 }
246+
247+void mtd::FakeAlarmFactory::advance_smoothly_by(mt::Duration step)
248+{
249+ using namespace std::literals::chrono_literals;
250+ auto const step_by = 1ms;
251+ while (step.count() > 0)
252+ {
253+ advance_by(step_by);
254+ step -= step_by;
255+ }
256+}
257+
258+int mtd::FakeAlarmFactory::wakeup_count() const
259+{
260+ return std::accumulate(
261+ alarms.begin(),
262+ alarms.end(),
263+ 0,
264+ [](int count, FakeAlarm const* alarm)
265+ {
266+ return count + alarm->wakeup_count();
267+ });
268+}
269
270=== modified file 'tests/unit-tests/scene/test_timeout_application_not_responding_detector.cpp'
271--- tests/unit-tests/scene/test_timeout_application_not_responding_detector.cpp 2015-07-06 01:14:14 +0000
272+++ tests/unit-tests/scene/test_timeout_application_not_responding_detector.cpp 2015-09-08 07:23:10 +0000
273@@ -277,3 +277,195 @@
274 // Notification should not have been generated
275 EXPECT_FALSE(session_not_responding);
276 }
277+
278+TEST(TimeoutApplicationNotRespondingDetector, does_not_schedule_alarm_when_no_sessions)
279+{
280+ using namespace testing;
281+ using namespace std::literals::chrono_literals;
282+
283+ mtd::FakeAlarmFactory fake_alarms;
284+
285+ ms::TimeoutApplicationNotRespondingDetector detector{fake_alarms, 1s};
286+
287+ // Go through several ping cycles.
288+ fake_alarms.advance_smoothly_by(5000ms);
289+
290+ EXPECT_THAT(fake_alarms.wakeup_count(), Eq(0));
291+
292+ NiceMock<mtd::MockSceneSession> session;
293+
294+ detector.register_session(&session, [](){});
295+
296+ fake_alarms.advance_smoothly_by(5000ms);
297+ EXPECT_THAT(fake_alarms.wakeup_count(), Gt(0));
298+
299+ int const previous_wakeup_count{fake_alarms.wakeup_count()};
300+
301+ detector.unregister_session(&session);
302+
303+ fake_alarms.advance_smoothly_by(5000ms);
304+
305+ // We allow one extra wakeup to notice there are no active sessions.
306+ EXPECT_THAT(fake_alarms.wakeup_count(), Le(previous_wakeup_count + 1));
307+}
308+
309+TEST(TimeoutApplicationNotRespondingDetector, does_not_schedule_alarm_when_all_sessions_are_unresponsive)
310+{
311+ using namespace testing;
312+ using namespace std::literals::chrono_literals;
313+
314+ mtd::FakeAlarmFactory fake_alarms;
315+
316+ ms::TimeoutApplicationNotRespondingDetector detector{fake_alarms, 1s};
317+
318+ NiceMock<mtd::MockSceneSession> session;
319+ bool session_unresponsive{false};
320+
321+ auto observer = std::make_shared<NiceMock<MockObserver>>();
322+ ON_CALL(*observer, session_unresponsive(_))
323+ .WillByDefault(Invoke([&session_unresponsive](auto /*session*/)
324+ {
325+ session_unresponsive = true;
326+ }));
327+ detector.register_observer(observer);
328+
329+ detector.register_session(&session, [](){});
330+
331+ fake_alarms.advance_smoothly_by(5000ms);
332+
333+ ASSERT_TRUE(session_unresponsive);
334+ EXPECT_THAT(fake_alarms.wakeup_count(), Gt(0));
335+
336+ int const previous_wakeup_count{fake_alarms.wakeup_count()};
337+
338+ // All the sessions are now unresponsive, so we shouldn't be triggering wakeups
339+ fake_alarms.advance_smoothly_by(5000ms);
340+ EXPECT_THAT(fake_alarms.wakeup_count(), Eq(previous_wakeup_count));
341+}
342+
343+TEST(TimeoutApplicationNotRespondingDetector, session_switches_between_responsive_and_unresponsive)
344+{
345+ using namespace testing;
346+ using namespace std::literals::chrono_literals;
347+
348+ mtd::FakeAlarmFactory fake_alarms;
349+
350+ ms::TimeoutApplicationNotRespondingDetector detector{fake_alarms, 1s};
351+
352+ NiceMock<mtd::MockSceneSession> session;
353+ bool session_unresponsive{false};
354+
355+ auto observer = std::make_shared<NiceMock<MockObserver>>();
356+ ON_CALL(*observer, session_unresponsive(_))
357+ .WillByDefault(Invoke([&session_unresponsive](auto /*session*/)
358+ {
359+ session_unresponsive = true;
360+ }));
361+ ON_CALL(*observer, session_now_responsive(_))
362+ .WillByDefault(Invoke([&session_unresponsive](auto /*session*/)
363+ {
364+ session_unresponsive = false;
365+ }));
366+ detector.register_observer(observer);
367+
368+ detector.register_session(&session, [](){});
369+
370+ fake_alarms.advance_smoothly_by(5000ms);
371+
372+ EXPECT_TRUE(session_unresponsive);
373+
374+ detector.pong_received(&session);
375+ EXPECT_FALSE(session_unresponsive);
376+
377+ fake_alarms.advance_smoothly_by(5000ms);
378+ EXPECT_TRUE(session_unresponsive);
379+
380+ detector.pong_received(&session);
381+ EXPECT_FALSE(session_unresponsive);
382+}
383+
384+TEST(TimeoutApplicationNotRespondingDetector, sends_unresponsive_notification_only_once)
385+{
386+ using namespace testing;
387+ using namespace std::literals::chrono_literals;
388+
389+ mtd::FakeAlarmFactory fake_alarms;
390+
391+ ms::TimeoutApplicationNotRespondingDetector detector{fake_alarms, 1s};
392+
393+ NiceMock<mtd::MockSceneSession> session_one;
394+ NiceMock<mtd::MockSceneSession> session_two;
395+ auto delayed_dispatch_pong = fake_alarms.create_alarm(
396+ [&detector, &session_two]()
397+ {
398+ detector.pong_received(&session_two);
399+ });
400+ int unresponsive_notifications{0};
401+
402+ auto observer = std::make_shared<NiceMock<MockObserver>>();
403+ ON_CALL(*observer, session_unresponsive(_))
404+ .WillByDefault(Invoke([&unresponsive_notifications](auto /*session*/)
405+ {
406+ ++unresponsive_notifications;
407+ }));
408+ detector.register_observer(observer);
409+
410+ detector.register_session(&session_one, [](){});
411+ // We need a responsive session to ensure the ANRDetector keeps rescheduling wakeups.
412+ //
413+ // We need the delayed_dispatch_pong so that we can do the pong outside of the
414+ // ping callback; otherwise this would deadlock.
415+ detector.register_session(&session_two,
416+ [&delayed_dispatch_pong]()
417+ {
418+ delayed_dispatch_pong->reschedule_in(1ms);
419+ });
420+
421+ fake_alarms.advance_smoothly_by(5000ms);
422+
423+ EXPECT_THAT(unresponsive_notifications, Eq(1));
424+}
425+
426+TEST(TimeoutApplicationNotRespondingDetector, sends_pings_on_schedule_as_sessions_are_connecting)
427+{
428+ using namespace testing;
429+ using namespace std::literals::chrono_literals;
430+
431+ mtd::FakeAlarmFactory fake_alarms;
432+
433+ auto const cycle_time = 1s;
434+ ms::TimeoutApplicationNotRespondingDetector detector{fake_alarms, cycle_time};
435+
436+ NiceMock<mtd::MockSceneSession> session;
437+ std::atomic<int> ping_count{0};
438+ auto delayed_dispatch_pong = fake_alarms.create_alarm(
439+ [&detector, &session, &ping_count]()
440+ {
441+ detector.pong_received(&session);
442+ ++ping_count;
443+ });
444+
445+ detector.register_session(&session,
446+ [&delayed_dispatch_pong]()
447+ {
448+ delayed_dispatch_pong->reschedule_in(1ms);
449+ });
450+
451+
452+ auto duration = 0ms;
453+ auto const step = 500ms;
454+
455+ // <= ensures we go past the threshold for the cycle
456+ while (duration <= 5s)
457+ {
458+ NiceMock<mtd::MockSceneSession> dummy_session;
459+ detector.register_session(&dummy_session, [](){});
460+
461+ fake_alarms.advance_smoothly_by(step);
462+ duration += step;
463+
464+ detector.unregister_session(&dummy_session);
465+ }
466+
467+ EXPECT_THAT(ping_count, Ge(duration / cycle_time));
468+}

Subscribers

People subscribed via source and target branches