Mir

Merge lp:~mir-team/mir/fix-1339700-take2 into lp:mir

Proposed by Alberto Aguirre
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 1769
Proposed branch: lp:~mir-team/mir/fix-1339700-take2
Merge into: lp:mir
Diff against target: 100 lines (+26/-7)
3 files modified
src/server/asio_main_loop.cpp (+16/-3)
src/server/compositor/buffer_queue.h (+4/-1)
src/server/compositor/timeout_frame_dropping_policy_factory.cpp (+6/-3)
To merge this branch: bzr merge lp:~mir-team/mir/fix-1339700-take2
Reviewer Review Type Date Requested Status
PS Jenkins bot (community) continuous-integration Approve
Chris Halse Rogers Approve
Daniel van Vugt Approve
Review via email: mp+226534@code.launchpad.net

Commit message

Fix an additional deadlock that could occur after the original deadlock
is fixed (LP: #1339700)

Only ~AlarmImpl should wait for callback completion.

Warning: This introduces a semantic change whereby Alarm::cancel() is now
non-blocking and doesn't guarantee that callbacks are complete. In future
(when ABI changes permit), stop() will be exposed as the public method for
blocking until callbacks are done.

Description of the change

Until stop is exposed, we ensure that the alarm (and objects with embedded alarms) are the first ones to be destroyed so dead objects are not accessed by installed callbacks.

---
Alberto Aguirre (albaguirre) wrote on 2014-07-11: #
Thread A is calling BufferQueue::release, owns BufferQueue::guard, and it's about to call TimeoutFrameDroppingPolicy::swap_unblocked() gets pre-empted

Thread B is executing the alarm handler in TimeoutFrameDroppingPolicy, which invokes the callback installed by BufferQueue to drop a frame, attempts to own BufferQueue::guard so it blocks.

Thread A resumes, calls TimeoutFrameDroppingPolicy::swap_unblocked() and calls alarm->cancel(), it deadlocks, since data->callbacks_running will never be decremented (waiting for the alarm handler invocation to return).

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I like the idea of allowing cancel to become asynchronous in future. But I think we need to be careful to avoid clouding the issue of fixing the important bugs first...

(1) Bug 1339700 is fixed already, according to all the details in bug 1339700 that problem is already solved. So this proposal doesn't need to call itself fix-1339700-anything but that's not a blocker.

(2) Changing cancel() to non-blocking and introducing stop() as the blocking method is a semantic API change. Sounds like a good idea to me, but it's not required to declare bug 1339700 fixed, and also changes like that can't be back-ported to a stable branch like 0.4.

(3) Is it necessary to propose changes to Alarm and the buffering code simultaneously?

(4) Please don't use the lambda syntax with wait. Am I the only one who can't read it?
31 + data->callback_done.wait(lock,[this]{return data->callbacks_running == 0;});
vs
40 - while (data->callbacks_running > 0)
41 - data->callback_done.wait(lock);
The latter is more familiar to users of condition variables from before C++ gained them. It's also more efficient (or doesn't require optimization anyway) and easier to read as logic.

(5) This needs a more descriptive name:
47 +bool AlarmImpl::cancel_l()

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Oh, I see you've described a new potential deadlock you're trying to fix. That should probably have been mentioned in this proposal...
https://code.launchpad.net/~vanvugt/mir/fix-1339700-alarm/+merge/226252/comments/546666

This is one of those situations where the deadlock shown in the bug report is solved, but we may have to fix other code to stop additional (previously masked) deadlocks from happening.

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

1) no it's not fixed
2) it's REQUIRED to fix 1339700
3) why not?
4) I prefer the lambda
5) ummm ok.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yes, sorry. You can go to bed. I'll work on getting this approved and landed for your Monday.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Seems reasonable. Providing we don't have any lingering code in Mir that still depends on Alarm::cancel() blocking, it should be OK.

review: Approve
Revision history for this message
Chris Halse Rogers (raof) wrote :

Looks OK to me.

I think you *might* be the only one who finds lambda syntax with cv.wait() unreadable :)

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

"The latter is more familiar to users of condition variables from before C++ gained them. It's also more efficient (or doesn't require optimization anyway) and easier to read as logic."

You know the lambda version just hides the while loop in the function, right? So it's harder to fully understand, and slightly less efficient in passing extra parameters and calling back into a lambda.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/server/asio_main_loop.cpp'
2--- src/server/asio_main_loop.cpp 2014-07-11 04:03:02 +0000
3+++ src/server/asio_main_loop.cpp 2014-07-14 03:52:53 +0000
4@@ -344,11 +344,13 @@
5 bool reschedule_in(std::chrono::milliseconds delay) override;
6 bool reschedule_for(mir::time::Timestamp time_point) override;
7 private:
8+ void stop();
9+ bool cancel_unlocked();
10 void update_timer();
11 struct InternalState
12 {
13 explicit InternalState(std::function<void(void)> callback)
14- : callback{callback}
15+ : callback{callback}, state{pending}
16 {
17 }
18
19@@ -422,16 +424,27 @@
20
21 AlarmImpl::~AlarmImpl() noexcept
22 {
23- AlarmImpl::cancel();
24+ AlarmImpl::stop();
25 }
26
27-bool AlarmImpl::cancel()
28+void AlarmImpl::stop()
29 {
30 std::unique_lock<decltype(data->m)> lock(data->m);
31
32 while (data->callbacks_running > 0)
33 data->callback_done.wait(lock);
34
35+ cancel_unlocked();
36+}
37+
38+bool AlarmImpl::cancel()
39+{
40+ std::lock_guard<decltype(data->m)> lock(data->m);
41+ return cancel_unlocked();
42+}
43+
44+bool AlarmImpl::cancel_unlocked()
45+{
46 if (data->state == triggered)
47 return false;
48
49
50=== modified file 'src/server/compositor/buffer_queue.h'
51--- src/server/compositor/buffer_queue.h 2014-07-09 09:52:33 +0000
52+++ src/server/compositor/buffer_queue.h 2014-07-14 03:52:53 +0000
53@@ -86,11 +86,14 @@
54
55 int nbuffers;
56 bool frame_dropping_enabled;
57- std::unique_ptr<FrameDroppingPolicy> framedrop_policy;
58 graphics::BufferProperties the_properties;
59
60 std::condition_variable snapshot_released;
61 std::shared_ptr<graphics::GraphicBufferAllocator> gralloc;
62+
63+ // Ensure framedrop_policy gets destroyed first so the callback installed
64+ // does not access dead objects.
65+ std::unique_ptr<FrameDroppingPolicy> framedrop_policy;
66 };
67
68 }
69
70=== modified file 'src/server/compositor/timeout_frame_dropping_policy_factory.cpp'
71--- src/server/compositor/timeout_frame_dropping_policy_factory.cpp 2014-06-25 16:30:46 +0000
72+++ src/server/compositor/timeout_frame_dropping_policy_factory.cpp 2014-07-14 03:52:53 +0000
73@@ -41,22 +41,25 @@
74
75 private:
76 std::chrono::milliseconds const timeout;
77+ std::atomic<unsigned int> pending_swaps;
78+
79+ // Ensure alarm gets destroyed first so its handler does not access dead
80+ // objects.
81 std::unique_ptr<mir::time::Alarm> const alarm;
82- std::atomic<unsigned int> pending_swaps;
83 };
84
85 TimeoutFrameDroppingPolicy::TimeoutFrameDroppingPolicy(std::shared_ptr<mir::time::Timer> const& timer,
86 std::chrono::milliseconds timeout,
87 std::function<void(void)> drop_frame)
88 : timeout{timeout},
89+ pending_swaps{0},
90 alarm{timer->create_alarm([this, drop_frame]
91 {
92 assert(pending_swaps.load() > 0);
93 drop_frame();
94 if (--pending_swaps > 0)
95 alarm->reschedule_in(this->timeout);
96- })},
97- pending_swaps{0}
98+ })}
99 {
100 }
101

Subscribers

People subscribed via source and target branches