Mir

Merge lp:~afrantzis/mir/fix-1441759-screencast into lp:mir

Proposed by Alexandros Frantzis
Status: Merged
Approved by: Alexandros Frantzis
Approved revision: no longer in the source branch.
Merged at revision: 2522
Proposed branch: lp:~afrantzis/mir/fix-1441759-screencast
Merge into: lp:mir
Diff against target: 73 lines (+38/-5)
2 files modified
src/server/frontend/session_mediator.cpp (+8/-5)
tests/unit-tests/frontend/test_session_mediator.cpp (+30/-0)
To merge this branch: bzr merge lp:~afrantzis/mir/fix-1441759-screencast
Reviewer                      Review Type              Date Requested   Status
Alan Griffiths                                                          Approve
Chris Halse Rogers                                                      Needs Fixing
Kevin DuBois (community)                                                Approve
PS Jenkins bot (community)    continuous-integration                    Approve
Alberto Aguirre (community)                                             Approve
Review via email: mp+256942@code.launchpad.net

Commit message

frontend: Properly handle asynchronous completion function invocations (LP: #1441759)

Under some circumstances (e.g., when screencasting) the completion function for a client buffer request may be invoked asynchronously, but from the same thread that initiated the client buffer request (i.e. one of the IPC threads). Our code didn't handle this situation properly, causing clients to hang waiting for a reply that never arrived.

Description of the change

frontend: Properly handle asynchronous swap buffer completion function invocations (LP: #1441759)

Under some circumstances (e.g., when screencasting) the completion function for a client buffer request may be invoked asynchronously, but from the same thread that initiated the client buffer request (i.e. one of the IPC threads). Our code didn't handle this situation properly, causing clients to hang waiting for a reply that never arrived.
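
For context, a standalone sketch (hypothetical names, not Mir code) of why a thread-id check alone misfires here: a completion that is deferred and later run on the same IPC thread looks, to that check, exactly like a completion that ran synchronously while the caller's stack frame (and its lock) still existed.

    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::vector<std::function<void()>> deferred;  // stands in for work completed later

    void swap_buffers_stub(bool buffer_ready, std::function<void()> complete)
    {
        if (buffer_ready)
            complete();                    // synchronous completion, same stack
        else
            deferred.push_back(complete);  // asynchronous completion, possibly same thread
    }

    int main()
    {
        std::mutex session_mutex;

        auto request = [&](bool buffer_ready)
        {
            std::unique_lock<std::mutex> lock{session_mutex};
            auto const tid = std::this_thread::get_id();
            swap_buffers_stub(buffer_ready, [tid]
            {
                // The old check: "same thread id, therefore same stack and the
                // caller's lock is still held" -- wrong for the deferred case.
                if (tid == std::this_thread::get_id())
                    std::cout << "thread-id check claims synchronous completion\n";
            });
        };

        request(true);                 // genuinely synchronous: the claim is correct
        request(false);                // deferred (screencast-like) case ...
        for (auto& f : deferred) f();  // ... later run on the same thread: claim is wrong
    }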

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

LGTM.

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) :
review: Approve
Revision history for this message
Chris Halse Rogers (raof) wrote :

I'm a bit confused here. Under what situation will weak_tid.lock() return a valid pointer but *weak_tid.lock() != this_thread::get_id()?

Yes; the only time that it's valid to call lock.unlock() is when the lambda is executed in a stack frame below advance_buffer, so really we're not shuttling a thread_id into the swap_buffers callback, we're checking whether it's being called on the same stack or not.

It looks like this could be made simpler by just passing ownership of lock to advance_buffer.

Actually, what is that lock protecting, anyway? Eyeballing it, I think it'd be more correct to just release the lock before calling advance_buffer.

Marked as needs fixing. At the least, this requires removing the this_thread::get_id() check, because that check is guaranteed to be true by virtue of executing deeper on the same stack as advance_buffer.

review: Needs Fixing
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> I'm a bit confused here. Under what situation will weak_tid.lock() return a valid pointer but *weak_tid.lock() != this_thread::get_id()?

This can happen when a buffer is not available to the client at the time stream.swap_buffers() is called, but becomes available soon after stream.swap_buffers() returns (e.g. because a compositor has released a buffer) and before we leave the advance_buffer function. The completion callback is then invoked on another thread (the compositor thread) while the weak_ptr is still valid.

> It looks like this could be made simpler by just passing ownership of lock to advance_buffer.

The idea is to avoid holding the lock while running the completion function, which (among other things) performs IPC operations. If lock ownership was passed to advance_buffer (and not conditionally unlocked in the completion function), we would be holding the lock if the completion function executed synchronously.
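
To make the scenario above concrete, a self-contained sketch (hypothetical, not the actual Mir code paths) of that window: the weak_ptr is still lockable because advance_buffer has not returned yet, but the completion is running on a different thread, which is why the fix checks both conditions before touching the lock.

    #include <chrono>
    #include <iostream>
    #include <memory>
    #include <thread>

    int main()
    {
        std::thread compositor;

        auto advance_buffer_stub = [&]
        {
            auto const tid = std::make_shared<std::thread::id>(std::this_thread::get_id());
            std::weak_ptr<std::thread::id> const weak_tid{tid};

            // A buffer becomes available "soon after swap_buffers returns": the
            // completion fires on the compositor thread while this frame (and the
            // shared_ptr on its stack) is still alive.
            compositor = std::thread{[weak_tid]
            {
                if (auto const locked = weak_tid.lock())
                    std::cout << "weak_tid lockable, same thread: " << std::boolalpha
                              << (*locked == std::this_thread::get_id()) << '\n';
            }};

            // Keep the frame alive long enough to make the overlap likely; in Mir
            // the overlap is a genuine race rather than a sleep.
            std::this_thread::sleep_for(std::chrono::milliseconds{100});
        };

        advance_buffer_stub();
        compositor.join();
    }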

Revision history for this message
Chris Halse Rogers (raof) wrote :

> > I'm a bit confused here. Under what situation will weak_tid.lock() return a
> > valid pointer but *weak_tid.lock() != this_thread::get_id()?
>
> This can happen when a buffer is not available to the client at the time
> stream.swap_buffers() is called, but becomes available soon after
> stream.swap_buffers() returns (e.g. because a compositor has released a
> buffer) and before we leave the advance_buffer function. The completion
> callback is then invoked on another thread (the compositor thread) while the
> weak_ptr is still valid.

Ah, yeah, fair point.
>
> > It looks like this could be made simpler by just passing ownership of lock
> > to advance_buffer.
>
> The idea is to avoid holding the lock while running the completion function,
> which (among other things) performs IPC operations. If lock ownership was
> passed to advance_buffer (and not conditionally unlocked in the completion
> function), we would be holding the lock if the completion function executed
> synchronously.
Right. advance_buffer would be in charge of releasing the lock, which it would be perfectly safe to do unconditionally.

I think I like my last suggestion the most, though - don't call advance_buffer with the session_mutex locked, because it's unnecessary.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> I think I like my last suggestion the most, though - don't call advance_buffer with the session_mutex locked, because it's unnecessary.

I think you may be right, but I want to investigate and test a bit more to ensure it doesn't cause any unintended consequences.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> I think you may be right, but I want to investigate and test a bit more to ensure it doesn't cause any unintended consequences.

That being said, I would prefer making such an investigation/change independent of this MP. This MP fixes a bug with minimal changes, so I think it would be reasonable to merge this even if we remove the whole locking scheme soon.

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

> Right. advance_buffer would be in charge of releasing the lock, which it would be perfectly safe to do unconditionally.

No, we moved away from that because you don't want to release a lock from a thread that doesn't own it.

Revision history for this message
Chris Halse Rogers (raof) wrote :

On Thu, Apr 23, 2015 at 12:31 AM, Alberto Aguirre
<email address hidden> wrote:
>> Right. advance_buffer would be in charge of releasing the lock,
>> which it would be perfectly safe to do unconditionally.
>
> No, we moved away from that because you don't want to release a lock
> from a thread that doesn't own it.

No, I mean advance_buffer can release the lock *before* calling
swap_buffers.

This is basically a consequence of “holding the lock while calling
advance_buffer doesn't protect anything”.

@Alexandros: If your goal is a quick bugfix for quick
release/cherry-picking, then yeah, go ahead with this. In the absence
of such a goal, I think removing the locking is more appropriate.

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

> On Thu, Apr 23, 2015 at 12:31 AM, Alberto Aguirre
> <email address hidden> wrote:
> >> Right. advance_buffer would be in charge of releasing the lock,
> >> which it would be perfectly safe to do unconditionally.
> >
> > No, we moved away from that because you don't want to release a lock
> > from a thread that doesn't own it.
>
> No, I mean advance_buffer can release the lock *before* calling
> swap_buffers.
>
> This is basically a consequence of “holding the lock while calling
> advance_buffer doesn't protect anything”.
>
> @Alexandros: If your goal is a quick bugfix for quick
> release/cherry-picking, then yeah, go ahead with this. In the absence
> of such a goal, I think removing the locking is more appropriate.

But do we want to, say, call swap_buffers even if release_surface has been called?
I believe that was the other reason for that mutex existing: making each of the SessionMediator calls atomic up to the IPC callback.

Revision history for this message
Chris Halse Rogers (raof) wrote :

If that's our goal then I think we fail at it now - when swap_buffers is handled asynchronously I can't see any guarantee that any number of RPC calls haven't been *completely* processed before the IPC callback occurs.

I'm not sure why we want that guarantee, anyway? From a client perspective it's perfectly valid to call mir_surface_release without waiting for a previous mir_surface_swap_buffers to complete.

We do need to ensure that we process IPC messages in-connection-order, but locking at the SessionMediator level is insufficient to guarantee that. Our core socket read code is not reentrant, so it's guaranteed that SessionMediator is not going to be asked to process a release_surface call before a previous create_surface/exchange_buffers/next_buffer call has completed.

If we wanted to provide ordering guarantees across connections I think we'd need to provide explicit RPC ordering primitives.

Revision history for this message
Chris Halse Rogers (raof) wrote :

“...has completed.” On the same connection, which corresponds to the same Session.

I think it turns out that I'm arguing that almost all of the locking we do in SessionMediator is pointless? That seems a strong claim. I wonder if ThreadSanitizer can test this hypothesis...

Revision history for this message
Chris Halse Rogers (raof) wrote :

It can, and it's weak evidence in favour. Removing session_mutex entirely (a) doesn't cause any tests to fail, and (b) doesn't cause ThreadSanitizer to squawk.
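
For anyone repeating that experiment, a minimal sketch of the kind of race ThreadSanitizer flags, and a typical way to enable it (the exact flags used for the Mir build here are an assumption):

    // Build with something like:
    //   g++ -std=c++14 -g -fsanitize=thread -pthread tsan_demo.cpp -o tsan_demo
    // A project build would instead add -fsanitize=thread to its own compiler
    // and linker flags.
    #include <thread>

    int shared_counter = 0;  // deliberately unsynchronized

    int main()
    {
        std::thread t{[] { ++shared_counter; }};
        ++shared_counter;  // TSan reports this pair of unsynchronized writes as a data race
        t.join();
        return 0;
    }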

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Shorten please :)

+TEST_F(SessionMediator, completes_exchange_buffer_when_completion_is_invoked_asynchronously_from_thread_that_initiated_exchange)

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I've not reviewed the locking logic WRT the above discussion, but beware that while we currently run a single IPC thread by default, we can potentially have a pool.

I think we can now be confident that a pool isn't needed and could remove that option. I also think we can massively simplify the code in this area by using coroutines now that asio has spawn and yield_context.

But this is a decent minimal fix for the problem addressed.

review: Approve
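
As a rough illustration of the coroutine direction Alan mentions, a minimal echo-style sketch (hypothetical, not a proposal for Mir's actual dispatch code) using Boost.Asio's spawn and yield_context; it assumes a Boost build with coroutine support and linking against boost_system, boost_coroutine and boost_context.

    #include <array>
    #include <memory>
    #include <boost/asio.hpp>
    #include <boost/asio/spawn.hpp>

    namespace asio = boost::asio;
    using protocol = asio::local::stream_protocol;

    // Each connection gets a stackful coroutine: the handler reads top to bottom,
    // and async_* calls suspend the coroutine instead of blocking an IPC thread.
    void serve_connection(asio::io_service& io, std::shared_ptr<protocol::socket> const& sock)
    {
        asio::spawn(io, [sock](asio::yield_context yield)
        {
            std::array<char, 512> buf;
            boost::system::error_code ec;
            for (;;)
            {
                auto const n = sock->async_read_some(asio::buffer(buf), yield[ec]);
                if (ec) break;  // client went away or the read failed
                asio::async_write(*sock, asio::buffer(buf, n), yield[ec]);  // echo back
                if (ec) break;
            }
        });
    }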
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

A pool (or something more elegant than a pool) would be required to solve bug 1395421. I'd favour moving toward ultra-lightweight (and non-blocking) RPC instead, though, so that the location of the response call becomes irrelevant.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Top approving. We can revisit this as part of a potential 'frontend' reworking as discussed in previous comments.

Preview Diff

=== modified file 'src/server/frontend/session_mediator.cpp'
--- src/server/frontend/session_mediator.cpp 2015-04-16 20:59:36 +0000
+++ src/server/frontend/session_mediator.cpp 2015-04-21 14:21:34 +0000
@@ -161,23 +161,26 @@
     std::unique_lock<std::mutex>& lock,
     std::function<void(graphics::Buffer*, graphics::BufferIpcMsgType)> complete)
 {
-    auto const tid = std::this_thread::get_id();
+    auto const tid = std::make_shared<std::thread::id>(std::this_thread::get_id());
+    auto const weak_tid = std::weak_ptr<std::thread::id>{tid};

     stream.swap_buffers(
         old_buffer,
         // Note: We assume that the lambda will be executed within swap_buffers
         // (in which case the lock reference is valid) or in a different thread
         // altogether (in which case the dangling reference is not accessed)
-        [this, tid, &lock, stream_id, complete](mg::Buffer* new_buffer)
+        [this, weak_tid, &lock, stream_id, complete](mg::Buffer* new_buffer)
         {
-            if (tid == std::this_thread::get_id())
-                lock.unlock();
+            if (auto const tid = weak_tid.lock())
+            {
+                if (*tid == std::this_thread::get_id())
+                    lock.unlock();
+            }

             if (buffer_stream_tracker.track_buffer(stream_id, new_buffer))
                 complete(new_buffer, mg::BufferIpcMsgType::update_msg);
             else
                 complete(new_buffer, mg::BufferIpcMsgType::full_msg);
-
         });
 }


=== modified file 'tests/unit-tests/frontend/test_session_mediator.cpp'
--- tests/unit-tests/frontend/test_session_mediator.cpp 2015-03-31 02:35:42 +0000
+++ tests/unit-tests/frontend/test_session_mediator.cpp 2015-04-21 14:21:34 +0000
@@ -899,3 +899,33 @@
     EXPECT_THAT(*(reinterpret_cast<int*>(request.data.data())), Eq(magic));
     EXPECT_THAT(drm_response.status_code(), Eq(test_response));
 }
+
+// Regression test for LP: #1441759
+TEST_F(SessionMediator, completes_exchange_buffer_when_completion_is_invoked_asynchronously_from_thread_that_initiated_exchange)
+{
+    using namespace testing;
+    auto const& mock_surface = stubbed_session->mock_surface_at(mf::SurfaceId{0});
+    mtd::StubBuffer stub_buffer1;
+    mtd::StubBuffer stub_buffer2;
+    std::function<void(mg::Buffer*)> completion_func;
+
+    // create
+    InSequence seq;
+    EXPECT_CALL(*mock_surface, swap_buffers(_, _))
+        .WillOnce(InvokeArgument<1>(&stub_buffer1));
+    // exchange, steal completion function
+    EXPECT_CALL(*mock_surface, swap_buffers(_,_))
+        .WillOnce(SaveArg<1>(&completion_func));
+
+    mediator.connect(nullptr, &connect_parameters, &connection, null_callback.get());
+    mediator.create_surface(nullptr, &surface_parameters, &surface_response, null_callback.get());
+
+    buffer_request.mutable_id()->set_value(surface_response.id().value());
+    *buffer_request.mutable_buffer() = surface_response.buffer_stream().buffer();
+
+    mediator.exchange_buffer(nullptr, &buffer_request, &buffer_response, null_callback.get());
+
+    // Execute completion function asynchronously (i.e. not as part of the exchange_buffer
+    // call), but from the same thread that initiated the exchange_buffer operation
+    completion_func(&stub_buffer2);
+}
