Merge lp:~alan-griffiths/mir/fix-1526248 into lp:mir
| Status: | Merged |
|---|---|
| Approved by: | Daniel van Vugt on 2015-12-17 |
| Approved revision: | 3200 |
| Merged at revision: | 3201 |
| Proposed branch: | lp:~alan-griffiths/mir/fix-1526248 |
| Merge into: | lp:mir |
| Diff against target: |
158 lines (+24/-18) 5 files modified
src/client/connection_surface_map.h (+2/-2) src/client/rpc/mir_basic_rpc_channel.cpp (+4/-2) src/client/rpc/mir_basic_rpc_channel.h (+3/-1) src/client/rpc/mir_protobuf_rpc_channel.cpp (+7/-3) src/client/surface_map.cpp (+8/-10) |
| To merge this branch: | bzr merge lp:~alan-griffiths/mir/fix-1526248 |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Daniel van Vugt | Abstain on 2015-12-17 | ||
| Alberto Aguirre | Approve on 2015-12-16 | ||
| PS Jenkins bot | continuous-integration | Approve on 2015-12-15 | |
| Kevin DuBois (community) | 2015-12-15 | Approve on 2015-12-15 | |
|
Review via email:
|
|||
Commit Message
client: Fix racy access surfaces and streams in ConnectionSurfa
Description of the Change
client: Fix racy access surfaces and streams in ConnectionSurfa
| Kevin DuBois (kdub) wrote : | # |
| Kevin DuBois (kdub) wrote : | # |
- 3200. By Alan Griffiths on 2015-12-15
-
Fix race in mclr::MirProtob
ufRpcChannel: :on_data_ available( )
| Alan Griffiths (alan-griffiths) wrote : | # |
> I have a more involved fix coming (probably today actually), where we switch
> to using shared ptrs.
Do you want to block this version?
Is there an additional reason to switch to shared_ptr<>?
| Kevin DuBois (kdub) wrote : | # |
The additional reason was that the shutdown sequence is somewhat confused, and its difficult to come up with a proper shutdown/disconnect message handling with NBS (which has more rapid events being sent)
No reason to block this one though, (esp if it has a second fix in it outside of SurfaceMap) lgtm.
| PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:3199
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
| PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:3200
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
| Daniel van Vugt (vanvugt) wrote : | # |
I recall a long time ago that locking during exec() caused deadlocks there. How sure are you that's now fixed?
- lk.unlock();
| Alan Griffiths (alan-griffiths) wrote : | # |
> I recall a long time ago that locking during exec() caused deadlocks there.
> How sure are you that's now fixed?
>
> - lk.unlock();
> exec(surface);
I've left the tests of this area looping over lunch and been running exploratory tests.
This isn't mathematical proof that there isn't a deadlock scenario, but deadlock is hypothetical and this fixes a demonstrable problem:
Unlocking is clearly wrong as it allows another thread to call erase() followed by deleting "surface" between unlock() and exec(). Holding a shared lock prevents that while still allowing non-mutating access.
| Alberto Aguirre (albaguirre) wrote : | # |
There's a lock order inversion report (with a TSan build)
but It looks benign since the only time the locks in question are acquired in different order is during surface creation.
During creation:
Callback MirSurface::created is invoked
-locks MirSurface::mutex
-calls buffer_
--calls MirConnection:
---calls ConnectionSurfa
----write locks ConnectionSurfa
During normal operation:
MirProtobufRpcC
-calls ConnectionSurfa
--Read locks ConnectionSurfa
---Invokes callback which does MirSurface:
----locks MirSurface::mutex
http://
I don't think the buffer stream creation needs to be done with a locked MirSurface::mutex; We should take a closer look at the uses of MirSurface::mutex, some don't seem necessary -but that's a separate issue outside the scope of this MP.

I have a more involved fix coming (probably today actually), where we switch to using shared ptrs.
Its a bit unintuitive, but since the resources in the map are raw ptrs, they can actually be deleted before the map gets the lock, and removes them. The branch I have switches to shared_ptrs, so the object can't get deleted while under SurfaceMap's lock.