Mir

Merge lp:~raof/mir/fix-lock-inversion into lp:mir

Proposed by Chris Halse Rogers on 2017-07-26
Status: Merged
Approved by: Alan Griffiths on 2017-07-27
Approved revision: 4216
Merged at revision: 4215
Proposed branch: lp:~raof/mir/fix-lock-inversion
Merge into: lp:mir
Diff against target: 152 lines (+60/-40)
3 files modified
src/client/buffer_vault.cpp (+59/-12)
src/client/buffer_vault.h (+1/-0)
tests/unit-tests/client/test_buffer_vault.cpp (+0/-28)
To merge this branch: bzr merge lp:~raof/mir/fix-lock-inversion
Reviewer Review Type Date Requested Status
Mir CI Bot continuous-integration Approve on 2017-07-27
Alan Griffiths 2017-07-26 Approve on 2017-07-26
Review via email: mp+328069@code.launchpad.net

Commit message

mcl::BufferVault: Fix lock inversion.

~BufferVault takes BufferVault::mutex and then calls AtomicCallback::set_callback() on each buffer,
which takes AtomicCallback::mutex.

The RPC subsystem calls buffer->callback(), which takes AtomicCallback::mutex and then calls
BufferVault::wire_transfer_inbound... which immediately takes BufferVault::mutex.

This is clearly a disaster waiting to happen.

Instead, move to a two-phase destruction approach:
1) Lock BufferVault::mutex, set a flag telling callbacks to bail, and collect all the existing
    buffers.
2) Drop the lock, and use buffer->set_callback() to wait on any thread already in a callback.
3) Don't bother telling the server to release any buffers; they're associated with the
   BufferStream that we're in the process of destroying, and will get cleaned up there.

Description of the change

This, unfortunately, moves the deadlock into a different place, but I think it's a correct change.

To post a comment you must log in.
Alan Griffiths (alan-griffiths) wrote :

> The RPC subsystem calls buffer->callback(), which takes
> AtomicCallback::mutex and then calls BufferVault::wire_transfer_inbound...
> which immediately takes BufferVault::mutex.

This (pre-existing code) is hard to reason about, the extra logic doesn't help.

Would it be any more complex to make BufferVault::mutex a std::recursive_mutex? (The above is all on a single thread.)

review: Needs Information
Mir CI Bot (mir-ci-bot) wrote :

PASSED: Continuous integration, rev:4214
https://mir-jenkins.ubuntu.com/job/mir-ci/3510/
Executed test runs:
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-mir/4803
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/4986
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=artful/4975
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial/4975
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/4975
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=artful/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=artful/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=artful/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=artful/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=artful/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=artful/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=zesty/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=zesty/4840/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial/4840
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial/4840/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/3510/rebuild

review: Approve (continuous-integration)
Chris Halse Rogers (raof) wrote :

It is on two different threads - the IPC thread calls buffer->callback (), the application thread calls ~BufferVault. I don't think a recursive mutex helps here?

On 26 July 2017 6:41:46 pm AEST, Alan Griffiths <email address hidden> wrote:
>Review: Needs Information
>
>> The RPC subsystem calls buffer->callback(), which takes
>> AtomicCallback::mutex and then calls
>BufferVault::wire_transfer_inbound...
>> which immediately takes BufferVault::mutex.
>
>This (pre-existing code) is hard to reason about, the extra logic
>doesn't help.
>
>Would it be any more complex to make BufferVault::mutex a
>std::recursive_mutex? (The above is all on a single thread.)
>--
>https://code.launchpad.net/~raof/mir/fix-lock-inversion/+merge/328069
>You are the owner of lp:~raof/mir/fix-lock-inversion.

Alan Griffiths (alan-griffiths) wrote :

> It is on two different threads - the IPC thread calls buffer->callback (), the
> application thread calls ~BufferVault. I don't think a recursive mutex helps
> here?

Ack. I misread what was happening.

review: Abstain
Alan Griffiths (alan-griffiths) wrote :

Nits, but non-blocking:

    bool disposing{false};

Perhaps not the clearest name. I don't have any great idea, but "shutting_down" seems marginally better.

~~~~

    try
    {
        ...
    }
    catch (...)
    {
    }

I guess this was there to ignore IPC errors in the destructor. If there are actually exceptions from the new code are being "eaten" something is badly wrong.

review: Approve
lp:~raof/mir/fix-lock-inversion updated on 2017-07-27
4215. By Chris Halse Rogers on 2017-07-27

mcl::~BufferVault: Drop catch-all exception handling.

We no longer do anything in the destructor that could reasonably be expected to throw.

4216. By Chris Halse Rogers on 2017-07-27

mcl::BufferVault: Rename “disposing” to “being_destroyed”.

Disposing is apparently not part of a widely-shared vocabulary, so try and make it
more obvious what the flag means.

Mir CI Bot (mir-ci-bot) wrote :

PASSED: Continuous integration, rev:4216
https://mir-jenkins.ubuntu.com/job/mir-ci/3511/
Executed test runs:
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-mir/4805
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/4992
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=artful/4981
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial/4981
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/4981
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=artful/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=artful/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=artful/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=artful/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=artful/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=artful/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=zesty/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=mesa,release=zesty/4842/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial/4842
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial/4842/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/3511/rebuild

review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/client/buffer_vault.cpp'
2--- src/client/buffer_vault.cpp 2017-07-17 02:44:34 +0000
3+++ src/client/buffer_vault.cpp 2017-07-27 02:04:12 +0000
4@@ -80,24 +80,63 @@
5 mcl::BufferVault::~BufferVault()
6 {
7 buffer_factory->cancel_requests_with_context(this);
8+
9+ std::vector<std::shared_ptr<MirBuffer>> current_buffers;
10+
11 std::unique_lock<std::mutex> lk(mutex);
12+
13+ // Prevent callbacks from allocating new buffers
14+ being_destroyed = true;
15+
16 for (auto& it : buffers)
17- try
18+ if (auto map = surface_map.lock())
19 {
20- if (auto map = surface_map.lock())
21- {
22- if (auto buffer = map->buffer(it.first))
23- {
24- buffer->set_callback(ignore_buffer, nullptr);
25- }
26- }
27- if (!disconnected_)
28- {
29- free_buffer(it.first);
30+ if (auto buffer = map->buffer(it.first))
31+ {
32+ /*
33+ * Annoying wart:
34+ *
35+ * mir::AtomicCallback is atomic via locking, and we need it to be because
36+ * we rely on the guarantee that after set_callback() returns no thread
37+ * is in, or will enter, the previous callback.
38+ *
39+ * However!
40+ *
41+ * wire_transfer_inbound() is called from buffer->callback, and acquires
42+ * BufferVault::mutex.
43+ *
44+ * This destuctor must call buffer->set_callback(), and has acquired
45+ * BufferVault::mutex.
46+ *
47+ * So we've got a lock inversion.
48+ *
49+ * We can't unlock here, as wire_transfer_inbound() might mutate
50+ * BufferVault::buffers, which we're iterating over.
51+ *
52+ * So store them up to process later...
53+ */
54+ current_buffers.push_back(buffer);
55+
56+ /*
57+ * ...and erase them from the SurfaceMap.
58+ *
59+ * After this point, the RPC layer will ignore any messages about
60+ * this buffer.
61+ *
62+ * We don't need to explicitly ask the server to free it; it'll be
63+ * freed with the BufferStream
64+ */
65+ map->erase(it.first);
66 }
67 }
68- catch (...)
69+ lk.unlock();
70+
71+ /*
72+ * Now, ensure that no threads are in the existing buffer callbacks...
73+ */
74+ for (auto const& buffer : current_buffers)
75 {
76+ buffer->set_callback(ignore_buffer, nullptr);
77 }
78 }
79
80@@ -246,6 +285,14 @@
81 void mcl::BufferVault::wire_transfer_inbound(int buffer_id)
82 {
83 std::unique_lock<std::mutex> lk(mutex);
84+
85+ /*
86+ * The BufferVault is in the process of being destroyed; there's no point in
87+ * doing any processing and we should *certainly* not allocate any more buffers.
88+ */
89+ if (being_destroyed)
90+ return;
91+
92 last_received_id = buffer_id;
93 auto buffer = checked_buffer_from_map(buffer_id);
94 auto inbound_size = buffer->size();
95
96=== modified file 'src/client/buffer_vault.h'
97--- src/client/buffer_vault.h 2017-05-08 03:04:26 +0000
98+++ src/client/buffer_vault.h 2017-07-27 02:04:12 +0000
99@@ -96,6 +96,7 @@
100 int const usage;
101
102 std::mutex mutex;
103+ bool being_destroyed{false};
104 BufferMap buffers;
105 std::deque<NoTLSPromise<std::shared_ptr<MirBuffer>>> promises;
106 geometry::Size size;
107
108=== modified file 'tests/unit-tests/client/test_buffer_vault.cpp'
109--- tests/unit-tests/client/test_buffer_vault.cpp 2017-05-17 04:48:46 +0000
110+++ tests/unit-tests/client/test_buffer_vault.cpp 2017-07-27 02:04:12 +0000
111@@ -156,16 +156,6 @@
112 make_vault();
113 }
114
115-TEST_F(BufferVault, frees_the_buffers_we_actually_got)
116-{
117- EXPECT_CALL(mock_requests, free_buffer(package.buffer_id()));
118- EXPECT_CALL(mock_requests, free_buffer(package2.buffer_id()));
119- auto vault = make_vault();
120-
121- vault->wire_transfer_inbound(package.buffer_id());
122- vault->wire_transfer_inbound(package2.buffer_id());
123-}
124-
125 TEST_F(BufferVault, withdrawing_and_never_filling_up_will_timeout)
126 {
127 using namespace std::literals::chrono_literals;
128@@ -370,24 +360,6 @@
129 }, std::exception);
130 }
131
132-TEST_F(BufferVault, makes_sure_rpc_calls_exceptions_are_caught_in_destructor)
133-{
134- EXPECT_CALL(mock_requests, free_buffer(_))
135- .Times(1)
136- .WillOnce(Throw(std::runtime_error("")));
137- EXPECT_NO_THROW({
138- auto vault = make_vault();
139- vault->wire_transfer_inbound(package.buffer_id());
140- });
141-}
142-
143-TEST_F(StartedBufferVault, skips_free_buffer_rpc_calls_if_disconnected)
144-{
145- EXPECT_CALL(mock_requests, free_buffer(_))
146- .Times(0);
147- vault.disconnected();
148-}
149-
150 TEST_F(StartedBufferVault, buffer_count_remains_the_same_after_scaling)
151 {
152 std::array<mp::Buffer, 3> buffers;

Subscribers

People subscribed via source and target branches