Mir

Merge lp:~alan-griffiths/mir/fix-1334287 into lp:mir

Proposed by Alan Griffiths
Status: Merged
Approved by: Alan Griffiths
Approved revision: no longer in the source branch.
Merged at revision: 1725
Proposed branch: lp:~alan-griffiths/mir/fix-1334287
Merge into: lp:mir
Diff against target: 122 lines (+41/-30)
2 files modified
src/server/asio_main_loop.cpp (+33/-22)
src/server/compositor/timeout_frame_dropping_policy_factory.cpp (+8/-8)
To merge this branch: bzr merge lp:~alan-griffiths/mir/fix-1334287
Reviewer Review Type Date Requested Status
PS Jenkins bot (community) continuous-integration Approve
Kevin DuBois (community) Approve
Andreas Pokorny (community) Approve
Cemil Azizoglu (community) Approve
Review via email: mp+224457@code.launchpad.net

Commit message

server: Avoid race condition in AlarmImpl

Description of the change

Avoid race condition in AlarmImpl

The problem in lp:1334287 is that AlarmImpl doesn't meet this guarantee.

    /**
     * \note Destruction of the Alarm guarantees that the callback will not subsequently be called
     */
    virtual ~Alarm() = default;

So, IIUC AlarmImpl::update_timer()'s lambda grabs a "data" pointer fiddles about and *then releases its lock*.

Then (on some other thread) the TimeoutFrameDroppingPolicy is deleted which destroys the Alarm/AlarmImpl which calls cancel() and sets data->state to "cancelled".

Back on the original thread and regardless of ~AlarmImpl() and AlarmImpl::cancel() having been called, data->callback() is invoked and chaos ensues.

To post a comment you must log in.
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

Looks good.

review: Approve
Revision history for this message
Andreas Pokorny (andreas-pokorny) wrote :

Unless I am missing something TimeoutFramedroppingPolicy reschedules the alarm (which tries to lock too) while handling the timer callback.

see timeout_frame_dropping_policy_factory.cpp:59

Maybe we have to switch to a recursive mutex as a bandaid?

I try to distill a small solution from the split main loop MP that works without a mutex.. but still trying.

review: Needs Fixing
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Unless I am missing something TimeoutFramedroppingPolicy reschedules the alarm
> (which tries to lock too) while handling the timer callback.
>
> see timeout_frame_dropping_policy_factory.cpp:59

Ack. But that raises the question of why no test hangs as a result.

> Maybe we have to switch to a recursive mutex as a bandaid?

Or separate locks for state that needs to be locked separately.

> I try to distill a small solution from the split main loop MP that works
> without a mutex.. but still trying.

I'm trying to imagine how you can hope to meet the "can't be called after" guarantee without a lock around the call. Or do you plan to relax that guarantee?

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Switched to a recursive mutex to allow calls into the alarm from the callback.

Revision history for this message
Andreas Pokorny (andreas-pokorny) wrote :

hm maybe because the reconfiguration attempt is conditional inside the frame dropping policy it did not trigger inside the tests..

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote :

okay

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Build timed out (after 240 minutes). Marking the build as failed.
> java.lang.InterruptedException
> Build step 'Debian PBuilder NG' marked build as failure
> ERROR:pbuilderjenkins:Error during build execution

Is it my imagination or are we getting more strange timeouts. Is it CI or us?

Re-approving...

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> hm maybe because the reconfiguration attempt is conditional inside the frame
> dropping policy it did not trigger inside the tests..

Which implies the code wasn't written to pass a test. :(

Revision history for this message
PS Jenkins bot (ps-jenkins) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/server/asio_main_loop.cpp'
2--- src/server/asio_main_loop.cpp 2014-06-24 10:22:31 +0000
3+++ src/server/asio_main_loop.cpp 2014-06-25 16:35:46 +0000
4@@ -344,15 +344,42 @@
5 {
6 }
7
8- mutable std::mutex m;
9- std::function<void(void)> callback;
10+ std::recursive_mutex m;
11+ std::function<void(void)> const callback;
12 State state;
13 };
14
15 ::deadline_timer timer;
16- std::shared_ptr<InternalState> data;
17+ std::shared_ptr<InternalState> const data;
18+ std::function<void(boost::system::error_code const& ec)> const handler;
19+
20+ friend auto make_handler(std::weak_ptr<InternalState> possible_data)
21+ -> std::function<void(boost::system::error_code const& ec)>;
22 };
23
24+auto make_handler(std::weak_ptr<AlarmImpl::InternalState> possible_data)
25+-> std::function<void(boost::system::error_code const& ec)>
26+{
27+ // Awkwardly, we can't stop the async_wait handler from being called
28+ // on a destroyed AlarmImpl. This means we need to wedge a weak_ptr
29+ // into the async_wait callback.
30+ return [possible_data](boost::system::error_code const& ec)
31+ {
32+ if (!ec)
33+ {
34+ if (auto data = possible_data.lock())
35+ {
36+ std::unique_lock<decltype(data->m)> lock(data->m);
37+ if (data->state == mir::time::Alarm::pending)
38+ {
39+ data->state = mir::time::Alarm::triggered;
40+ data->callback();
41+ }
42+ }
43+ }
44+ };
45+}
46+
47 AlarmImpl::AlarmImpl(boost::asio::io_service& io,
48 std::chrono::milliseconds delay,
49 std::function<void ()> callback)
50@@ -372,7 +399,8 @@
51 AlarmImpl::AlarmImpl(boost::asio::io_service& io,
52 std::function<void(void)> callback)
53 : timer{io},
54- data{std::make_shared<InternalState>(callback)}
55+ data{std::make_shared<InternalState>(callback)},
56+ handler{make_handler(data)}
57 {
58 data->state = triggered;
59 }
60@@ -416,25 +444,8 @@
61
62 void AlarmImpl::update_timer()
63 {
64+ timer.async_wait(handler);
65 std::lock_guard<decltype(data->m)> lock(data->m);
66- // Awkwardly, we can't stop the async_wait handler from being called
67- // on a destroyed AlarmImpl. This means we need to wedge a shared_ptr
68- // into the async_wait callback.
69- std::weak_ptr<InternalState> possible_data = data;
70- timer.async_wait([possible_data](boost::system::error_code const& ec)
71- {
72- auto data = possible_data.lock();
73- if (!data)
74- return;
75-
76- std::unique_lock<decltype(data->m)> lock(data->m);
77- if (!ec && data->state == pending)
78- {
79- data->state = triggered;
80- lock.unlock();
81- data->callback();
82- }
83- });
84 data->state = pending;
85 }
86 }
87
88=== modified file 'src/server/compositor/timeout_frame_dropping_policy_factory.cpp'
89--- src/server/compositor/timeout_frame_dropping_policy_factory.cpp 2014-05-22 20:48:20 +0000
90+++ src/server/compositor/timeout_frame_dropping_policy_factory.cpp 2014-06-25 16:35:46 +0000
91@@ -41,7 +41,7 @@
92
93 private:
94 std::chrono::milliseconds const timeout;
95- std::unique_ptr<mir::time::Alarm> alarm;
96+ std::unique_ptr<mir::time::Alarm> const alarm;
97 std::atomic<unsigned int> pending_swaps;
98 };
99
100@@ -49,15 +49,15 @@
101 std::chrono::milliseconds timeout,
102 std::function<void(void)> drop_frame)
103 : timeout{timeout},
104+ alarm{timer->create_alarm([this, drop_frame]
105+ {
106+ assert(pending_swaps.load() > 0);
107+ drop_frame();
108+ if (--pending_swaps > 0)
109+ alarm->reschedule_in(this->timeout);
110+ })},
111 pending_swaps{0}
112 {
113- alarm = timer->create_alarm([this, drop_frame]
114- {
115- assert(pending_swaps.load() > 0);
116- drop_frame();
117- if (--pending_swaps > 0)
118- alarm->reschedule_in(this->timeout);
119- });
120 }
121
122 void TimeoutFrameDroppingPolicy::swap_now_blocking()

Subscribers

People subscribed via source and target branches