Mir

Merge lp:~raof/mir/fix-threaded-dispatcher-death-test-race into lp:mir

Proposed by Chris Halse Rogers
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 3919
Proposed branch: lp:~raof/mir/fix-threaded-dispatcher-death-test-race
Merge into: lp:mir
Diff against target: 35 lines (+11/-6)
1 file modified
tests/unit-tests/dispatch/test_threaded_dispatcher.cpp (+11/-6)
To merge this branch: bzr merge lp:~raof/mir/fix-threaded-dispatcher-death-test-race
Reviewer Review Type Date Requested Status
Mir CI Bot continuous-integration Approve
Cemil Azizoglu (community) Approve
Brandon Schaefer (community) Approve
Alan Griffiths Approve
Daniel van Vugt Abstain
Review via email: mp+314304@code.launchpad.net

Commit message

Fix race in destroying_dispatcher_from_a_callback_is_an_error.

Fixes (another cause of): https://bugs.launchpad.net/mir/+bug/1647573

To post a comment you must log in.
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

PASSED: Continuous integration, rev:3914
https://mir-jenkins.ubuntu.com/job/mir-ci/2568/
Executed test runs:
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-mir/3345
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3412
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3404
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3404
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=yakkety/3404
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=yakkety/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=yakkety/3374/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3374/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=yakkety/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=yakkety/3374/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3374/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3374/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3374
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3374/artifact/output/*zip*/output.zip

Click here to trigger a rebuild:
https://mir-jenkins.ubuntu.com/job/mir-ci/2568/rebuild

review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

*shrug*

review: Approve
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Wait, doesn't this create a new race where the dispatcher may get leaked in some cases? I mean if there is a need for std::atomic then it would appear that implies a race where the pointer might be null and deletion does not occur. I don't understand how this fixes anything...

review: Abstain
Revision history for this message
Chris Halse Rogers (raof) wrote :

There are two differences here:

1) assign to dispatcher *before* triggering the TestDispatchable to delete it, and

2) make dispatcher an atomic pointer, because we're assigning it in the test thread but reading it (to delete it) in the ThreadedDispatcher thread.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I see no value in making 'dispatcher' atomic:

case 1: if there is an ordering between setting and reading it, then there is no need for atomic;
case 2: if there is no ordering then /even with atomic/ the delete can see the unset value and the test is broken.

AFAICS the move of 'trigger()' ensures we have case 1 and atomic<> is therefore unnecessary.

In fact atomic<> is confusing: As Daniel says it implies we have case 2 (when it ensures is that the value loaded is "safe").

review: Needs Fixing
Revision history for this message
Chris Halse Rogers (raof) wrote :

I don't think there *is* an ordering between setting and reading it:

By my understanding the following code is allowed to loop infinitely:

    bool stop{false};
    std::thread i_can_endless_loop{[&stop]() { while (!stop) ; }};

    stop = true;

because there's no memory barrier - the reads from stop in “while (!stop)” are neither before nor after the “stop = true” write.

However

    std::atomic<bool> stop{false};
    std::thread i_cannot_endless_loop{[&stop]() { while (!stop) ; }};

    stop = true;

because the atomic read and write operations provide an ordering.

This is similar to what's happening in the test - new ThreadedDispatcher is starting a thread, then assigning to dispatcher.

dispatchable->trigger() is *not* a memory barrier - it's a write to a file-descriptor. I think the std::atomic<md::ThreadedDispatcher*> is technically necessary.

Now, in practise I don't think this will actually be a problem, because:
1) I don't think any compiler is going to be smart enough to notice that ‘delete dispatcher’ does not happen-after a write to dispatcher, and so replace it with abort() (which would be standards-conforming)

and

2) All the platforms we care about provide cache-coherency, so the write to dispatcher will be seen by the ThreadedDispatcher thread.

Actually, now that I notice it, without std::atomic<> I don't think the compiler is even required to emit a store to dispatcher?

Revision history for this message
Chris Halse Rogers (raof) wrote :

So, in short: atomic is required because it *provides* the ordering between setting it and reading it.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I can't say for sure but think you're imagining atomic solves more racing problems than it does.

Atomics I think are implemented using spinlocks or similar. So mutual exclusion is guaranteed but they are not designed to solve the large scale ordering problems of thread synchronization (where those threads are not yet accessing memory so memory_order is irrelevant).

Roughly speaking I agree with Alan's assessment that either atomic is not required or the test is broken.

Revision history for this message
Chris Halse Rogers (raof) wrote :

Atomics (at least of CPU-word-sized values, which pointers are) are implemented using memory barriers. Mutual exclusion is *not* guaranteed. They guarantee two things: (1) you never observe torn writes (not interesting for us because pointers are word-sized) and (2) they provide a memory ordering.

This test only needs (2), a memory ordering.

Revision history for this message
Chris Halse Rogers (raof) wrote :

This code:

{
    md::ThreadedDispatcher* dispatcher;

    auto dispatchable = std::make_shared<mt::TestDispatchable>([&dispatcher]() { delete dispatcher; });

    dispatcher = new md::ThreadedDispatcher("Death thread", dispatchable);

    dispatchable->trigger();
}

invokes undefined behaviour. The read for “delete dispatcher” (which happens in the thread created in the ThreadedDispatcher) conflicts with the write to dispatcher (after ThreadedDispatcher's constructor has run).

The read for “delete dispatcher” is triggered by dispatchable->trigger(), but dispatchable->trigger() is *not* a synchronisation point - there are no mutexes or memory barriers involved.

std::atomic<> is exactly the tool required to solve this. (I guess we could also use std::atomic_thread_fence, but that's essentially a larger hammer for the same thing).

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> std::atomic<> is exactly the tool required to solve this.

While atomic<> does guarantee there is /an/ ordering between load() and store() it does not give the ordering guarantee "store() is order-before load()" needed by the test. Specifically "store() is order-after load()" is an equally valid behaviour.

As I see it the only questionable bit is: "write(write_fd) is order-before poll(&waiter...) returning" (as FDs are not part of the C++ memory model).

If we have that guarantee, then atomic<> is unnecessary, if we don't atomic<> doesn't provide it.

Revision history for this message
Chris Halse Rogers (raof) wrote :

Store is sequence-before write(fd), and poll will return after (but not necessarily order-after) write(fd), guaranteeing that the store() is order-before load().

So I *think* the std::atomic bridges between the runtime instruction ordering we know this to have and the C++ memory model.

(On consideration, I'm not sure whether the compiler is free to reorder the store and the call to write(fd). But in the absence of a memory barrier the CPU is free to grab the pre-store value from memory. We could try this on Andreas' multi-socket box!)

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Store is sequence-before write(fd), and poll will return after (but not
> necessarily order-after) write(fd), guaranteeing that the store() is order-
> before load().
>
> So I *think* the std::atomic bridges between the runtime instruction ordering
> we know this to have and the C++ memory model.

I can't find adequate wording either.

I guess we've proved that using atomic<> isn't the cause of confusion - and that's the basis on which I voted "NF". If we see CI failures because load() fetches the pre-store value we can revisit.

review: Approve
Revision history for this message
Brandon Schaefer (brandontschaefer) wrote :

lgtm

review: Approve
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

> I guess we've proved that using atomic<> isn't the cause of confusion - and
> that's the basis on which I voted "NF". If we see CI failures because load()
> fetches the pre-store value we can revisit.

Sounds like a plan.

review: Approve
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Autolanding.
More details in the following jenkins job:
https://mir-jenkins.ubuntu.com/job/mir-autolanding/923/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3399/console
    None: https://mir-jenkins.ubuntu.com/job/generic-land-mp/972/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3466
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3458
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3458
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3458
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3428
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3428/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3428
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3428/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3428/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3428
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3428/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3428
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3428/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3428/console

review: Needs Fixing (continuous-integration)
Revision history for this message
Chris Halse Rogers (raof) wrote :

Looks like CI lost network:
Caused by: java.io.IOException: No route to host

Retriggering.

Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

FAILED: Autolanding.
More details in the following jenkins job:
https://mir-jenkins.ubuntu.com/job/mir-autolanding/926/
Executed test runs:
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-mir/3409/console
    None: https://mir-jenkins.ubuntu.com/job/generic-land-mp/975/console
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-0-fetch/3476
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=vivid+overlay/3468
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=xenial+overlay/3468
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-1-sourcepkg/release=zesty/3468
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3438
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=clang,platform=mesa,release=zesty/3438/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3438
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=xenial+overlay/3438/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3438
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=amd64,compiler=gcc,platform=mesa,release=zesty/3438/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3438
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=cross-armhf,compiler=gcc,platform=android,release=vivid+overlay/3438/artifact/output/*zip*/output.zip
    SUCCESS: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3438
        deb: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=android,release=vivid+overlay/3438/artifact/output/*zip*/output.zip
    FAILURE: https://mir-jenkins.ubuntu.com/job/build-2-binpkg-mir/arch=i386,compiler=gcc,platform=mesa,release=xenial+overlay/3438/console

review: Needs Fixing (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Mir CI Bot (mir-ci-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'tests/unit-tests/dispatch/test_threaded_dispatcher.cpp'
--- tests/unit-tests/dispatch/test_threaded_dispatcher.cpp 2016-08-26 09:24:42 +0000
+++ tests/unit-tests/dispatch/test_threaded_dispatcher.cpp 2017-01-09 02:59:00 +0000
@@ -259,7 +259,6 @@
259 using namespace testing;259 using namespace testing;
260 using namespace std::chrono_literals;260 using namespace std::chrono_literals;
261261
262 FLAGS_gtest_death_test_style = "threadsafe";
263 constexpr char const* exception_msg = "Ducks! Ducks attack!";262 constexpr char const* exception_msg = "Ducks! Ducks attack!";
264263
265 auto dispatchable = std::make_shared<mt::TestDispatchable>([]()264 auto dispatchable = std::make_shared<mt::TestDispatchable>([]()
@@ -406,12 +405,18 @@
406405
407 MIR_EXPECT_EXIT(406 MIR_EXPECT_EXIT(
408 {407 {
409 md::ThreadedDispatcher* dispatcher;408 std::atomic<md::ThreadedDispatcher*> dispatcher;
410409
411 auto dispatchable = std::make_shared<mt::TestDispatchable>([&dispatcher]() { delete dispatcher; });410 auto dispatchable = std::make_shared<mt::TestDispatchable>(
412411 [&dispatcher]()
413 dispatchable->trigger();412 {
413 delete dispatcher.load();
414 });
415
414 dispatcher = new md::ThreadedDispatcher("Death thread", dispatchable);416 dispatcher = new md::ThreadedDispatcher("Death thread", dispatchable);
417
418 dispatchable->trigger();
419
415 std::this_thread::sleep_for(10s);420 std::this_thread::sleep_for(10s);
416 }, KilledBySignal(SIGABRT), ".*Destroying ThreadedDispatcher.*");421 }, KilledBySignal(SIGABRT), ".*Destroying ThreadedDispatcher.*");
417}422}

Subscribers

People subscribed via source and target branches