Mir

Merge lp:~dandrader/mir/switchingBundle_lp1270964 into lp:mir

Proposed by Daniel d'Andrada
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 1350
Proposed branch: lp:~dandrader/mir/switchingBundle_lp1270964
Merge into: lp:mir
Diff against target: 88 lines (+34/-4)
3 files modified
src/server/compositor/switching_bundle.cpp (+1/-2)
src/server/compositor/switching_bundle.h (+3/-1)
tests/unit-tests/compositor/test_switching_bundle.cpp (+30/-1)
To merge this branch: bzr merge lp:~dandrader/mir/switchingBundle_lp1270964
Reviewer Review Type Date Requested Status
PS Jenkins bot (community) continuous-integration Approve
Daniel van Vugt Approve
Alan Griffiths Abstain
Andreas Pokorny (community) Approve
Alexandros Frantzis (community) Approve
Review via email: mp+202446@code.launchpad.net

This proposal supersedes a proposal from 2014-01-20.

Commit message

Only use SwitchingBundle::last_consumed after it has been set.

Otherwise SwitchingBundle::compositor_acquire could follow a bogus code path.
(LP: #1270964)

Description of the change

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote : Posted in a previous version of this proposal
review: Needs Fixing (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

(1) The new test fails:
[ RUN ] SwitchingBundleTest.compositor_client_interleaved
/home/dan/bzr/mir/tmp.964/tests/unit-tests/compositor/test_switching_bundle.cpp:389: Failure
Value of: bundle.first_ready
  Actual: 1
Expected: 0
[ FAILED ] SwitchingBundleTest.compositor_client_interleaved (0 ms)
The same failure is occurring on desktop amd64 and armhf (Nexus 10).

(2) I'm undecided about this approach:
47 + friend class SwitchingBundleTest_compositor_client_interleaved_Test;
It's certainly a convenient solution for introspection. But that then allows the test to couple to internal members it shouldn't be coupled to.

And before anyone asks, this has nothing to do with bug 1270245 (which still happens with this branch).

review: Needs Fixing
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal

(2) Please remove "friend class ..." and all the ASSERT_EQ statements that depend on it. They're not needed to produce a correct regression test, as shown here:
  https://code.launchpad.net/~vanvugt/mir/test-1270964/+merge/202406
Feel free to merge that branch into this one.

review: Needs Fixing
Revision history for this message
Daniel d'Andrada (dandrader) wrote : Posted in a previous version of this proposal

> (1) The new test fails:
> [ RUN ] SwitchingBundleTest.compositor_client_interleaved
> /home/dan/bzr/mir/tmp.964/tests/unit-
> tests/compositor/test_switching_bundle.cpp:389: Failure
> Value of: bundle.first_ready
> Actual: 1
> Expected: 0
> [ FAILED ] SwitchingBundleTest.compositor_client_interleaved (0 ms)
> The same failure is occurring on desktop amd64 and armhf (Nexus 10).

Sorry, I rushed with that proposal right before my EOD.

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> (2) Please remove "friend class ..." and all the ASSERT_EQ statements that depend on it. They're not needed to produce a correct regression test

Sure, but I do think there's a lot of value in testing the internal state, as you can identify problems at an earlier stage before they manifest themselves externally, in this case, locking in a subsequent call.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

52 +/*
53 + Regression test for LP#1270964
54 + In the situation emulated here SwitchingBundle::last_consumed ise used without ever being set,
55 + thus its initial value of zero will wrongly be used and lead to bogus behavior.
56 + */
57 +TEST_F(SwitchingBundleTest, compositor_client_interleaved)

Would a better name be "does_not_hang_as_described_in_bug_1270964"?

71 + //(nbufs=3,fcomp=2,ncomp=1,fready=0,nready=1,fclient=1,nclient=1) <-- BUG starts here
72 + // Should be:
73 + //(nbufs=3,fcomp=0,ncomp=1,fready=0,nready=0,fclient=1,nclient=1)
...
77 + //(nbufs=3,fcomp=2,ncomp=1,fready=0,nready=2,fclient=2,nclient=0)
78 + // Should be:
79 + //(nbufs=3,fcomp=0,ncomp=1,fready=1,nready=1,fclient=2,nclient=0)

These comments are too cryptic for me.

81 + client_buffer = bundle.client_acquire(); // <- would lock here if in buggy state

I'm not a fan of tests that fail by hanging.

review: Needs Fixing
Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> 71 + //(nbufs=3,fcomp=2,ncomp=1,fready=0,nready=1,fclient=1,nclient=1)
> <-- BUG starts here
> 72 + // Should be:
> 73 + //(nbufs=3,fcomp=0,ncomp=1,fready=0,nready=0,fclient=1,nclient=1)
> ...
> 77 + //(nbufs=3,fcomp=2,ncomp=1,fready=0,nready=2,fclient=2,nclient=0)
> 78 + // Should be:
> 79 + //(nbufs=3,fcomp=0,ncomp=1,fready=1,nready=1,fclient=2,nclient=0)
>
> These comments are too cryptic for me.

This is the output of operator<<, telling the internal state of SwitchingBundle.

>
> 81 + client_buffer = bundle.client_acquire(); // <- would lock here if in
> buggy state
>
> I'm not a fan of tests that fail by hanging.

Me neither, suggestions?

That's the external, "visible", symptom of this bug. The other option being checking the internal state of SwitchingBundle after each step to see if doing things correctly. But duflu doesn't like that.

I vote for internal state checks, as explained on a previous comment.

So, what do we do?
1 - test if it locks (current approach)
2 - check the internal state (doesn't hang when it fails as it detects the issue at earlier steps)
3 - Suggestions?

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> > I'm not a fan of tests that fail by hanging.
>
> Me neither, suggestions?
>
> That's the external, "visible", symptom of this bug. The other option being
> checking the internal state of SwitchingBundle after each step to see if doing
> things correctly. But duflu doesn't like that.
>
> I vote for internal state checks, as explained on a previous comment.
>
> So, what do we do?
> 1 - test if it locks (current approach)
> 2 - check the internal state (doesn't hang when it fails as it detects the
> issue at earlier steps)
> 3 - Suggestions?

Some (not necessarily good) suggestions...

Expose enough of the state to enable a check through the public interface. e.g.

   ASSERT_TRUE(bundle.is_valid());

Or check the result of streaming?

    std::ostringstream out;
    out << bundle;
    ASSERT_THAT(out.str(), HasSubstring(...));

Have a timer thread that breaks the lock.

Revision history for this message
Kevin DuBois (kdub) wrote :
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

As stated yesterday:
"And before anyone asks, this has nothing to do with bug 1270245 (which still happens with this branch)."

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This solution is indeed technically correct. Although I would personally take a simpler approach and initialize last_consumed to 0xdeadbeef or something similar. Then the caller has much less (1 in 4 billion) chance of accidentally writing code than could trigger the bug. And no need for a new member.

Fortunately I just realized and remembered Mir presently has 0% chance of experiencing the bug. Because our real compositor starts at frame #1.

review: Approve
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Fortunately I just realized and remembered Mir presently has 0% chance of
> experiencing the bug. Because our real compositor starts at frame #1.

Then a comment "use 1 for the first frameno" would appear to suffice".

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> Fortunately I just realized and remembered Mir presently has 0% chance of
> experiencing the bug. Because our real compositor starts at frame #1.

For what it's worth, I consistently get it with the prototype qml compositor.

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> Then a comment "use 1 for the first frameno" would appear to suffice".

As a workaround.

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> For what it's worth, I consistently get it with the prototype qml compositor.

Because I start with frameno 0, duh.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> For what it's worth, I consistently get it with the prototype qml compositor.

Then it could be argued that the prototype qml compositor is buggy - as we now realize know it should not start with frameno = 0.

I'm now tending to the view that 0 is a bad initial value (as it is otherwise reasonable for a compositor to start with this value) and that there ought to be better way to manage frame numbers (e.g. a generator class or just a constant "initial_frameno").

I don't like the proposed test (as discussed above) and the proposed solution feels complex for what is really a poor interface relying on an undocumented convention.

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

Now the test doesn't lock on failure or probe SwitchingBundle's internal state. Should make everybody happy. :)

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Since 0 is a valid value it may come up again when our global/local frameno counts wrap around. So we either:
1. Need to allow it (proposed solution)
2. Make the value invalid and ensure 100% that it never comes up during normal operation

I am OK with (1). It allows the user to start with any initial value they want and this is in IMO a better interface since less is expected from the user and there is no opportunity for error.

That being said, I would also be happy with something like the following for approach (2):

struct FrameNumber
{
    void operator++() { if (++number == invalid_value) ++number; }
    int operator int() { return number; }

    static int const invalid_value = 0;

private:
    int number = invalid_value + 1;
};

BufferBundle::compositor_acquire(FrameNumber frameno);

review: Approve
Revision history for this message
Andreas Pokorny (andreas-pokorny) wrote :

As a gradual improvement I would suggest an encapsulation of these two member variables into one:

i.e. using boost::optional<unsigned long> last_consuned;

But Alexandros proposal for (2) is nicer.

review: Approve
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

> Since 0 is a valid value it may come up again when our global/local frameno
> counts wrap around.

that doesn't matter if last_consumed has been written - it is only the first frame number used that can trigger the failure.

review: Abstain
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> > Since 0 is a valid value it may come up again when our global/local frameno
> > counts wrap around.
>
> that doesn't matter if last_consumed has been written - it is only the first
> frame number used that can trigger the failure.

The fine point here is that this is a potential problem for all new surfaces, since they may be created just after the count wraps around and therefore get 0 as their first frameno value.

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> As a gradual improvement I would suggest an encapsulation of these two member
> variables into one:
>
> i.e. using boost::optional<unsigned long> last_consuned;

I liked it! Done.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Kevin DuBois (kdub) wrote :

> As stated yesterday:
> "And before anyone asks, this has nothing to do with bug 1270245 (which still
> happens with this branch)."

must have missed the message, sorry

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm not sure that using boost is a better solution. Mostly because boost is so hideously documented that it's often very hard for the reader to learn what a particular template is/does. If using an additional variable lets you create a solution that the reader will understand immediately (no new library to learn), then that would have been better.

Now I kind of understand boost::optional, I think it's a dangerous template to use. Because assigning to an optional and testing its boolean value are essentially unrelated values. That's asking to be misinterpreted (as I have already today).

But this new version looks correct also.

review: Approve
Revision history for this message
PS Jenkins bot (ps-jenkins) :
review: Approve (continuous-integration)
Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> [...] Mostly because boost is so hideously documented that it's often very hard for the reader to learn what a particular template is/does. [...]

Gotta agree with you on that.

Wasn't easy to learn boost::optional from its own documentation, which is more an essay on its motivation and design than an explanation on how to actually use it (the header file ended up doing the job).

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/server/compositor/switching_bundle.cpp'
2--- src/server/compositor/switching_bundle.cpp 2014-01-21 06:27:40 +0000
3+++ src/server/compositor/switching_bundle.cpp 2014-01-22 17:13:38 +0000
4@@ -76,7 +76,6 @@
5 first_ready{0}, nready{0},
6 first_client{0}, nclients{0},
7 snapshot{-1}, nsnapshotters{0},
8- last_consumed{0},
9 overlapping_compositors{false},
10 framedropping{false}, force_drop{0}
11 {
12@@ -273,7 +272,7 @@
13 int compositor;
14
15 // Multi-monitor acquires close to each other get the same frame:
16- bool same_frame = (frameno == last_consumed);
17+ bool same_frame = last_consumed && (frameno == *last_consumed);
18
19 int avail = nfree();
20 bool can_recycle = ncompositors || avail;
21
22=== modified file 'src/server/compositor/switching_bundle.h'
23--- src/server/compositor/switching_bundle.h 2014-01-13 06:12:33 +0000
24+++ src/server/compositor/switching_bundle.h 2014-01-22 17:13:38 +0000
25@@ -25,6 +25,8 @@
26 #include <mutex>
27 #include <memory>
28
29+#include <boost/optional/optional.hpp>
30+
31 namespace mir
32 {
33 namespace graphics
34@@ -98,7 +100,7 @@
35 mutable std::mutex guard;
36 std::condition_variable cond;
37
38- unsigned long last_consumed;
39+ boost::optional<unsigned long> last_consumed;
40
41 bool overlapping_compositors;
42
43
44=== modified file 'tests/unit-tests/compositor/test_switching_bundle.cpp'
45--- tests/unit-tests/compositor/test_switching_bundle.cpp 2014-01-13 06:12:33 +0000
46+++ tests/unit-tests/compositor/test_switching_bundle.cpp 2014-01-22 17:13:38 +0000
47@@ -329,6 +329,36 @@
48 }
49 }
50
51+/*
52+ Regression test for LP#1270964
53+ In the original bug, SwitchingBundle::last_consumed would be used without ever being set
54+ in the compositor_acquire() call, thus its initial value of zero would wrongly be used
55+ and lead to the wrong buffer being given to the compositor
56+ */
57+TEST_F(SwitchingBundleTest, compositor_client_interleaved)
58+{
59+ int nbuffers = 3;
60+ mc::SwitchingBundle bundle(nbuffers, allocator, basic_properties);
61+ mg::Buffer* client_buffer = nullptr;
62+ std::shared_ptr<mg::Buffer> compositor_buffer = nullptr;
63+
64+ client_buffer = bundle.client_acquire();
65+ mg::BufferID first_ready_buffer_id = client_buffer->id();
66+ bundle.client_release(client_buffer);
67+
68+ client_buffer = bundle.client_acquire();
69+
70+ // in the original bug, compositor would be given the wrong buffer here
71+ compositor_buffer = bundle.compositor_acquire(0 /*frameno*/);
72+
73+ ASSERT_EQ(first_ready_buffer_id, compositor_buffer->id());
74+
75+ // Clean up
76+ bundle.client_release(client_buffer);
77+ bundle.compositor_release(compositor_buffer);
78+ compositor_buffer.reset();
79+}
80+
81 TEST_F(SwitchingBundleTest, overlapping_compositors_get_different_frames)
82 {
83 // This test simulates bypass behaviour
84@@ -868,4 +898,3 @@
85 }
86 }
87 }
88-

Subscribers

People subscribed via source and target branches