Merge lp:~dandrader/mir/switchingBundle_lp1270964 into lp:mir
- switchingBundle_lp1270964
- Merge into development-branch
Status: | Merged |
---|---|
Approved by: | Daniel van Vugt |
Approved revision: | no longer in the source branch. |
Merged at revision: | 1350 |
Proposed branch: | lp:~dandrader/mir/switchingBundle_lp1270964 |
Merge into: | lp:mir |
Diff against target: |
88 lines (+34/-4) 3 files modified
src/server/compositor/switching_bundle.cpp (+1/-2) src/server/compositor/switching_bundle.h (+3/-1) tests/unit-tests/compositor/test_switching_bundle.cpp (+30/-1) |
To merge this branch: | bzr merge lp:~dandrader/mir/switchingBundle_lp1270964 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
PS Jenkins bot (community) | continuous-integration | Approve | |
Daniel van Vugt | Approve | ||
Alan Griffiths | Abstain | ||
Andreas Pokorny (community) | Approve | ||
Alexandros Frantzis (community) | Approve | ||
Review via email: mp+202446@code.launchpad.net |
This proposal supersedes a proposal from 2014-01-20.
Commit message
Only use SwitchingBundle
Otherwise SwitchingBundle
(LP: #1270964)
Description of the change
Fixes bug 1270964.
PS Jenkins bot (ps-jenkins) wrote : Posted in a previous version of this proposal | # |
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal | # |
(1) The new test fails:
[ RUN ] SwitchingBundle
/home/dan/
Value of: bundle.first_ready
Actual: 1
Expected: 0
[ FAILED ] SwitchingBundle
The same failure is occurring on desktop amd64 and armhf (Nexus 10).
(2) I'm undecided about this approach:
47 + friend class SwitchingBundle
It's certainly a convenient solution for introspection. But that then allows the test to couple to internal members it shouldn't be coupled to.
And before anyone asks, this has nothing to do with bug 1270245 (which still happens with this branch).
Daniel van Vugt (vanvugt) wrote : Posted in a previous version of this proposal | # |
(2) Please remove "friend class ..." and all the ASSERT_EQ statements that depend on it. They're not needed to produce a correct regression test, as shown here:
https:/
Feel free to merge that branch into this one.
Daniel d'Andrada (dandrader) wrote : Posted in a previous version of this proposal | # |
> (1) The new test fails:
> [ RUN ] SwitchingBundle
> /home/dan/
> tests/composito
> Value of: bundle.first_ready
> Actual: 1
> Expected: 0
> [ FAILED ] SwitchingBundle
> The same failure is occurring on desktop amd64 and armhf (Nexus 10).
Sorry, I rushed with that proposal right before my EOD.
Daniel d'Andrada (dandrader) wrote : | # |
> (2) Please remove "friend class ..." and all the ASSERT_EQ statements that depend on it. They're not needed to produce a correct regression test
Sure, but I do think there's a lot of value in testing the internal state, as you can identify problems at an earlier stage before they manifest themselves externally, in this case, locking in a subsequent call.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1342
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
Alan Griffiths (alan-griffiths) wrote : | # |
52 +/*
53 + Regression test for LP#1270964
54 + In the situation emulated here SwitchingBundle
55 + thus its initial value of zero will wrongly be used and lead to bogus behavior.
56 + */
57 +TEST_F(
Would a better name be "does_not_
71 + //(nbufs=
72 + // Should be:
73 + //(nbufs=
...
77 + //(nbufs=
78 + // Should be:
79 + //(nbufs=
These comments are too cryptic for me.
81 + client_buffer = bundle.
I'm not a fan of tests that fail by hanging.
Daniel d'Andrada (dandrader) wrote : | # |
> 71 + //(nbufs=
> <-- BUG starts here
> 72 + // Should be:
> 73 + //(nbufs=
> ...
> 77 + //(nbufs=
> 78 + // Should be:
> 79 + //(nbufs=
>
> These comments are too cryptic for me.
This is the output of operator<<, telling the internal state of SwitchingBundle.
>
> 81 + client_buffer = bundle.
> buggy state
>
> I'm not a fan of tests that fail by hanging.
Me neither, suggestions?
That's the external, "visible", symptom of this bug. The other option being checking the internal state of SwitchingBundle after each step to see if doing things correctly. But duflu doesn't like that.
I vote for internal state checks, as explained on a previous comment.
So, what do we do?
1 - test if it locks (current approach)
2 - check the internal state (doesn't hang when it fails as it detects the issue at earlier steps)
3 - Suggestions?
Alan Griffiths (alan-griffiths) wrote : | # |
> > I'm not a fan of tests that fail by hanging.
>
> Me neither, suggestions?
>
> That's the external, "visible", symptom of this bug. The other option being
> checking the internal state of SwitchingBundle after each step to see if doing
> things correctly. But duflu doesn't like that.
>
> I vote for internal state checks, as explained on a previous comment.
>
> So, what do we do?
> 1 - test if it locks (current approach)
> 2 - check the internal state (doesn't hang when it fails as it detects the
> issue at earlier steps)
> 3 - Suggestions?
Some (not necessarily good) suggestions...
Expose enough of the state to enable a check through the public interface. e.g.
ASSERT_
Or check the result of streaming?
std:
out << bundle;
ASSERT_
Have a timer thread that breaks the lock.
Kevin DuBois (kdub) wrote : | # |
could this be related to https:/
Daniel van Vugt (vanvugt) wrote : | # |
As stated yesterday:
"And before anyone asks, this has nothing to do with bug 1270245 (which still happens with this branch)."
Daniel van Vugt (vanvugt) wrote : | # |
This solution is indeed technically correct. Although I would personally take a simpler approach and initialize last_consumed to 0xdeadbeef or something similar. Then the caller has much less (1 in 4 billion) chance of accidentally writing code than could trigger the bug. And no need for a new member.
Fortunately I just realized and remembered Mir presently has 0% chance of experiencing the bug. Because our real compositor starts at frame #1.
Alan Griffiths (alan-griffiths) wrote : | # |
> Fortunately I just realized and remembered Mir presently has 0% chance of
> experiencing the bug. Because our real compositor starts at frame #1.
Then a comment "use 1 for the first frameno" would appear to suffice".
Daniel d'Andrada (dandrader) wrote : | # |
> Fortunately I just realized and remembered Mir presently has 0% chance of
> experiencing the bug. Because our real compositor starts at frame #1.
For what it's worth, I consistently get it with the prototype qml compositor.
Daniel d'Andrada (dandrader) wrote : | # |
> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> Then a comment "use 1 for the first frameno" would appear to suffice".
As a workaround.
Daniel d'Andrada (dandrader) wrote : | # |
> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> For what it's worth, I consistently get it with the prototype qml compositor.
Because I start with frameno 0, duh.
Alan Griffiths (alan-griffiths) wrote : | # |
> > Fortunately I just realized and remembered Mir presently has 0% chance of
> > experiencing the bug. Because our real compositor starts at frame #1.
>
> For what it's worth, I consistently get it with the prototype qml compositor.
Then it could be argued that the prototype qml compositor is buggy - as we now realize know it should not start with frameno = 0.
I'm now tending to the view that 0 is a bad initial value (as it is otherwise reasonable for a compositor to start with this value) and that there ought to be better way to manage frame numbers (e.g. a generator class or just a constant "initial_frameno").
I don't like the proposed test (as discussed above) and the proposed solution feels complex for what is really a poor interface relying on an undocumented convention.
Daniel d'Andrada (dandrader) wrote : | # |
Now the test doesn't lock on failure or probe SwitchingBundle's internal state. Should make everybody happy. :)
Alexandros Frantzis (afrantzis) wrote : | # |
Since 0 is a valid value it may come up again when our global/local frameno counts wrap around. So we either:
1. Need to allow it (proposed solution)
2. Make the value invalid and ensure 100% that it never comes up during normal operation
I am OK with (1). It allows the user to start with any initial value they want and this is in IMO a better interface since less is expected from the user and there is no opportunity for error.
That being said, I would also be happy with something like the following for approach (2):
struct FrameNumber
{
void operator++() { if (++number == invalid_value) ++number; }
int operator int() { return number; }
static int const invalid_value = 0;
private:
int number = invalid_value + 1;
};
BufferBundle:
Andreas Pokorny (andreas-pokorny) wrote : | # |
As a gradual improvement I would suggest an encapsulation of these two member variables into one:
i.e. using boost::
But Alexandros proposal for (2) is nicer.
Alan Griffiths (alan-griffiths) wrote : | # |
> Since 0 is a valid value it may come up again when our global/local frameno
> counts wrap around.
that doesn't matter if last_consumed has been written - it is only the first frame number used that can trigger the failure.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1343
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
Alexandros Frantzis (afrantzis) wrote : | # |
> > Since 0 is a valid value it may come up again when our global/local frameno
> > counts wrap around.
>
> that doesn't matter if last_consumed has been written - it is only the first
> frame number used that can trigger the failure.
The fine point here is that this is a potential problem for all new surfaces, since they may be created just after the count wraps around and therefore get 0 as their first frameno value.
Daniel d'Andrada (dandrader) wrote : | # |
> As a gradual improvement I would suggest an encapsulation of these two member
> variables into one:
>
> i.e. using boost::
I liked it! Done.
PS Jenkins bot (ps-jenkins) wrote : | # |
PASSED: Continuous integration, rev:1344
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
deb: http://
SUCCESS: http://
SUCCESS: http://
Click here to trigger a rebuild:
http://
Kevin DuBois (kdub) wrote : | # |
> As stated yesterday:
> "And before anyone asks, this has nothing to do with bug 1270245 (which still
> happens with this branch)."
must have missed the message, sorry
Daniel van Vugt (vanvugt) wrote : | # |
I'm not sure that using boost is a better solution. Mostly because boost is so hideously documented that it's often very hard for the reader to learn what a particular template is/does. If using an additional variable lets you create a solution that the reader will understand immediately (no new library to learn), then that would have been better.
Now I kind of understand boost::optional, I think it's a dangerous template to use. Because assigning to an optional and testing its boolean value are essentially unrelated values. That's asking to be misinterpreted (as I have already today).
But this new version looks correct also.
PS Jenkins bot (ps-jenkins) : | # |
Daniel d'Andrada (dandrader) wrote : | # |
> [...] Mostly because boost is so hideously documented that it's often very hard for the reader to learn what a particular template is/does. [...]
Gotta agree with you on that.
Wasn't easy to learn boost::optional from its own documentation, which is more an essay on its motivation and design than an explanation on how to actually use it (the header file ended up doing the job).
Preview Diff
1 | === modified file 'src/server/compositor/switching_bundle.cpp' |
2 | --- src/server/compositor/switching_bundle.cpp 2014-01-21 06:27:40 +0000 |
3 | +++ src/server/compositor/switching_bundle.cpp 2014-01-22 17:13:38 +0000 |
4 | @@ -76,7 +76,6 @@ |
5 | first_ready{0}, nready{0}, |
6 | first_client{0}, nclients{0}, |
7 | snapshot{-1}, nsnapshotters{0}, |
8 | - last_consumed{0}, |
9 | overlapping_compositors{false}, |
10 | framedropping{false}, force_drop{0} |
11 | { |
12 | @@ -273,7 +272,7 @@ |
13 | int compositor; |
14 | |
15 | // Multi-monitor acquires close to each other get the same frame: |
16 | - bool same_frame = (frameno == last_consumed); |
17 | + bool same_frame = last_consumed && (frameno == *last_consumed); |
18 | |
19 | int avail = nfree(); |
20 | bool can_recycle = ncompositors || avail; |
21 | |
22 | === modified file 'src/server/compositor/switching_bundle.h' |
23 | --- src/server/compositor/switching_bundle.h 2014-01-13 06:12:33 +0000 |
24 | +++ src/server/compositor/switching_bundle.h 2014-01-22 17:13:38 +0000 |
25 | @@ -25,6 +25,8 @@ |
26 | #include <mutex> |
27 | #include <memory> |
28 | |
29 | +#include <boost/optional/optional.hpp> |
30 | + |
31 | namespace mir |
32 | { |
33 | namespace graphics |
34 | @@ -98,7 +100,7 @@ |
35 | mutable std::mutex guard; |
36 | std::condition_variable cond; |
37 | |
38 | - unsigned long last_consumed; |
39 | + boost::optional<unsigned long> last_consumed; |
40 | |
41 | bool overlapping_compositors; |
42 | |
43 | |
44 | === modified file 'tests/unit-tests/compositor/test_switching_bundle.cpp' |
45 | --- tests/unit-tests/compositor/test_switching_bundle.cpp 2014-01-13 06:12:33 +0000 |
46 | +++ tests/unit-tests/compositor/test_switching_bundle.cpp 2014-01-22 17:13:38 +0000 |
47 | @@ -329,6 +329,36 @@ |
48 | } |
49 | } |
50 | |
51 | +/* |
52 | + Regression test for LP#1270964 |
53 | + In the original bug, SwitchingBundle::last_consumed would be used without ever being set |
54 | + in the compositor_acquire() call, thus its initial value of zero would wrongly be used |
55 | + and lead to the wrong buffer being given to the compositor |
56 | + */ |
57 | +TEST_F(SwitchingBundleTest, compositor_client_interleaved) |
58 | +{ |
59 | + int nbuffers = 3; |
60 | + mc::SwitchingBundle bundle(nbuffers, allocator, basic_properties); |
61 | + mg::Buffer* client_buffer = nullptr; |
62 | + std::shared_ptr<mg::Buffer> compositor_buffer = nullptr; |
63 | + |
64 | + client_buffer = bundle.client_acquire(); |
65 | + mg::BufferID first_ready_buffer_id = client_buffer->id(); |
66 | + bundle.client_release(client_buffer); |
67 | + |
68 | + client_buffer = bundle.client_acquire(); |
69 | + |
70 | + // in the original bug, compositor would be given the wrong buffer here |
71 | + compositor_buffer = bundle.compositor_acquire(0 /*frameno*/); |
72 | + |
73 | + ASSERT_EQ(first_ready_buffer_id, compositor_buffer->id()); |
74 | + |
75 | + // Clean up |
76 | + bundle.client_release(client_buffer); |
77 | + bundle.compositor_release(compositor_buffer); |
78 | + compositor_buffer.reset(); |
79 | +} |
80 | + |
81 | TEST_F(SwitchingBundleTest, overlapping_compositors_get_different_frames) |
82 | { |
83 | // This test simulates bypass behaviour |
84 | @@ -868,4 +898,3 @@ |
85 | } |
86 | } |
87 | } |
88 | - |
FAILED: Continuous integration, rev:1342 jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- ci/689/ jenkins. qa.ubuntu. com/job/ mir-android- trusty- i386-build/ 677 jenkins. qa.ubuntu. com/job/ mir-clang- trusty- amd64-build/ 673 jenkins. qa.ubuntu. com/job/ mir-mediumtests -trusty- touch/279/ console jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- trusty- amd64-ci/ 419/console jenkins. qa.ubuntu. com/job/ mir-team- mir-development -branch- trusty- armhf-ci/ 423/console jenkins. qa.ubuntu. com/job/ mir-mediumtests -builder- trusty- armhf/279/ console
http://
Executed test runs:
SUCCESS: http://
SUCCESS: http://
FAILURE: http://
FAILURE: http://
FAILURE: http://
FAILURE: http://
Click here to trigger a rebuild: s-jenkins. ubuntu- ci:8080/ job/mir- team-mir- development- branch- ci/689/ rebuild
http://