Mir

Merge lp:~vanvugt/mir/ddouble into lp:mir

Proposed by Daniel van Vugt
Status: Merged
Approved by: Daniel van Vugt
Approved revision: 2161
Merged at revision: 2673
Proposed branch: lp:~vanvugt/mir/ddouble
Merge into: lp:mir
Diff against target: 486 lines (+277/-45)
3 files modified
src/server/compositor/buffer_queue.cpp (+96/-19)
src/server/compositor/buffer_queue.h (+14/-0)
tests/unit-tests/compositor/test_buffer_queue.cpp (+167/-26)
To merge this branch: bzr merge lp:~vanvugt/mir/ddouble
Reviewer Review Type Date Requested Status
Cemil Azizoglu (community) Approve
Alexandros Frantzis (community) Approve
Alberto Aguirre (community) Abstain
Kevin DuBois (community) Abstain
PS Jenkins bot (community) continuous-integration Approve
Alan Griffiths Abstain
Review via email: mp+258351@code.launchpad.net

Commit message

Introducing dynamic buffer queue scaling to minimize lag.
(LP: #1240909)

Actually the size of the queue doesn't really change with this branch.
What it does do is reliably detect when a client is staying ahead of
the compositor and throttle it so that no more than one future frame
is ever pre-rendered. Hence you get the same single-frame latency as
double buffering. But the moment the client fails to keep up, it will
get to use all three buffers again.

Description of the change

Background reading and imagery to help you understand the improvement:
https://docs.google.com/document/d/116i4TC0rls4wKFmbaRrHL_UT_Jg22XI8YqpXGLLTVCc/edit#
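
If you'd rather skim code than prose, here is a condensed sketch of the heuristic. It mirrors client_ahead_of_compositor in the diff below, with the real locking and queue bookkeeping elided, so treat it as an illustration rather than the exact implementation:

    // Simplified sketch of the queue-scaling heuristic (illustration only).
    struct ScalingState
    {
        int  nbuffers = 3;                  // maximum buffers we may allocate
        bool frame_dropping_enabled = false;
        int  frame_deadlines_threshold = 3; // the "scaling delay"; < 0 disables scaling
        int  frame_deadlines_met = 0;       // consecutive compositor deadlines the client met
        int  ready_frames = 0;              // client frames already queued for the compositor
    };

    // Called (conceptually) each time the primary compositor takes a frame:
    // the client met the deadline if it already had a new frame queued.
    void on_compositor_acquire(ScalingState& s)
    {
        if (s.ready_frames == 0)
            s.frame_deadlines_met = 0;
        else if (s.frame_deadlines_met < s.frame_deadlines_threshold)
            ++s.frame_deadlines_met;
    }

    // The client is throttled to double-buffering depth only while it has
    // proven it can keep up AND it already has a frame waiting.
    bool client_ahead_of_compositor(ScalingState const& s)
    {
        return s.nbuffers > 1 &&
               !s.frame_dropping_enabled &&        // never throttle frame droppers
               s.frame_deadlines_threshold >= 0 && // queue scaling enabled
               s.ready_frames > 0 &&               // at least one frame is ready
               s.frame_deadlines_met >= s.frame_deadlines_threshold;
    }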

Fun facts:

* For manual testing you can see the switching in action if you set MIR_CLIENT_PERF_REPORT=log and then resize mir_demo_client_flicker large enough that it takes too long to render a frame (>16ms). It will then get triple buffers. If you shrink it, the client can render quickly again and reports double buffers.

* Strange and wonderful things happen with input resampling. Event-driven clients that use the default 59Hz resampling rate (e.g. mir_demo_client_eglsquare) will generally not be detected as "keeping up" and so will always be on triple buffers. But surprisingly this is a good thing -- the low input rate means the compositor is always consuming buffers faster than the client produces them, and you get around one frame of lag as the consumer eats through the queue up to where the producer is (so the queue length is almost irrelevant). Other clients that reliably produce at full frame rate or higher, however, will get switched to double buffers (one frame of lag).

* Additional manual testing has been done on a couple of phones (I backported this branch). Everything remains smooth. Unity8 is fast enough on some devices to qualify for double buffering. Unity8-dash is generally not fast enough on any device to qualify. You can see the stats on a live phone with:
    restart unity8 MIR_CLIENT_PERF_REPORT=log
    tail -f ~/.cache/upstart/unity8.log

* In raw numbers this branch reduces end-to-end lag on the phone from 5 frames to at least 4 (when unity8 qualifies for double buffering), which is a 20% reduction. And if your app (not unity8-dash) is fast enough to keep up too, you then get only 3 frames of lag, a 40% reduction. So this branch reduces Unity8/USC total graphics latency by 20-40%, but consistent results closer to 40% will depend on Unity8/dash performance improving in future. Also keep in mind that on common mobile devices the touchscreen hardware itself may account for most of the latency. So even a 100% reduction in graphics latency could still feel slightly laggy to the user, and thus a 40% improvement in graphics latency is less than a 40% reduction in the total lag you perceive.

* A "slow" client that gets triple buffers is actually one that's achieving between 50% and 100% of the compositor frame rate. 100% or more is obviously fast and gets double buffers. Less obvious is that really slow clients doing less than 50% frame rate (e.g. a clock) actually keep the system idle enough that the third buffer doesn't get touched so you might see really slow clients only report 1 or 2 buffers in action even though we allow them 3.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

I don't see a way of disabling the prediction.

review: Needs Fixing
lp:~vanvugt/mir/ddouble updated
2152. By Daniel van Vugt

Merge latest trunk

2153. By Daniel van Vugt

Merge latest trunk

2154. By Daniel van Vugt

Add an API for getting/setting the scaling threshold, as requested by
Alberto. But I suggest nobody should ever use it because the heuristic
for scaling down is very reliable and you'd just get worse performance if
you ever turned it off.

This also allows the new tests to be cleaner with no magic constants
any more.

2155. By Daniel van Vugt

No need to disable any tests any more. Just tweak the scaling delay.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

While descriptions of manual testing scenarios are interesting, we all know that manual testing doesn't scale and we'll miss regressions.

Can we think of a way to automate any of this? Even if we need to define a new test category "test_needing_graphics_hardware"? (I know this problem existed before this MP but here's as good a place as any to discuss.)

It would be really nice to have something more than unit tests to prove the /system/ behavior is as intended, not just the buffer queue.

review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The new tests cover the new functionality completely. And what you don't see in the diff is the existing test `slow_client_framerate_matches_compositor' covering any potential performance regressions really well (it took quite some effort to make that test pass).

Of course more tests always help. But AFAIK we're at 100% functionality coverage here already so it shouldn't be a blocking issue. All pre-existing tests still pass too.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I agree that the new unit tests cover the new behavior of BufferQueue.

But I disagree that the proposed tests cover the new functionality of the system: there are no new tests of /the system as a whole/. The potential value of such tests is clear from the "Description of the Change" detailing how to /manually/ test the new functionality.

We should always be looking for ways to avoid relying on manual testing and this seems like an opportunity.

Revision history for this message
Kevin DuBois (kdub) wrote :

Seems like the idea can achieve a better latency/throughput balance; some questions...

Questions:
I can see some sort of policy (mc::BufferingPolicy?) being added to the constructor to try to keep the complexity of BufferQueue down and make the policies interchangeable (it could also provide a null/disabled policy to turn off the dynamism).

190+ void set_scaling_delay(int nframes);
191+ int scaling_delay() const;
these are just here for the benefit of the tests... is there a plan in the works so that something else in the system can change this?

249+ std::this_thread::sleep_for(throttled_compositor_rate);
Is this timing sensitive?

nits:
47+ else if (int(buffers.size()) < nbuffers)
c-cast

review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

set_scaling_delay was only just added to satisfy Alberto. I preferred not having it at all because in the absence of bugs (and none are known) it makes no sense to ever turn it off. That just allows people to shoot themselves in the proverbial foot, performance-wise. Throughput is never compromised so it should always be turned on. Although it does help with a couple of tests that are invalid with queue scaling, which would need to be deleted/disabled in the absence of set_scaling_delay.

The sleep_for was not my first preference. I toyed a lot with timerless tests but found that they were nowhere near as effective at finding subtle bugs (when I intentionally introduced them) as the tests using real time. So I switched back to using real time because they're stronger tests, more sensitive to subtle mistakes. Just as the existing `slow_client_framerate_matches_compositor' is so effective it continues to surprise me.

Yeah that cast is there for readability. If I said:
  int x = buffers.size();
  if (x < nbuffers)
then no one would complain. It's all the same and not something we should bother blocking on. The current form is more readable.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Alan:
I only mentioned manual testing in the description because people should have some healthy suspicion of my claims and test for themselves. The suggestion to use manual testing is for healthy quality assurance and does not indicate any gap in the automated test coverage.

Revision history for this message
Kevin DuBois (kdub) wrote :

> Yeah that cast is there for readability. If I said:
> int x = buffers.size();
> if (x < nbuffers)
> then no one would complain. It's all the same and not something we should
> bother blocking on. The current form is more readable.

Just trying to stick to the style guide :)
http://unity.ubuntu.com/mir/cppguide/index.html?showone=Casting#Casting
We've been chatting about some of the style guide stuff in the EU/US standup (and probably at the sprint next week too), so maybe this is just another point of the style guide to bring up.

Revision history for this message
Kevin DuBois (kdub) wrote :

> The sleep_for was not my first preference. I toyed a lot with timerless tests
> but what I found was that they were nowhere near as effective at finding
> subtle bugs (when I intentionally introduced them) as the tests using real
> time. So I switched back to using real time because they're stronger tests
> more sensitive to subtle mistakes. Just as the existing
> `slow_client_framerate_matches_compositor' is so effective it continues to
> surprise me.

So, obviously the worst thing possible is to have a bug or regression, so a strong test is good. Another bad thing though is to have tests break under loaded CI, or to have the tests take a long time to run.

The test seems robust enough, but I'm just leery of timing tests from past bugs.
The unit test time increases significantly with these tests. At least on my system, the old BufferQueueTests take 3-4s to run, but the new tests increase that time to about 12-13s (log of run: http://pastebin.ubuntu.com/11100093/). Is there a way to decrease the test time while still guarding against bugs? It's difficult to know where to draw the line on how long a test can take to run, but this seems like an increase in test run time that should be avoided if at all possible.

Revision history for this message
Kevin DuBois (kdub) wrote :

> set_scaling_delay was only just added to satisfy Alberto. I preferred not
> having that at all because in the absence of bugs (and none are known) it
> makes no sense to ever turn it off. That just allows people to shoot
> themselves in the proverbial foot, performance wise. Throughput is never
> compromised so it should always be turned on. Although it does help with a
> couple of tests that are invalid with queue scaling so would need to be
> deleted/disabled in the absence of set_scaling_delay.

Side-stepping a bit the argument of whether it should be disabled or configurable [1], the current code does have disable/configure abilities. So it makes more sense to have this be a policy interface (or even an int parameter) passed in via the BufferQueue constructor than to have dangling functions that we don't intend to use in production code.

[1] I buy that it's more of a nice-to-have (and easy-to-have) than something we're planning on making use of often.
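
For illustration only, this is roughly the shape of seam I have in mind (the BufferingPolicy name and its methods below are hypothetical; nothing with this shape exists in the branch under review):

    // Hypothetical sketch only -- not part of this branch.
    namespace mir { namespace compositor {

    class BufferingPolicy
    {
    public:
        virtual ~BufferingPolicy() = default;

        // Reported by BufferQueue whenever the primary compositor takes a
        // frame: did the client have a new frame ready in time?
        virtual void frame_deadline(bool met) = 0;

        // Asked by BufferQueue before handing a buffer to the client: should
        // the next client_acquire be held back to double-buffering depth?
        virtual bool throttle_client() const = 0;
    };

    // A null policy that keeps today's always-allow-three-buffers behaviour.
    class NullBufferingPolicy : public BufferingPolicy
    {
    public:
        void frame_deadline(bool) override {}
        bool throttle_client() const override { return false; }
    };

    }} // namespace mir::compositor

BufferQueue would then take the policy in its constructor (e.g. a std::shared_ptr<BufferingPolicy>) instead of exposing set_scaling_delay()/scaling_delay().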

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Indeed I think I can shorten the new tests more and maintain robustness... They're not "time sensitive" like other tests, though. The only timing I introduce is the sleep in throttled_compositor_thread() to ensure that the client loop (which is not throttled at all) runs faster than the compositor. That holds true even under load on a slow system, so no false positives are expected (but the probability is never quite zero).

As for a "policy" interface. It's possible but would be messy, requiring more code (more classes) and end up less readable than "q.scaling_delay()" is right now. So I'd rather not.

lp:~vanvugt/mir/ddouble updated
2156. By Daniel van Vugt

Merge latest trunk

2157. By Daniel van Vugt

Merge latest trunk

2158. By Daniel van Vugt

Merge latest trunk

2159. By Daniel van Vugt

Shorten the new test a lot. Now down from 13s to 5s, whereas trunk is 3.5s,
so the new tests now only add 1.5s.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I'll leave it to those with stronger opinions

review: Abstain
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

"set_scaling_delay was only just added to satisfy Alberto. "

It shouldn't just be me. ANY predictive algorithm will get some predictions wrong. And I would prefer something less cryptic than setting -1 on set_scaling_delay. Just an enable_dynamic_depth or something like that should suffice.

47 + else if (int(buffers.size()) < nbuffers)

C-cast... No C-casts per the Mir style guide.

> As for a "policy" interface. It's possible but would be messy, requiring more
> code (more classes) and end up less readable than "q.scaling_delay()" is right
> now. So I'd rather not.

As opposed to the complex web of logic introduced in BufferQueue? The algorithm is not clear, and clarity is one benefit of an interface with clear boundaries.

Also it's not obvious what the trade-offs are in this scheme.

Needs fixing, needs info.

review: Needs Fixing
Revision history for this message
Kevin DuBois (kdub) wrote :

@test shortening, thanks!

> As for a "policy" interface. It's possible but would be messy, requiring more
> code (more classes) and end up less readable than "q.scaling_delay()" is right
> now. So I'd rather not.

Sure, it requires some interfaces, but the expressiveness of the code improves by not having to figure out what a 'scaling delay' is. As the code stands now, we have to bake the policies and the swapping all into one BufferQueue object, which has always been pretty complex with just the swapping requirements. It would also untangle the BufferQueue logic a bit, because we'd have a dividing line between 'this is the size/scaling/queueing policy' and 'this is the swapping queue'.

As Alberto points out, this is something that can be optimized in different ways, and I can see that using differing policies could be helpful. Defining the interfaces and relationships between the BufferQueue and the policy interface would stop the BufferQueue from mixing up the policy portion with the swapping portion, and hopefully lead to a more flexible system.

So, I guess 'needs-fixing' on the c-cast, and 'appreciates-further-discussion' on the interface (it's seeming like a good vs better situation).

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm glad people are suspicious. You should be, since we've been trying to achieve this for over a year.

Although please be wary of "rearranging deck chairs on the Titanic". Requests to change designs that contribute nothing functional and result in more complex code will be accommodated to some extent to avoid argument, but they quickly become a waste of time. Try not to be Vogons...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

C cast:
Now removed. Not my preferred syntax but the hit to readability is minor.

What's the trade-off?
Although none has been observed with this current implementation (I never put it up for review until I believed all the corner cases had been solved), in theory there is a trade-off. It affects clients that are on the bleeding edge of keeping up most-of-but-not-quite-all the time. A client keeping up for 80% of frames and only missing every 5th, for example, would get scaled back to two buffers for 60% of the time, and would get three buffers for the other 40% of the time. Depending on the specific client's behaviour, this could result in it keeping up only 60% of the time with this algorithm, instead of 80% of the time with always-triple-buffers.
  But this whole branch is an exercise in working around sub-optimal clients (most notably unity8-dash). I have worked on and off for months to ensure the edge cases we actually see in Ubuntu, where our clients are too slow, are correctly handled by this algorithm. But in the long run we should of course aim to make all our graphics code fast (so double buffering can be used). It's not hard; it just requires some commitment.
  The current algorithm presented here causes no visible stuttering on our phones, and provides a small but visible reduction in lag.

Redesign with a policy object:
I'll make a proposal or two separately into this branch so you can see the changes more clearly...

lp:~vanvugt/mir/ddouble updated
2160. By Daniel van Vugt

Merge latest trunk

2161. By Daniel van Vugt

Use a C++ cast

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Correction:
A client keeping up for 80% of frames and only missing every 5th, for example, would get scaled back to two buffers for 20% of the time (on the fourth frame), and would get three buffers for the other 80% of the time. Depending on the specific client's behaviour, this could result in it keeping up only 60% of the time with this algorithm, instead of 80% of the time with always-triple-buffers.

Revision history for this message
Kevin DuBois (kdub) wrote :

@the policy discussion,
It's more of a request to define the relationships between the objects and allow for object-oriented substitution of the algorithm. I suppose this can be done later, but (imo) it would help us review better if we teased out what policy decisions are being made and what is being optimized for in the trade-offs. This seems like something to sort out next week; abstain for now.

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I will do it, but might not be ready till next week.

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

@Daniel, thanks for the info.

Abstain for now until we have the buffer semantics discussion.

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Given we're talking about redesigning the buffer allocation architecture today, I won't bother redesigning this proposal for now. It sounds like the effort might go to waste within a few months.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Ignore my most recent comments above.

Last week people were saying they'd be happy to land this as-is. Still true?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This branch is important and we need to not drop the ball. Unity8 needs every reduction in latency we can get, released as soon as possible.

If anyone truly wants it redesigned, please tell me ASAP.

Revision history for this message
Kevin DuBois (kdub) wrote :

I guess it's still true (if I can introduce a new branch of arithmetic to expand on the nuances):

+1 that using some sort of heuristic to dynamically trade off between "double buffering" and "triple buffering" is beneficial.

-1 that the heuristic is not easily extractable from BufferQueue.

+? that it improves some situations, as described in the description, but it's difficult to see that we're making a good universal tradeoff.

+? on how to prove that this new feature has transferred to 'the new buffer semantics'. If it's something in trunk, we don't want to lose the feature when 'the new buffer semantics' arrive; but it will look markedly different once we have server-initiated buffer transfers. It's difficult to see how to transfer the feature because we don't have an established way to test a variety of loads and pop a hard number out at the end to compare.

so I guess all that adds up to "0??", which (I suppose) translates to abstain.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This branch is visibly beneficial. When used on the phone you can see shell elements at least are a bit more responsive.

And the heuristic is mostly not a heuristic. It's based on careful logic and reasoning, which is why it took me over 6 months to bring this version to reality, now 12 months after we first started trying to get back to double buffering.

I don't care to think about new buffer semantics right now. Future work may replace it, but this branch works today and removes up to 40% of the lag. So let's take advantage of it.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

As with most additions to BufferQueue, it's not easy to convince oneself that this is bug free. However, it seems to work well in practice, and if we are going to have it, let's introduce it early in the cycle.

My main concern is that this branch would be largely irrelevant after the planned move to client-owned buffers (although we could provide a helper library that implements similar logic on the client side). However, we are not there yet, and since changing our buffer semantics is a long-term effort, I am OK "living for today" in this case.

There are a few things to improve (in the future):

As mentioned in previous comments, I would very much like to see tests that simulate our manual tests, i.e. tests that check the effects that buffer queue changes have end-to-end.

This branch optimizes latency, but leaves memory consumption intact (i.e. we maintain 3 buffers internally even if we don't need them). It would be nice to see memory consumption addressed too.

review: Approve
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

There is some healthy skepticism but no strong objections. I'm willing to try this out.

review: Approve

Preview Diff

1=== modified file 'src/server/compositor/buffer_queue.cpp'
2--- src/server/compositor/buffer_queue.cpp 2015-05-01 08:54:27 +0000
3+++ src/server/compositor/buffer_queue.cpp 2015-05-14 06:01:15 +0000
4@@ -137,6 +137,8 @@
5 graphics::BufferProperties const& props,
6 mc::FrameDroppingPolicyFactory const& policy_provider)
7 : nbuffers{nbuffers},
8+ frame_deadlines_threshold{3},
9+ frame_deadlines_met{0},
10 frame_dropping_enabled{false},
11 current_compositor_buffer_valid{false},
12 the_properties{props},
13@@ -168,30 +170,63 @@
14 std::make_shared<BufferQueue::LockableCallback>(this));
15 }
16
17+void mc::BufferQueue::set_scaling_delay(int nframes)
18+{
19+ std::unique_lock<decltype(guard)> lock(guard);
20+ frame_deadlines_threshold = nframes;
21+}
22+
23+int mc::BufferQueue::scaling_delay() const
24+{
25+ std::unique_lock<decltype(guard)> lock(guard);
26+ return frame_deadlines_threshold;
27+}
28+
29+bool mc::BufferQueue::client_ahead_of_compositor() const
30+{
31+ return nbuffers > 1 &&
32+ !frame_dropping_enabled && // Never throttle frame droppers
33+ frame_deadlines_threshold >= 0 && // Queue scaling enabled
34+ !ready_to_composite_queue.empty() && // At least one frame is ready
35+ frame_deadlines_met >= frame_deadlines_threshold; // Is smooth
36+}
37+
38+mg::Buffer* mc::BufferQueue::get_a_free_buffer()
39+{
40+ mg::Buffer* buf = nullptr;
41+
42+ if (!free_buffers.empty())
43+ {
44+ buf = free_buffers.back();
45+ free_buffers.pop_back();
46+ }
47+ else if (static_cast<int>(buffers.size()) < nbuffers)
48+ {
49+ auto const& buffer = gralloc->alloc_buffer(the_properties);
50+ buffers.push_back(buffer);
51+ buf = buffer.get();
52+ }
53+
54+ return buf;
55+}
56+
57 void mc::BufferQueue::client_acquire(mc::BufferQueue::Callback complete)
58 {
59 std::unique_lock<decltype(guard)> lock(guard);
60
61 pending_client_notifications.push_back(std::move(complete));
62
63- if (!free_buffers.empty())
64- {
65- auto const buffer = free_buffers.back();
66- free_buffers.pop_back();
67- give_buffer_to_client(buffer, lock);
68- return;
69- }
70-
71- /* No empty buffers, attempt allocating more
72- * TODO: need a good heuristic to switch
73- * between double-buffering to n-buffering
74+ /*
75+ * Fast clients that can be safely throttled should be. This keeps the
76+ * number of prefilled buffers limited to one (equivalent to double-
77+ * buffering) for minimal latency.
78 */
79- int const allocated_buffers = buffers.size();
80- if (allocated_buffers < nbuffers)
81+ if (client_ahead_of_compositor())
82+ return;
83+
84+ if (auto buf = get_a_free_buffer())
85 {
86- auto const& buffer = gralloc->alloc_buffer(the_properties);
87- buffers.push_back(buffer);
88- give_buffer_to_client(buffer.get(), lock);
89+ give_buffer_to_client(buf, lock);
90 return;
91 }
92
93@@ -235,7 +270,14 @@
94 std::unique_lock<decltype(guard)> lock(guard);
95
96 bool use_current_buffer = false;
97- if (!is_a_current_buffer_user(user_id))
98+ if (is_a_current_buffer_user(user_id)) // Primary/fastest display
99+ {
100+ if (ready_to_composite_queue.empty())
101+ frame_deadlines_met = 0;
102+ else if (frame_deadlines_met < frame_deadlines_threshold)
103+ ++frame_deadlines_met;
104+ }
105+ else // Second and subsequent displays sync to the primary one
106 {
107 use_current_buffer = true;
108 current_buffer_users.push_back(user_id);
109@@ -267,6 +309,15 @@
110 current_buffer_users.push_back(user_id);
111 current_compositor_buffer = pop(ready_to_composite_queue);
112 current_compositor_buffer_valid = true;
113+
114+ /*
115+ * If we just emptied the ready queue above and hold this compositor
116+ * buffer for very long (e.g. bypass) then there's a chance the client
117+ * would starve and miss a frame. Make sure that can't happen, by
118+ * using more than double buffers...
119+ */
120+ if (!buffer_to_release && !client_ahead_of_compositor())
121+ buffer_to_release = get_a_free_buffer();
122 }
123 else if (current_buffer_users.empty())
124 { // current_buffer_users and ready_to_composite_queue both empty
125@@ -375,14 +426,40 @@
126 ++count;
127 }
128
129+ /*
130+ * Intentionally schedule one more frame than we need, and for good
131+ * reason... We can only accurately detect frame_deadlines_met in
132+ * compositor_acquire if compositor_acquire is still waking up at full
133+ * frame rate even with a slow client. This is crucial to scaling the
134+ * queue performance dynamically in "client_ahead_of_compositor".
135+ * But don't be concerned; very little is lost by over-scheduling. Under
136+ * normal smooth rendering conditions all frames are used (not wasted).
137+ * And under sluggish client rendering conditions the extra frame has a
138+ * critical role in providing a sample point in which we detect if the
139+ * client is keeping up. Only when the compositor changes from active to
140+ * idle is the extra frame wasted. Sounds like a reasonable price to pay
141+ * for dynamic performance monitoring.
142+ */
143+ if (count && frame_deadlines_threshold >= 0)
144+ ++count;
145+
146 return count;
147 }
148
149+/*
150+ * This function is a kludge used in tests only. It attempts to predict how
151+ * many calls to client_acquire you can make before it blocks.
152+ * Future tests should not use it.
153+ */
154 int mc::BufferQueue::buffers_free_for_client() const
155 {
156 std::lock_guard<decltype(guard)> lock(guard);
157 int ret = 1;
158- if (nbuffers > 1)
159+ if (nbuffers == 1)
160+ ret = 1;
161+ else if (client_ahead_of_compositor()) // client_acquire will block
162+ ret = 0;
163+ else
164 {
165 int nfree = free_buffers.size();
166 int future_growth = nbuffers - buffers.size();
167@@ -456,7 +533,7 @@
168 mg::Buffer* buffer,
169 std::unique_lock<std::mutex> lock)
170 {
171- if (!pending_client_notifications.empty())
172+ if (!pending_client_notifications.empty() && !client_ahead_of_compositor())
173 {
174 framedrop_policy->swap_unblocked();
175 give_buffer_to_client(buffer, lock);
176
177=== modified file 'src/server/compositor/buffer_queue.h'
178--- src/server/compositor/buffer_queue.h 2015-04-30 11:36:36 +0000
179+++ src/server/compositor/buffer_queue.h 2015-05-14 06:01:15 +0000
180@@ -65,6 +65,15 @@
181 void drop_old_buffers() override;
182 void drop_client_requests() override;
183
184+ /**
185+ * Set the minimum number of smooth frames the client must keep up with
186+ * the compositor for in order to qualify for queue scaling (dynamic
187+ * double buffering for reduced latency). A negative value means never
188+ * but it's recommended that you never change this.
189+ */
190+ void set_scaling_delay(int nframes);
191+ int scaling_delay() const;
192+
193 private:
194 class LockableCallback;
195 enum SnapshotWait
196@@ -94,7 +103,12 @@
197
198 std::deque<Callback> pending_client_notifications;
199
200+ bool client_ahead_of_compositor() const;
201+ graphics::Buffer* get_a_free_buffer();
202+
203 int nbuffers;
204+ int frame_deadlines_threshold;
205+ int frame_deadlines_met;
206 bool frame_dropping_enabled;
207 bool current_compositor_buffer_valid;
208 graphics::BufferProperties the_properties;
209
210=== modified file 'tests/unit-tests/compositor/test_buffer_queue.cpp'
211--- tests/unit-tests/compositor/test_buffer_queue.cpp 2015-04-30 11:36:36 +0000
212+++ tests/unit-tests/compositor/test_buffer_queue.cpp 2015-05-14 06:01:15 +0000
213@@ -151,7 +151,8 @@
214 return handle->buffer();
215 }
216
217-void compositor_thread(mc::BufferQueue &bundle, std::atomic<bool> &done)
218+void unthrottled_compositor_thread(mc::BufferQueue &bundle,
219+ std::atomic<bool> &done)
220 {
221 while (!done)
222 {
223@@ -160,6 +161,36 @@
224 }
225 }
226
227+std::chrono::milliseconds const throttled_compositor_rate(10);
228+void throttled_compositor_thread(mc::BufferQueue &bundle,
229+ std::atomic<bool> &done)
230+{
231+ while (!done)
232+ {
233+ bundle.compositor_release(bundle.compositor_acquire(nullptr));
234+ std::this_thread::sleep_for(throttled_compositor_rate);
235+ }
236+}
237+
238+void overlapping_compositor_thread(mc::BufferQueue &bundle,
239+ std::atomic<bool> &done)
240+{
241+ std::shared_ptr<mg::Buffer> b[2];
242+ int i = 0;
243+
244+ b[0] = bundle.compositor_acquire(nullptr);
245+ while (!done)
246+ {
247+ b[i^1] = bundle.compositor_acquire(nullptr);
248+ bundle.compositor_release(b[i]);
249+ std::this_thread::sleep_for(throttled_compositor_rate);
250+ i ^= 1;
251+ }
252+
253+ if (b[i])
254+ bundle.compositor_release(b[i]);
255+}
256+
257 void snapshot_thread(mc::BufferQueue &bundle,
258 std::atomic<bool> &done)
259 {
260@@ -420,10 +451,13 @@
261 {
262 mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
263
264+ // This test is technically not valid with dynamic queue scaling on
265+ q.set_scaling_delay(-1);
266+
267 std::atomic<bool> done(false);
268 auto unblock = [&done] { done = true; };
269 mt::AutoUnblockThread compositor(unblock,
270- compositor_thread, std::ref(q), std::ref(done));
271+ unthrottled_compositor_thread, std::ref(q), std::ref(done));
272
273 std::unordered_set<uint32_t> ids_acquired;
274 int const max_ownable_buffers = nbuffers - 1;
275@@ -537,8 +571,7 @@
276 {
277 std::deque<mg::BufferID> client_release_sequence;
278 std::vector<mg::Buffer *> buffers;
279- int const max_ownable_buffers = nbuffers - 1;
280- for (int i = 0; i < max_ownable_buffers; ++i)
281+ for (int i = 0; i < q.buffers_free_for_client(); ++i)
282 {
283 auto handle = client_acquire_async(q);
284 ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
285@@ -804,7 +837,7 @@
286
287 auto unblock = [&done]{ done = true;};
288
289- mt::AutoUnblockThread compositor(unblock, compositor_thread,
290+ mt::AutoUnblockThread compositor(unblock, unthrottled_compositor_thread,
291 std::ref(q),
292 std::ref(done));
293 mt::AutoUnblockThread snapshotter1(unblock, snapshot_thread,
294@@ -1393,8 +1426,14 @@
295 // Consume
296 for (int m = 0; m < nmonitors; ++m)
297 {
298- ASSERT_EQ(1, q.buffers_ready_for_compositor(&monitor[m]));
299- q.compositor_release(q.compositor_acquire(&monitor[m]));
300+ ASSERT_NE(0, q.buffers_ready_for_compositor(&monitor[m]));
301+
302+ // Double consume to account for the +1 that
303+ // buffers_ready_for_compositor adds to do dynamic performance
304+ // detection.
305+ q.compositor_release(q.compositor_acquire(&monitor[m]));
306+ q.compositor_release(q.compositor_acquire(&monitor[m]));
307+
308 ASSERT_EQ(0, q.buffers_ready_for_compositor(&monitor[m]));
309 }
310 }
311@@ -1466,7 +1505,7 @@
312 std::atomic<bool> done(false);
313
314 auto unblock = [&done]{ done = true; };
315- mt::AutoUnblockThread compositor_thread(unblock, [&]
316+ mt::AutoUnblockThread unthrottled_compositor_thread(unblock, [&]
317 {
318 while (!done)
319 {
320@@ -1552,6 +1591,9 @@
321 {
322 mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
323
324+ // This test is technically not valid with dynamic queue scaling on
325+ q.set_scaling_delay(-1);
326+
327 void const* main_compositor = reinterpret_cast<void const*>(0);
328 void const* second_compositor = reinterpret_cast<void const*>(1);
329
330@@ -1579,7 +1621,7 @@
331 std::atomic<bool> done(false);
332 auto unblock = [&done] { done = true; };
333 mt::AutoUnblockThread compositor(unblock,
334- compositor_thread, std::ref(q), std::ref(done));
335+ unthrottled_compositor_thread, std::ref(q), std::ref(done));
336
337 std::unordered_set<mg::Buffer *> unique_buffers_acquired;
338 int const max_ownable_buffers = nbuffers - 1;
339@@ -1606,32 +1648,131 @@
340 }
341 }
342
343-/* FIXME (enabling this optimization breaks timing tests) */
344-TEST_F(BufferQueueTest, DISABLED_synchronous_clients_only_get_two_real_buffers)
345+// Test that dynamic queue scaling/throttling actually works
346+TEST_F(BufferQueueTest, queue_size_scales_with_client_performance)
347+{
348+ for (int nbuffers = 3; nbuffers <= max_nbuffers_to_test; ++nbuffers)
349+ {
350+ mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
351+ q.allow_framedropping(false);
352+
353+ std::atomic<bool> done(false);
354+ auto unblock = [&done] { done = true; };
355+
356+ // To emulate a "fast" client we use a "slow" compositor
357+ mt::AutoUnblockThread compositor(unblock,
358+ throttled_compositor_thread, std::ref(q), std::ref(done));
359+
360+ std::unordered_set<mg::Buffer *> buffers_acquired;
361+
362+ int const delay = q.scaling_delay();
363+ EXPECT_EQ(3, delay); // expect a sane default
364+
365+ for (int frame = 0; frame < 10; frame++)
366+ {
367+ auto handle = client_acquire_async(q);
368+ handle->wait_for(std::chrono::seconds(1));
369+ ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
370+
371+ if (frame > delay)
372+ buffers_acquired.insert(handle->buffer());
373+ handle->release_buffer();
374+ }
375+ // Expect double-buffers for fast clients
376+ EXPECT_THAT(buffers_acquired.size(), Eq(2));
377+
378+ // Now check what happens if the client becomes slow...
379+ buffers_acquired.clear();
380+ for (int frame = 0; frame < 10; frame++)
381+ {
382+ auto handle = client_acquire_async(q);
383+ handle->wait_for(std::chrono::seconds(1));
384+ ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
385+
386+ if (frame > delay)
387+ buffers_acquired.insert(handle->buffer());
388+
389+ // Client is just too slow to keep up:
390+ std::this_thread::sleep_for(throttled_compositor_rate * 1.5);
391+
392+ handle->release_buffer();
393+ }
394+ // Expect at least triple buffers for sluggish clients
395+ EXPECT_THAT(buffers_acquired.size(), Ge(3));
396+
397+ // And what happens if the client becomes fast again?...
398+ buffers_acquired.clear();
399+ for (int frame = 0; frame < 10; frame++)
400+ {
401+ auto handle = client_acquire_async(q);
402+ handle->wait_for(std::chrono::seconds(1));
403+ ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
404+
405+ if (frame > delay)
406+ buffers_acquired.insert(handle->buffer());
407+ handle->release_buffer();
408+ }
409+ // Expect double-buffers for fast clients
410+ EXPECT_THAT(buffers_acquired.size(), Eq(2));
411+ }
412+}
413+
414+TEST_F(BufferQueueTest, greedy_compositors_need_triple_buffers)
415+{
416+ /*
417+ * "Greedy" compositors means those that can hold multiple buffers from
418+ * the same client simultaneously or a single buffer for a long time.
419+ * This usually means bypass/overlays, but can also mean multi-monitor.
420+ */
421+ for (int nbuffers = 3; nbuffers <= max_nbuffers_to_test; ++nbuffers)
422+ {
423+ mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
424+ q.allow_framedropping(false);
425+
426+ std::atomic<bool> done(false);
427+ auto unblock = [&done] { done = true; };
428+
429+ mt::AutoUnblockThread compositor(unblock,
430+ overlapping_compositor_thread, std::ref(q), std::ref(done));
431+
432+ std::unordered_set<mg::Buffer *> buffers_acquired;
433+ int const delay = q.scaling_delay();
434+
435+ for (int frame = 0; frame < 10; frame++)
436+ {
437+ auto handle = client_acquire_async(q);
438+ handle->wait_for(std::chrono::seconds(1));
439+ ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
440+
441+ if (frame > delay)
442+ buffers_acquired.insert(handle->buffer());
443+ handle->release_buffer();
444+ }
445+ // Expect triple buffers for the whole time
446+ EXPECT_THAT(buffers_acquired.size(), Ge(3));
447+ }
448+}
449+
450+TEST_F(BufferQueueTest, compositor_double_rate_of_slow_client)
451 {
452 for (int nbuffers = 2; nbuffers <= max_nbuffers_to_test; ++nbuffers)
453 {
454 mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
455 q.allow_framedropping(false);
456
457- std::atomic<bool> done(false);
458- auto unblock = [&done] { done = true; };
459- mt::AutoUnblockThread compositor(unblock,
460- compositor_thread, std::ref(q), std::ref(done));
461-
462- std::unordered_set<mg::Buffer *> buffers_acquired;
463-
464- for (int frame = 0; frame < 100; frame++)
465+ for (int frame = 0; frame < 10; frame++)
466 {
467- auto handle = client_acquire_async(q);
468- handle->wait_for(std::chrono::seconds(1));
469- ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
470+ ASSERT_EQ(0, q.buffers_ready_for_compositor(this));
471+ q.client_release(client_acquire_sync(q));
472
473- buffers_acquired.insert(handle->buffer());
474- handle->release_buffer();
475+ // Detecting a slow client requires scheduling at least one extra
476+ // frame...
477+ int nready = q.buffers_ready_for_compositor(this);
478+ ASSERT_EQ(2, nready);
479+ for (int i = 0; i < nready; ++i)
480+ q.compositor_release(q.compositor_acquire(this));
481+ ASSERT_EQ(0, q.buffers_ready_for_compositor(this));
482 }
483-
484- EXPECT_THAT(buffers_acquired.size(), Eq(2));
485 }
486 }
487
