Mir

Merge lp:~vanvugt/mir/ddouble into lp:mir

Proposed by Daniel van Vugt
Status: Merged
Approved by: Daniel van Vugt
Approved revision: no longer in the source branch.
Merged at revision: 2673
Proposed branch: lp:~vanvugt/mir/ddouble
Merge into: lp:mir
Diff against target: 486 lines (+277/-45)
3 files modified
src/server/compositor/buffer_queue.cpp (+96/-19)
src/server/compositor/buffer_queue.h (+14/-0)
tests/unit-tests/compositor/test_buffer_queue.cpp (+167/-26)
To merge this branch: bzr merge lp:~vanvugt/mir/ddouble
Reviewer Review Type Date Requested Status
Cemil Azizoglu (community) Approve
Alexandros Frantzis (community) Approve
Alberto Aguirre (community) Abstain
Kevin DuBois (community) Abstain
PS Jenkins bot (community) continuous-integration Approve
Alan Griffiths Abstain
Review via email: mp+258351@code.launchpad.net

Commit message

Introducing dynamic buffer queue scaling to minimize lag.
(LP: #1240909)

Actually the size of the queue doesn't really change with this branch.
What it does do is reliably detect when a client is staying ahead of
the compositor and throttle it so that no more than one future frame
is ever pre-rendered. Hence you get the same single-frame latency as
double buffering. But the moment the client fails to keep up, it will
get to use all three buffers again.

Description of the change

Background reading and imagery to help you understand the improvement:
https://docs.google.com/document/d/116i4TC0rls4wKFmbaRrHL_UT_Jg22XI8YqpXGLLTVCc/edit#
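
In essence the queue only throttles a client once it has proven, over a few consecutive compositor frames, that it can keep up. The following is a simplified, self-contained sketch of that decision logic; it mirrors client_ahead_of_compositor() and compositor_acquire() in the diff below, with Mir's locking, buffer types and multi-monitor handling elided, so treat it as an illustration rather than the implementation:

    // Simplified model of the queue-scaling decision; see
    // client_ahead_of_compositor() and compositor_acquire() in the diff.
    struct ScalingState
    {
        int  frame_deadlines_threshold = 3;  // aka scaling_delay()
        int  frame_deadlines_met = 0;        // consecutive deadlines met
        bool frame_dropping_enabled = false;
        int  nbuffers = 3;

        // Conceptually called at each compositor wake-up.
        void on_compositor_acquire(bool client_frame_was_ready)
        {
            if (!client_frame_was_ready)
                frame_deadlines_met = 0;
            else if (frame_deadlines_met < frame_deadlines_threshold)
                ++frame_deadlines_met;
        }

        // When true, client_acquire withholds the third buffer, giving the
        // client double-buffered (single pre-rendered frame) behaviour.
        bool client_ahead_of_compositor(bool client_frame_is_ready) const
        {
            return nbuffers > 1 &&
                   !frame_dropping_enabled &&
                   frame_deadlines_threshold >= 0 &&
                   client_frame_is_ready &&
                   frame_deadlines_met >= frame_deadlines_threshold;
        }
    };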

Fun facts:

* For manual testing you can see the switching in action if you set MIR_CLIENT_PERF_REPORT=log and then resize mir_demo_client_flicker to be large enough that it takes too long to render (>16ms). It will then get triple buffers. If you shrink it, the client can render quickly again and reports double buffers.

* Strange and wonderful things happen with input resampling. Event-driven clients that use the default 59Hz resampling rate (e.g. mir_demo_client_eglsquare) will generally not be detected as "keeping up" and so will always be on triple buffers. But surprisingly this is a good thing -- the low input rate means the compositor is always consuming buffers faster than we're producing, and you get lag of around one frame as the consumer eats through the queue up to where the producer is (so the queue length is almost irrelevant). Other clients that reliably produce at the full frame rate or higher, however, will get switched to double buffers (one frame of lag).

* Additional manual testing has been done on a couple of phones (I backported this branch). Everything remains smooth. Unity8 is fast enough on some devices to qualify for double buffering. Unity8-dash is generally not fast enough on any device to qualify. You can see the stats on a live phone with:
    restart unity8 MIR_CLIENT_PERF_REPORT=log
    tail -f ~/.cache/upstart/unity8.log

* In raw numbers, this branch at least reduces end-to-end lag on the phone from 5 frames to 4 (when unity8 qualifies for double buffering). And if your app (not unity8-dash) is fast enough to keep up too, it's then only 3 frames of lag. So this branch reduces Unity8/USC total graphics latency by 20-40%, but more consistent results closer to 40% will depend on Unity8/dash performance improving in the future. Also keep in mind that on common mobile devices the touchscreen hardware itself may represent most of the latency, so even a 100% reduction in graphics latency could still feel slightly laggy to the user. And thus a 40% improvement in graphics latency is less than a 40% reduction in the total lag you perceive.

* A "slow" client that gets triple buffers is actually one that's achieving between 50% and 100% of the compositor frame rate. 100% or more is obviously fast and gets double buffers. Less obvious is that really slow clients doing less than 50% frame rate (e.g. a clock) actually keep the system idle enough that the third buffer doesn't get touched so you might see really slow clients only report 1 or 2 buffers in action even though we allow them 3.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

I don't see a way of disabling the prediction.

review: Needs Fixing
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

While descriptions of manual testing scenarios are interesting, we all know that manual testing doesn't scale and that we'll miss regressions.

Can we think of a way to automate any of this? Even if we need to define a new test category "test_needing_graphics_hardware"? (I know this problem existed before this MP but here's as good a place as any to discuss.)

It would be really nice to have something more than unit tests to prove the /system/ behavior is as intended, not just the buffer queue.

review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The new tests cover the new functionality completely. And what you don't see in the diff is that the existing test `slow_client_framerate_matches_compositor' covers any potential performance regressions really well (it took quite some effort to make that test pass).

Of course more tests always help. But AFAIK we're at 100% functionality coverage here already so it shouldn't be a blocking issue. All pre-existing tests still pass too.

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I agree that the new unit tests cover the new behavior of BufferQueue.

But I disagree that the proposed tests cover the new functionality of the system: there are no new tests of /the system as a whole/. The potential value of such tests is clear from the "Description of the Change" detailing how to /manually/ test the new functionality.

We should always be looking for ways to avoid relying on manual testing and this seems like an opportunity.

Revision history for this message
Kevin DuBois (kdub) wrote :

Seems like the idea can achieve a better latency/throughput balance; some questions...

Questions:
I can see some sort of policy (mc::BufferingPolicy?) being added to the constructor to try to keep the complexity of BufferQueue down and make the policies interchangeable (one could also provide a null/disabled policy to turn off the dynamism).

190+ void set_scaling_delay(int nframes);
191+ int scaling_delay() const;
these are just here for the benefit of the tests... is there a plan in the works so that something else in the system can change this?

249+ std::this_thread::sleep_for(throttled_compositor_rate);
Is this timing sensitive?

nits:
47+ else if (int(buffers.size()) < nbuffers)
c-cast

review: Needs Information
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

set_scaling_delay was only just added to satisfy Alberto. I preferred not having that at all because in the absence of bugs (and none are known) it makes no sense to ever turn it off. That just allows people to shoot themselves in the proverbial foot, performance wise. Throughput is never compromised so it should always be turned on. Although it does help with a couple of tests that are invalid with queue scaling so would need to be deleted/disabled in the absence of set_scaling_delay.

The sleep_for was not my first preference. I toyed a lot with timerless tests but what I found was that they were nowhere near as effective at finding subtle bugs (when I intentionally introduced them) as the tests using real time. So I switched back to using real time because they're stronger tests more sensitive to subtle mistakes. Just as the existing `slow_client_framerate_matches_compositor' is so effective it continues to surprise me.

Yeah that cast is there for readability. If I said:
  int x = buffers.size();
  if (x < nbuffers)
then no one would complain. It's all the same and not something we should bother blocking on. The current form is more readable.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Alan:
I only mentioned manual testing in the description because people should have some healthy suspicion of my claims and test for themselves. The suggestion to use manual testing is for healthy quality assurance and does not indicate any gap in the automated test coverage.

Revision history for this message
Kevin DuBois (kdub) wrote :

> Yeah that cast is there for readability. If I said:
> int x = buffers.size();
> if (x < nbuffers)
> then no one would complain. It's all the same and not something we should
> bother blocking on. The current form is more readable.

Just trying to stick to the style guide :)
http://unity.ubuntu.com/mir/cppguide/index.html?showone=Casting#Casting
We've been chatting about some of the style guide stuff in the EU/US standup (and probably will at the sprint next week too); maybe this is just another point of the style guide to bring up.

Revision history for this message
Kevin DuBois (kdub) wrote :

> The sleep_for was not my first preference. I toyed a lot with timerless tests
> but what I found was that they were nowhere near as effective at finding
> subtle bugs (when I intentionally introduced them) as the tests using real
> time. So I switched back to using real time because they're stronger tests
> more sensitive to subtle mistakes. Just as the existing
> `slow_client_framerate_matches_compositor' is so effective it continues to
> surprise me.

So, obviously the worst thing possible is to have a bug or regression, so a strong test is good. Another bad thing, though, is to have tests break under loaded CI, or to have the tests take a long time to run.

The test seems robust enough, but I'm just leery of timing tests after past bugs.
The unit test time increases significantly with these tests. At least on my system, the old BufferQueueTests take 3-4s to run, but the new tests increase that time to about 12-13s (log of run: http://pastebin.ubuntu.com/11100093/). Is there a way to decrease the test time while still guarding against bugs? It's difficult to know where to draw the line as to how long a test can take to run, but this seems like an increase in test run time that should be avoided if at all possible.

Revision history for this message
Kevin DuBois (kdub) wrote :

> set_scaling_delay was only just added to satisfy Alberto. I preferred not
> having that at all because in the absence of bugs (and none are known) it
> makes no sense to ever turn it off. That just allows people to shoot
> themselves in the proverbial foot, performance wise. Throughput is never
> compromised so it should always be turned on. Although it does help with a
> couple of tests that are invalid with queue scaling so would need to be
> deleted/disabled in the absence of set_scaling_delay.

Side-stepping a bit the argument of whether it should be disabled or configurable [1]: the current code does have disable/configure abilities. So it makes more sense to have this be a policy interface (or even an int parameter) passed in via the BufferQueue constructor than to have dangling functions that we don't intend to use in production code.

[1] I buy that it's more of a nice-to-have (and easy-to-have) than something we're planning on making use of often.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Indeed I think I can shorten the new tests further and maintain robustness... They're not "time sensitive" like other tests though. The only timing I introduce is the sleep in throttled_compositor_thread() to ensure that the client loop (which is not throttled at all) runs faster than the compositor. That holds true even under load on a slow system, so no false positives are expected (but the probability is never quite zero).

As for a "policy" interface. It's possible but would be messy, requiring more code (more classes) and end up less readable than "q.scaling_delay()" is right now. So I'd rather not.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

I'll leave it to those with stronger opinions

review: Abstain
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

"set_scaling_delay was only just added to satisfy Alberto. "

It shouldn't just be me. ANY predictive algorithm will get some predictions wrong. And I would prefer something less cryptic than setting -1 on set_scaling_delay. Just an enable_dynamic_depth or something like that should suffice.

47 + else if (int(buffers.size()) < nbuffers)

C-style cast... no C-style casts per the Mir style guide.

> As for a "policy" interface. It's possible but would be messy, requiring more
> code (more classes) and end up less readable than "q.scaling_delay()" is right
> now. So I'd rather not.

As opposed to the complex web of logic introduced in BufferQueue? The algorithm is not clear (clarity being one benefit of an interface with clear boundaries).

Also it's not obvious what the trade-offs are in this scheme.

Needs fixing, needs info.

review: Needs Fixing
Revision history for this message
Kevin DuBois (kdub) wrote :

@test shortening, thanks!

> As for a "policy" interface. It's possible but would be messy, requiring more
> code (more classes) and end up less readable than "q.scaling_delay()" is right
> now. So I'd rather not.

Sure, it requires some interfaces, but the expressiveness of the code improves by not having to figure out what a 'scaling delay' is. As the code stands now, we have to bake the policies and the swapping all into one BufferQueue object, which has always been pretty complex with just the swapping requirements. It would also untangle the BufferQueue logic a bit, because we'd have a dividing line between 'this is the size/scaling/queuing policy' and 'this is the swapping queue'.

As Alberto points out, this is something that can be optimized for in different ways, and I can see that using differing policies could be helpful. Defining the interfaces and relationships between the BufferQueue and the policy interface would stop the BufferQueue from mixing up the policy portion with the swapping portion, and hopefully lead to a more flexible system.

So I guess 'needs-fixing' on the C-style cast, and 'appreciates-further-discussion' on the interface (it's seeming like a good-vs-better situation).

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm glad people are suspicious. You should be, since we've been trying to achieve this for over a year.

Although please be wary of "rearranging deck chairs on the Titanic". Requests to change designs that contribute nothing functional and result in more complex code will be accommodated to some extent to avoid argument, but they quickly become a waste of time. Try not to be Vogons...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

C cast:
Now removed. Not my preferred syntax but the hit to readability is minor.

What's the trade-off?
Although none has been observed with this current implementation (I never put it up for review until I believed all the corner cases had been solved), in theory there is a trade-off. That is, for clients who are on the bleeding edge of keeping up most-of-but-not-quite-all the time. A client keeping up for 80% of frames and only missing every 5th, for example, would get scaled back to two buffers for 60% of the time, and would get three buffers for 40% of the time. Depending on the specific client's behaviour, this could result in it keeping up only 60% of the time with this algorithm, instead of 80% of the time with always-triple-buffers.
  But this whole branch is an exercise in working around sub-optimal clients (most notably unity8-dash). I have worked for months, on and off, to ensure that those edge cases we actually see in Ubuntu, where our clients are too slow, are correctly handled by this algorithm. But in the long run we should of course aim to make all our graphics code fast (so double buffering can be used). It's not hard; it just requires some commitment.
  The current algorithm presented here shows no visible stuttering on our phones, and provides a small but visible reduction in lag.

Redesign with a policy object:
I'll make a proposal or two separately into this branch so you can see the changes more clearly...

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Correction:
A client keeping up for 80% of frames and only missing every 5th, for example, would get scaled back to two buffers for 20% of the time (on the fourth frame), and would get three buffers for the other 80% of the time. Depending on the specific client's behaviour, this could result in it keeping up only 60% of the time with this algorithm, instead of 80% of the time with always-triple-buffers.
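
Those corrected numbers can be reproduced with a rough, standalone simulation of the deadline counter, assuming the default scaling delay of 3 and ignoring the real interleaving of client_acquire and compositor_acquire:

    #include <iostream>

    int main()
    {
        int const threshold = 3;        // default scaling_delay
        int frame_deadlines_met = 0;
        int throttled = 0;
        int const total = 100;          // 100 frames, so throttled == percent

        for (int frame = 1; frame <= total; ++frame)
        {
            bool const client_frame_ready = (frame % 5 != 0);  // miss every 5th

            // Throttling decision as seen by client_acquire for this frame
            if (client_frame_ready && frame_deadlines_met >= threshold)
                ++throttled;  // behaves as double-buffered here

            // Counter update as done in compositor_acquire
            if (!client_frame_ready)
                frame_deadlines_met = 0;
            else if (frame_deadlines_met < threshold)
                ++frame_deadlines_met;
        }

        std::cout << throttled << "% of frames double-buffered\n";  // prints 20
    }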

Revision history for this message
Kevin DuBois (kdub) wrote :

@the policy discussion,
It's more of a request to define the relationships between the objects and allow for object-oriented substitution of the algorithm. I suppose this can be done later, but (IMO) it would help us review better if we teased out what policy decisions are being made and what is being optimized for in the trade-offs. This seems like something to sort out next week; abstain for now.

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I will do it, but might not be ready till next week.

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

@Daniel, thanks for the info.

Abstain for now until we have the buffer semantics discussion.

review: Abstain
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Given that we're talking about redesigning the buffer allocation architecture today, I won't bother redesigning this proposal for now. It sounds like the effort might go to waste within a few months.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Ignore my most recent comments above.

Last week people were saying they'd be happy to land this as-is. Still true?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This branch is important and we need to not drop the ball. Unity8 needs every reduction in latency we can get, released as soon as possible.

If anyone truly wants it redesigned, please tell me ASAP.

Revision history for this message
Kevin DuBois (kdub) wrote :

I guess so, still (if I can introduce a new branch of arithmetic to expand on the nuances):

+1 that using some sort of heuristic to dynamically trade off between "double buffering" and "triple buffering" is beneficial.

-1 that the heuristic is not easily extractable from BufferQueue.

+? that it improves some situations, as described in the description, but it's difficult to see that we're making a good universal tradeoff.

+? on how to prove that this new feature has transferred to 'the new buffer semantics'. If it's something in trunk, we don't want to remove the feature with 'the new buffer semantics'; but it will look markedly different once we have server-initiated buffer transfers. It's difficult to see how to transfer the feature because we don't have an established way to test a variety of loads and pop a hard number out at the end to compare.

so I guess all that adds up to "0??", which (I suppose) translates to abstain.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This branch is visibly beneficial. When used on the phone you can see that shell elements at least are a bit more responsive.

And the heuristic is mostly not a heuristic. It's based on careful logic and reasoning, which is why it took me over 6 months to bring this version to reality, now 12 months after we first started trying to get back to double buffering.

I don't care to think about new buffer semantics right now. Future work may replace it, but this branch works today and removes up to 40% of the lag. So let's take advantage of it.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

As with most additions to BufferQueue, it's not easy to convince oneself that this is bug-free. However, it seems to work well in practice, and if we are going to have it, let's introduce it early in the cycle.

My main concern is that this branch would be largely irrelevant after the planned move to client-owned buffers (although we could provide a helper library that implements similar logic on the client side). However, we are not there yet, and since changing our buffer semantics is a long-term effort, I am OK "living for today" in this case.

There are a few things to improve (in the future):

As mentioned in previous comments, I would very much like to see tests that simulate our manual tests, i.e., tests that check the effects that buffer queue changes have end-to-end.

This branch optimizes latency but leaves memory consumption intact (i.e. we maintain 3 buffers internally even if we don't need them). It would be nice to see memory consumption addressed too.

review: Approve
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

There is some healthy skepticism but no strong objections. I'm willing to try this out.

review: Approve

Preview Diff

=== modified file 'src/server/compositor/buffer_queue.cpp'
--- src/server/compositor/buffer_queue.cpp 2015-05-01 08:54:27 +0000
+++ src/server/compositor/buffer_queue.cpp 2015-05-14 06:01:15 +0000
@@ -137,6 +137,8 @@
     graphics::BufferProperties const& props,
     mc::FrameDroppingPolicyFactory const& policy_provider)
     : nbuffers{nbuffers},
+      frame_deadlines_threshold{3},
+      frame_deadlines_met{0},
       frame_dropping_enabled{false},
       current_compositor_buffer_valid{false},
       the_properties{props},
@@ -168,30 +170,63 @@
         std::make_shared<BufferQueue::LockableCallback>(this));
 }
 
+void mc::BufferQueue::set_scaling_delay(int nframes)
+{
+    std::unique_lock<decltype(guard)> lock(guard);
+    frame_deadlines_threshold = nframes;
+}
+
+int mc::BufferQueue::scaling_delay() const
+{
+    std::unique_lock<decltype(guard)> lock(guard);
+    return frame_deadlines_threshold;
+}
+
+bool mc::BufferQueue::client_ahead_of_compositor() const
+{
+    return nbuffers > 1 &&
+           !frame_dropping_enabled &&            // Never throttle frame droppers
+           frame_deadlines_threshold >= 0 &&     // Queue scaling enabled
+           !ready_to_composite_queue.empty() &&  // At least one frame is ready
+           frame_deadlines_met >= frame_deadlines_threshold;  // Is smooth
+}
+
+mg::Buffer* mc::BufferQueue::get_a_free_buffer()
+{
+    mg::Buffer* buf = nullptr;
+
+    if (!free_buffers.empty())
+    {
+        buf = free_buffers.back();
+        free_buffers.pop_back();
+    }
+    else if (static_cast<int>(buffers.size()) < nbuffers)
+    {
+        auto const& buffer = gralloc->alloc_buffer(the_properties);
+        buffers.push_back(buffer);
+        buf = buffer.get();
+    }
+
+    return buf;
+}
+
 void mc::BufferQueue::client_acquire(mc::BufferQueue::Callback complete)
 {
     std::unique_lock<decltype(guard)> lock(guard);
 
     pending_client_notifications.push_back(std::move(complete));
 
-    if (!free_buffers.empty())
-    {
-        auto const buffer = free_buffers.back();
-        free_buffers.pop_back();
-        give_buffer_to_client(buffer, lock);
-        return;
-    }
-
-    /* No empty buffers, attempt allocating more
-     * TODO: need a good heuristic to switch
-     * between double-buffering to n-buffering
+    /*
+     * Fast clients that can be safely throttled should be. This keeps the
+     * number of prefilled buffers limited to one (equivalent to double-
+     * buffering) for minimal latency.
      */
-    int const allocated_buffers = buffers.size();
-    if (allocated_buffers < nbuffers)
+    if (client_ahead_of_compositor())
+        return;
+
+    if (auto buf = get_a_free_buffer())
     {
-        auto const& buffer = gralloc->alloc_buffer(the_properties);
-        buffers.push_back(buffer);
-        give_buffer_to_client(buffer.get(), lock);
+        give_buffer_to_client(buf, lock);
         return;
     }
 
@@ -235,7 +270,14 @@
     std::unique_lock<decltype(guard)> lock(guard);
 
     bool use_current_buffer = false;
-    if (!is_a_current_buffer_user(user_id))
+    if (is_a_current_buffer_user(user_id))  // Primary/fastest display
+    {
+        if (ready_to_composite_queue.empty())
+            frame_deadlines_met = 0;
+        else if (frame_deadlines_met < frame_deadlines_threshold)
+            ++frame_deadlines_met;
+    }
+    else  // Second and subsequent displays sync to the primary one
     {
         use_current_buffer = true;
         current_buffer_users.push_back(user_id);
@@ -267,6 +309,15 @@
         current_buffer_users.push_back(user_id);
         current_compositor_buffer = pop(ready_to_composite_queue);
         current_compositor_buffer_valid = true;
+
+        /*
+         * If we just emptied the ready queue above and hold this compositor
+         * buffer for very long (e.g. bypass) then there's a chance the client
+         * would starve and miss a frame. Make sure that can't happen, by
+         * using more than double buffers...
+         */
+        if (!buffer_to_release && !client_ahead_of_compositor())
+            buffer_to_release = get_a_free_buffer();
     }
     else if (current_buffer_users.empty())
     { // current_buffer_users and ready_to_composite_queue both empty
@@ -375,14 +426,40 @@
         ++count;
     }
 
+    /*
+     * Intentionally schedule one more frame than we need, and for good
+     * reason... We can only accurately detect frame_deadlines_met in
+     * compositor_acquire if compositor_acquire is still waking up at full
+     * frame rate even with a slow client. This is crucial to scaling the
+     * queue performance dynamically in "client_ahead_of_compositor".
+     * But don't be concerned; very little is lost by over-scheduling. Under
+     * normal smooth rendering conditions all frames are used (not wasted).
+     * And under sluggish client rendering conditions the extra frame has a
+     * critical role in providing a sample point in which we detect if the
+     * client is keeping up. Only when the compositor changes from active to
+     * idle is the extra frame wasted. Sounds like a reasonable price to pay
+     * for dynamic performance monitoring.
+     */
+    if (count && frame_deadlines_threshold >= 0)
+        ++count;
+
     return count;
 }
 
+/*
+ * This function is a kludge used in tests only. It attempts to predict how
+ * many calls to client_acquire you can make before it blocks.
+ * Future tests should not use it.
+ */
 int mc::BufferQueue::buffers_free_for_client() const
 {
     std::lock_guard<decltype(guard)> lock(guard);
     int ret = 1;
-    if (nbuffers > 1)
+    if (nbuffers == 1)
+        ret = 1;
+    else if (client_ahead_of_compositor())  // client_acquire will block
+        ret = 0;
+    else
     {
         int nfree = free_buffers.size();
         int future_growth = nbuffers - buffers.size();
@@ -456,7 +533,7 @@
     mg::Buffer* buffer,
     std::unique_lock<std::mutex> lock)
 {
-    if (!pending_client_notifications.empty())
+    if (!pending_client_notifications.empty() && !client_ahead_of_compositor())
     {
         framedrop_policy->swap_unblocked();
         give_buffer_to_client(buffer, lock);
 
=== modified file 'src/server/compositor/buffer_queue.h'
--- src/server/compositor/buffer_queue.h 2015-04-30 11:36:36 +0000
+++ src/server/compositor/buffer_queue.h 2015-05-14 06:01:15 +0000
@@ -65,6 +65,15 @@
     void drop_old_buffers() override;
     void drop_client_requests() override;
 
+    /**
+     * Set the minimum number of smooth frames the client must keep up with
+     * the compositor for in order to qualify for queue scaling (dynamic
+     * double buffering for reduced latency). A negative value means never
+     * but it's recommended that you never change this.
+     */
+    void set_scaling_delay(int nframes);
+    int scaling_delay() const;
+
 private:
     class LockableCallback;
     enum SnapshotWait
@@ -94,7 +103,12 @@
 
     std::deque<Callback> pending_client_notifications;
 
+    bool client_ahead_of_compositor() const;
+    graphics::Buffer* get_a_free_buffer();
+
     int nbuffers;
+    int frame_deadlines_threshold;
+    int frame_deadlines_met;
     bool frame_dropping_enabled;
     bool current_compositor_buffer_valid;
     graphics::BufferProperties the_properties;
 
=== modified file 'tests/unit-tests/compositor/test_buffer_queue.cpp'
--- tests/unit-tests/compositor/test_buffer_queue.cpp 2015-04-30 11:36:36 +0000
+++ tests/unit-tests/compositor/test_buffer_queue.cpp 2015-05-14 06:01:15 +0000
@@ -151,7 +151,8 @@
     return handle->buffer();
 }
 
-void compositor_thread(mc::BufferQueue &bundle, std::atomic<bool> &done)
+void unthrottled_compositor_thread(mc::BufferQueue &bundle,
+                                   std::atomic<bool> &done)
 {
     while (!done)
     {
@@ -160,6 +161,36 @@
     }
 }
 
+std::chrono::milliseconds const throttled_compositor_rate(10);
+void throttled_compositor_thread(mc::BufferQueue &bundle,
+                                 std::atomic<bool> &done)
+{
+    while (!done)
+    {
+        bundle.compositor_release(bundle.compositor_acquire(nullptr));
+        std::this_thread::sleep_for(throttled_compositor_rate);
+    }
+}
+
+void overlapping_compositor_thread(mc::BufferQueue &bundle,
+                                   std::atomic<bool> &done)
+{
+    std::shared_ptr<mg::Buffer> b[2];
+    int i = 0;
+
+    b[0] = bundle.compositor_acquire(nullptr);
+    while (!done)
+    {
+        b[i^1] = bundle.compositor_acquire(nullptr);
+        bundle.compositor_release(b[i]);
+        std::this_thread::sleep_for(throttled_compositor_rate);
+        i ^= 1;
+    }
+
+    if (b[i])
+        bundle.compositor_release(b[i]);
+}
+
 void snapshot_thread(mc::BufferQueue &bundle,
                      std::atomic<bool> &done)
 {
@@ -420,10 +451,13 @@
     {
         mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
 
+        // This test is technically not valid with dynamic queue scaling on
+        q.set_scaling_delay(-1);
+
         std::atomic<bool> done(false);
         auto unblock = [&done] { done = true; };
         mt::AutoUnblockThread compositor(unblock,
-            compositor_thread, std::ref(q), std::ref(done));
+            unthrottled_compositor_thread, std::ref(q), std::ref(done));
 
         std::unordered_set<uint32_t> ids_acquired;
         int const max_ownable_buffers = nbuffers - 1;
@@ -537,8 +571,7 @@
     {
         std::deque<mg::BufferID> client_release_sequence;
        std::vector<mg::Buffer *> buffers;
-        int const max_ownable_buffers = nbuffers - 1;
-        for (int i = 0; i < max_ownable_buffers; ++i)
+        for (int i = 0; i < q.buffers_free_for_client(); ++i)
         {
             auto handle = client_acquire_async(q);
             ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
@@ -804,7 +837,7 @@
 
     auto unblock = [&done]{ done = true;};
 
-    mt::AutoUnblockThread compositor(unblock, compositor_thread,
+    mt::AutoUnblockThread compositor(unblock, unthrottled_compositor_thread,
                                      std::ref(q),
                                      std::ref(done));
     mt::AutoUnblockThread snapshotter1(unblock, snapshot_thread,
@@ -1393,8 +1426,14 @@
             // Consume
             for (int m = 0; m < nmonitors; ++m)
             {
-                ASSERT_EQ(1, q.buffers_ready_for_compositor(&monitor[m]));
-                q.compositor_release(q.compositor_acquire(&monitor[m]));
+                ASSERT_NE(0, q.buffers_ready_for_compositor(&monitor[m]));
+
+                // Double consume to account for the +1 that
+                // buffers_ready_for_compositor adds to do dynamic performance
+                // detection.
+                q.compositor_release(q.compositor_acquire(&monitor[m]));
+                q.compositor_release(q.compositor_acquire(&monitor[m]));
+
                 ASSERT_EQ(0, q.buffers_ready_for_compositor(&monitor[m]));
             }
         }
@@ -1466,7 +1505,7 @@
     std::atomic<bool> done(false);
 
    auto unblock = [&done]{ done = true; };
-    mt::AutoUnblockThread compositor_thread(unblock, [&]
+    mt::AutoUnblockThread unthrottled_compositor_thread(unblock, [&]
    {
        while (!done)
        {
@@ -1552,6 +1591,9 @@
     {
         mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
 
+        // This test is technically not valid with dynamic queue scaling on
+        q.set_scaling_delay(-1);
+
         void const* main_compositor = reinterpret_cast<void const*>(0);
         void const* second_compositor = reinterpret_cast<void const*>(1);
 
@@ -1579,7 +1621,7 @@
         std::atomic<bool> done(false);
         auto unblock = [&done] { done = true; };
         mt::AutoUnblockThread compositor(unblock,
-            compositor_thread, std::ref(q), std::ref(done));
+            unthrottled_compositor_thread, std::ref(q), std::ref(done));
 
         std::unordered_set<mg::Buffer *> unique_buffers_acquired;
         int const max_ownable_buffers = nbuffers - 1;
@@ -1606,32 +1648,131 @@
     }
 }
 
-/* FIXME (enabling this optimization breaks timing tests) */
-TEST_F(BufferQueueTest, DISABLED_synchronous_clients_only_get_two_real_buffers)
+// Test that dynamic queue scaling/throttling actually works
+TEST_F(BufferQueueTest, queue_size_scales_with_client_performance)
+{
+    for (int nbuffers = 3; nbuffers <= max_nbuffers_to_test; ++nbuffers)
+    {
+        mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
+        q.allow_framedropping(false);
+
+        std::atomic<bool> done(false);
+        auto unblock = [&done] { done = true; };
+
+        // To emulate a "fast" client we use a "slow" compositor
+        mt::AutoUnblockThread compositor(unblock,
+            throttled_compositor_thread, std::ref(q), std::ref(done));
+
+        std::unordered_set<mg::Buffer *> buffers_acquired;
+
+        int const delay = q.scaling_delay();
+        EXPECT_EQ(3, delay);  // expect a sane default
+
+        for (int frame = 0; frame < 10; frame++)
+        {
+            auto handle = client_acquire_async(q);
+            handle->wait_for(std::chrono::seconds(1));
+            ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
+
+            if (frame > delay)
+                buffers_acquired.insert(handle->buffer());
+            handle->release_buffer();
+        }
+        // Expect double-buffers for fast clients
+        EXPECT_THAT(buffers_acquired.size(), Eq(2));
+
+        // Now check what happens if the client becomes slow...
+        buffers_acquired.clear();
+        for (int frame = 0; frame < 10; frame++)
+        {
+            auto handle = client_acquire_async(q);
+            handle->wait_for(std::chrono::seconds(1));
+            ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
+
+            if (frame > delay)
+                buffers_acquired.insert(handle->buffer());
+
+            // Client is just too slow to keep up:
+            std::this_thread::sleep_for(throttled_compositor_rate * 1.5);
+
+            handle->release_buffer();
+        }
+        // Expect at least triple buffers for sluggish clients
+        EXPECT_THAT(buffers_acquired.size(), Ge(3));
+
+        // And what happens if the client becomes fast again?...
+        buffers_acquired.clear();
+        for (int frame = 0; frame < 10; frame++)
+        {
+            auto handle = client_acquire_async(q);
+            handle->wait_for(std::chrono::seconds(1));
+            ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
+
+            if (frame > delay)
+                buffers_acquired.insert(handle->buffer());
+            handle->release_buffer();
+        }
+        // Expect double-buffers for fast clients
+        EXPECT_THAT(buffers_acquired.size(), Eq(2));
+    }
+}
+
+TEST_F(BufferQueueTest, greedy_compositors_need_triple_buffers)
+{
+    /*
+     * "Greedy" compositors means those that can hold multiple buffers from
+     * the same client simultaneously or a single buffer for a long time.
+     * This usually means bypass/overlays, but can also mean multi-monitor.
+     */
+    for (int nbuffers = 3; nbuffers <= max_nbuffers_to_test; ++nbuffers)
+    {
+        mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
+        q.allow_framedropping(false);
+
+        std::atomic<bool> done(false);
+        auto unblock = [&done] { done = true; };
+
+        mt::AutoUnblockThread compositor(unblock,
+            overlapping_compositor_thread, std::ref(q), std::ref(done));
+
+        std::unordered_set<mg::Buffer *> buffers_acquired;
+        int const delay = q.scaling_delay();
+
+        for (int frame = 0; frame < 10; frame++)
+        {
+            auto handle = client_acquire_async(q);
+            handle->wait_for(std::chrono::seconds(1));
+            ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
+
+            if (frame > delay)
+                buffers_acquired.insert(handle->buffer());
+            handle->release_buffer();
+        }
+        // Expect triple buffers for the whole time
+        EXPECT_THAT(buffers_acquired.size(), Ge(3));
+    }
+}
+
+TEST_F(BufferQueueTest, compositor_double_rate_of_slow_client)
 {
     for (int nbuffers = 2; nbuffers <= max_nbuffers_to_test; ++nbuffers)
     {
         mc::BufferQueue q(nbuffers, allocator, basic_properties, policy_factory);
         q.allow_framedropping(false);
 
-        std::atomic<bool> done(false);
-        auto unblock = [&done] { done = true; };
-        mt::AutoUnblockThread compositor(unblock,
-            compositor_thread, std::ref(q), std::ref(done));
-
-        std::unordered_set<mg::Buffer *> buffers_acquired;
-
-        for (int frame = 0; frame < 100; frame++)
+        for (int frame = 0; frame < 10; frame++)
        {
-            auto handle = client_acquire_async(q);
-            handle->wait_for(std::chrono::seconds(1));
-            ASSERT_THAT(handle->has_acquired_buffer(), Eq(true));
+            ASSERT_EQ(0, q.buffers_ready_for_compositor(this));
+            q.client_release(client_acquire_sync(q));
 
-            buffers_acquired.insert(handle->buffer());
-            handle->release_buffer();
+            // Detecting a slow client requires scheduling at least one extra
+            // frame...
+            int nready = q.buffers_ready_for_compositor(this);
+            ASSERT_EQ(2, nready);
+            for (int i = 0; i < nready; ++i)
+                q.compositor_release(q.compositor_acquire(this));
+            ASSERT_EQ(0, q.buffers_ready_for_compositor(this));
         }
-
-        EXPECT_THAT(buffers_acquired.size(), Eq(2));
     }
 }
 
