Bug #1350674 “GLRenderer: The default fragment shader is sub-opt...” : Bugs : Mir

kevin gunn (kgunn72) on 2014-08-19

Changed in mir:
importance:	Undecided → High

Mirco Müller (macslow) on 2014-09-08

Changed in mir:
assignee:	nobody → Mirco Müller (macslow)

Mirco Müller (macslow) on 2014-09-25

Changed in mir:
status:	New → In Progress

Revision history for this message

Alberto Aguirre (albaguirre) wrote on 2014-09-25:

#1

@Mirco

I don't think replacing a multiply with a branch will be faster. It most likely will be slower specially in embedded devices.

Instead I would create two separate shaders, one that does not do alpha multiplies and the default one and activate the respective shader by testing renderable.alpha() == 1.0

Daniel van Vugt (vanvugt) on 2014-09-26

Changed in mir:
milestone:	none → 0.8.0

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-09-26:

#2

Thanks Mirco!

> I don't think replacing a multiply with a branch will be faster. It most likely will be slower specially in embedded devices.
> Instead I would create two separate shaders, one that does not do alpha multiplies and the default one and activate
> the respective shader by testing renderable.alpha() == 1.0

It's difficult to tell. I remember from past benchmarks that a single if() in a shader didn't bring any performance penalty, although it could bring a slight power penalty, since the GPUs tended to handle this case by executing both paths concurrently.
It's also possible that the overhead of the multiplication is actually so small, that it doesn't matter if we just keep the current shader and use it all the time. Using separate shaders we get a simpler/better shader for the opaque case, but we also have the overhead of switching GL programs.

So, it's not immediately clear what's best, and my plan is to benchmark all of these approaches. In all approaches there are trade-offs, so we need to ensure that what we select the one that offers the best performance for our use cases and hardware.

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-09-26:

#3

I had the same idea as what Alberto suggests. At least that way we don't need to benchmark and it would be obvious that we're then using the most efficient shader possible.

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-09-26:

#4

> At least that way we don't need to benchmark and it would be obvious that we're then using the most efficient shader possible.

We still need to benchmark, because, at least for a scene that contains both opaque and translucent elements, we have the potential overhead of switching GL programs.

Revision history for this message

Mirco Müller (macslow) wrote on 2014-09-26:

#5

It all comes down to either create a good set of synthetic test-cases or use current unity8/dash rendering-load to determine what shader-approach gives the best balance between speed-up and power-consumption penalty.

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-09-26:

#6

I am tempted to mark this bug as invalid/opinion. GPUs are really optimized for x*y operations (either vectors or matrices), so I think special casing on alpha != 1.0 (however we choose to do it) for such a simple shader is pointless (unless we get hard data that proves otherwise). Performing some synthetic benchmarks on the phone, which showed no benefit when using either an if() clause or a separate shader, supports this.

Revision history for this message

Mirco Müller (macslow) wrote on 2014-09-26:

#7

I'm ok with marking this bug as "invalid", since you were able to verify that our planned optimizations do not result in any improvements.

Out of pure curiosity... on which SoC/desktop-GPUs did you check this?

Revision history for this message

Cemil Azizoglu (cemil-azizoglu) wrote on 2014-09-26:

#8

Composition is a memory bw bounded operation rather than shader bounded. I agree with Alberto and add that if you really want to see perf/power improvements then use a 16-bit renderable (like RGB565) whenever renderable.alpha()==1.0, instead of ARGB32.

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-09-29:

#9

> Out of pure curiosity... on which SoC/desktop-GPUs did you check this?

Mali-400 MP and Intel Ironlake (Arrandale) Mobile. Some interesting observations:

* On both GPUs replacing "gl_FragColor = alpha * frag" with "gl_FragColor = frag" has no performance effect. That is, for our simple shader, the muplication is performed for "free". Just this indicates that there is no point in trying to improve the shader.

* On Ironlake using the if clause version reduces the performance noticeably, whereas on Mali-400 MP it has no effect at all.

If you want to experiment further, I have found the glmark2 "effects2d" benchmark to be good starting point. Just edit the data/shaders/effect-2d-convolution.frag shader to your liking (don't forget to remove the $convolution$ replacement variable) and adjust the source code accordingly if you introduce any "uniform" variables. There is also the --off-screen option which renders to an offs-creen surface if the on-screen results are too unstable.

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-09-30:

#10

OK, we don't have any evidence that the current shader is any slower than a simpler shader. It's possibly not.

Although I would also suggest we need to do some long term power usage comparison too. Because maybe we won't notice the performance difference but those extra millions of vector multiplications per frame could impact power draw. Sounds plausible.

Changed in mir:
milestone:	0.8.0 → none
status:	In Progress → Opinion
status:	Opinion → Incomplete

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-10-06:

#11

Actually, I would be astonished if a simpler shader did not significantly help power usage or performance. Consider the numbers:

1920x1200 @ 60Hz, 32bpp = 552960000 alpha multiplications per second.

That 553 MFLOPS to be saved from the poor overworked GPU! :)

Changed in mir:
assignee:	Mirco Müller (macslow) → nobody

Revision history for this message

Launchpad Janitor (janitor) wrote on 2014-12-06:

#12

[Expired for Mir because there has been no activity for 60 days.]

Changed in mir:
status:	Incomplete → Expired

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-12-08:

#13

I did some power usage testing on a laptop a couple of weeks ago and found no benefit to using a simpler shader. But that's only one data point. As soon as we find a single system that a simpler shader helps (maybe software rendering?) then it will become important.

Changed in mir:
status:	Expired → Triaged
importance:	High → Medium

Daniel van Vugt (vanvugt) on 2014-12-31

Changed in mir:
assignee:	nobody → Daniel van Vugt (vanvugt)
milestone:	none → 0.10.0
status:	Triaged → In Progress

Daniel van Vugt (vanvugt) on 2015-01-08

Changed in mir:
milestone:	0.10.0 → 0.11.0

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2015-01-09:

#14

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.11.0

Changed in mir:
status:	In Progress → Fix Committed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2015-02-10:

#15

Download full text (3.2 KiB)

This bug was fixed in the package mir - 0.11.0+15.04.20150209.1-0ubuntu1

---------------
mir (0.11.0+15.04.20150209.1-0ubuntu1) vivid; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.11.0 (https://launchpad.net/mir/+milestone/0.11.0)
    - Enhancements:
      . Lots more major plumbing in the Android code, on the path to
        supporting external displays.
      . Add support for clang 3.6.
      . Major redesign of server classes in mir::shell,scene and friends
        (still in progress).
      . Added client API for creating dialogs and tooltips.
      . Added new surface states: mir_surface_state_hidden and
        mir_surface_state_horizmaximized.
      . Performance: Use optimally efficient fragment shading when possible.
      . Performance: (Desktop) Composite using double buffering instead of
        triple to reduce visible lag.
      . mir_proving_server: Can now resize windows from any edge or corner
        using the existing Alt+middlebuttondrag.
      . mir_proving_server: Added some demo custom shaders (negative and
        high contrast modes: Super+N/C).
      . mir_proving_server: Can now close clients politely via Alt+F4.
      . Added MirPointerInputEvent (part of the new input API, the old
        MirMotionEvent is still supported also for now).
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 3
      . Mirplatform ABI bumped to 6
      . Mirserver ABI bumped to 29
    - Bug fixes:
      . [regression] mir_demo_server exits immediately with boost
        bad_any_cast exception (LP: #1414630)
      . need way to position menus and tooltips (relative positioning to
        parent) (LP: #1324101)
      . GLibMainLoopTest failure seen in CI (LP: #1413748)
      . Clang builds fail in CI (LP: #1416317)
      . segfault in mir::compositor::GLProgramFamily::Shader::init()
        (LP: #1416482)
      . GLRenderer: The default fragment shader is sub-optimal for alpha=1.0
        (LP: #1350674)
      . mesa::DisplayBuffer::post_update is triple buffered - more laggy than
        it needs to be (LP: #1350725)
      . Cannot connect to nested server when started from a differen vt
        (LP: #1379266)
      . [testfail] AsioMainLoopAlarmTest fails in CI (LP: #1392256)
      . Compositor report inconsistently reports frame time during bypass,
        and render time otherwise (LP: #1408906)
      . [regression] mir_demo_client_fingerpaint doesn't paint anything any
        more (with the mouse) (LP: #1413139)
      . Hardware cursor is always slightly ahead of the composited image
        (LP: #1274408)
      . integration tests are outputting (too many) DisplayServer log
        messages (LP: #1408231)
      . [regression] deploy-and-test.sh doesn't work any more (unless you
        have umockdev installed already) (LP: #1413479)
      . Color Inverse on display. Toggle Negative Image (LP: #1400580)
      . mir-ubuntu-vivid-armhf-ci fails consistently (LP: #1407863)
      . Double-buffered surfaces may lag or freeze if event driven and not
        constantly redrawing (LP: #1395581)
      . Pointer motion and crossing events...

This bug was fixed in the package mir - 0.11.0+15.04.20150209.1-0ubuntu1

---------------
mir (0.11.0+15.04.20150209.1-0ubuntu1) vivid; urgency=medium

[ Daniel van Vugt ]
  * New upstream release 0.11.0 (https://launchpad.net/mir/+milestone/0.11.0)
    - Enhancements:
      . Lots more major plumbing in the Android code, on the path to
        supporting external displays.
      . Add support for clang 3.6.
      . Major redesign of server classes in mir::shell,scene and friends
        (still in progress).
      . Added client API for creating dialogs and tooltips.
      . Added new surface states: mir_surface_state_hidden and
        mir_surface_state_horizmaximized.
      . Performance: Use optimally efficient fragment shading when possible.
      . Performance: (Desktop) Composite using double buffering instead of
        triple to reduce visible lag.
      . mir_proving_server: Can now resize windows from any edge or corner
        using the existing Alt+middlebuttondrag.
      . mir_proving_server: Added some demo custom shaders (negative and
        high contrast modes: Super+N/C).
      . mir_proving_server: Can now close clients politely via Alt+F4.
      . Added MirPointerInputEvent (part of the new input API, the old
        MirMotionEvent is still supported also for now).
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 3
      . Mirplatform ABI bumped to 6
      . Mirserver ABI bumped to 29
    - Bug fixes:
      . [regression] mir_demo_server exits immediately with boost
        bad_any_cast exception (LP: #1414630)
      . need way to position menus and tooltips (relative positioning to
        parent) (LP: #1324101)
      . GLibMainLoopTest failure seen in CI (LP: #1413748)
      . Clang builds fail in CI (LP: #1416317)
      . segfault in mir::compositor::GLProgramFamily::Shader::init()
        (LP: #1416482)
      . GLRenderer: The default fragment shader is sub-optimal for alpha=1.0
        (LP: #1350674)
      . mesa::DisplayBuffer::post_update is triple buffered - more laggy than
        it needs to be (LP: #1350725)
      . Cannot connect to nested server when started from a differen vt
        (LP: #1379266)
      . [testfail] AsioMainLoopAlarmTest fails in CI (LP: #1392256)
      . Compositor report inconsistently reports frame time during bypass,
        and render time otherwise (LP: #1408906)
      . [regression] mir_demo_client_fingerpaint doesn't paint anything any
        more (with the mouse) (LP: #1413139)
      . Hardware cursor is always slightly ahead of the composited image
        (LP: #1274408)
      . integration tests are outputting (too many) DisplayServer log
        messages (LP: #1408231)
      . [regression] deploy-and-test.sh doesn't work any more (unless you
        have umockdev installed already) (LP: #1413479)
      . Color Inverse on display. Toggle Negative Image (LP: #1400580)
      . mir-ubuntu-vivid-armhf-ci fails consistently (LP: #1407863)
      . Double-buffered surfaces may lag or freeze if event driven and not
        constantly redrawing (LP: #1395581)
      . Pointer motion and crossing events are missing (LP: #1417650)
 -- Ubuntu daily release <ps-jenkins@lists.canonical.com>   Mon, 09 Feb 2015 21:27:35 +0000

Changed in mir (Ubuntu):
status:	New → Fix Released

Daniel van Vugt (vanvugt) on 2015-02-11

Changed in mir:
status:	Fix Committed → Fix Released

Mir

GLRenderer: The default fragment shader is sub-optimal for alpha=1.0

Bug Description

Related branches

Other bug subscribers

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	Mir	Fix Released	Medium	Daniel van Vugt	Mir 0.11.0
	mir (Ubuntu)	Fix Released	Undecided	Unassigned