Mir

GLRenderer: The default fragment shader is sub-optimal for alpha=1.0

Bug #1350674 reported by Daniel van Vugt
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mir
Fix Released
Medium
Daniel van Vugt
mir (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

GLRenderer: The default fragment shader is sub-optimal for alpha=1.0. It needlessly multiplies all four colour components on every fragment...

const GLchar* fragment_shader_src =
{
    "precision mediump float;\n"
    "uniform sampler2D tex;\n"
    "uniform float alpha;\n"
    "varying vec2 v_texcoord;\n"
    "void main() {\n"
    " vec4 frag = texture2D(tex, v_texcoord);\n"
    " gl_FragColor = alpha*frag;\n"
    "}\n"
};

If alpha is 1.0 (as it usually is) then we should take a more efficient shader than that.

Tags: performance

Related branches

kevin gunn (kgunn72)
Changed in mir:
importance: Undecided → High
Mirco Müller (macslow)
Changed in mir:
assignee: nobody → Mirco Müller (macslow)
Mirco Müller (macslow)
Changed in mir:
status: New → In Progress
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

@Mirco

I don't think replacing a multiply with a branch will be faster. It most likely will be slower specially in embedded devices.

Instead I would create two separate shaders, one that does not do alpha multiplies and the default one and activate the respective shader by testing renderable.alpha() == 1.0

Changed in mir:
milestone: none → 0.8.0
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Thanks Mirco!

> I don't think replacing a multiply with a branch will be faster. It most likely will be slower specially in embedded devices.
> Instead I would create two separate shaders, one that does not do alpha multiplies and the default one and activate
> the respective shader by testing renderable.alpha() == 1.0

It's difficult to tell. I remember from past benchmarks that a single if() in a shader didn't bring any performance penalty, although it could bring a slight power penalty, since the GPUs tended to handle this case by executing both paths concurrently.
It's also possible that the overhead of the multiplication is actually so small, that it doesn't matter if we just keep the current shader and use it all the time. Using separate shaders we get a simpler/better shader for the opaque case, but we also have the overhead of switching GL programs.

So, it's not immediately clear what's best, and my plan is to benchmark all of these approaches. In all approaches there are trade-offs, so we need to ensure that what we select the one that offers the best performance for our use cases and hardware.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I had the same idea as what Alberto suggests. At least that way we don't need to benchmark and it would be obvious that we're then using the most efficient shader possible.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> At least that way we don't need to benchmark and it would be obvious that we're then using the most efficient shader possible.

We still need to benchmark, because, at least for a scene that contains both opaque and translucent elements, we have the potential overhead of switching GL programs.

Revision history for this message
Mirco Müller (macslow) wrote :

It all comes down to either create a good set of synthetic test-cases or use current unity8/dash rendering-load to determine what shader-approach gives the best balance between speed-up and power-consumption penalty.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

I am tempted to mark this bug as invalid/opinion. GPUs are really optimized for x*y operations (either vectors or matrices), so I think special casing on alpha != 1.0 (however we choose to do it) for such a simple shader is pointless (unless we get hard data that proves otherwise). Performing some synthetic benchmarks on the phone, which showed no benefit when using either an if() clause or a separate shader, supports this.

Revision history for this message
Mirco Müller (macslow) wrote :

I'm ok with marking this bug as "invalid", since you were able to verify that our planned optimizations do not result in any improvements.

Out of pure curiosity... on which SoC/desktop-GPUs did you check this?

Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

Composition is a memory bw bounded operation rather than shader bounded. I agree with Alberto and add that if you really want to see perf/power improvements then use a 16-bit renderable (like RGB565) whenever renderable.alpha()==1.0, instead of ARGB32.

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> Out of pure curiosity... on which SoC/desktop-GPUs did you check this?

Mali-400 MP and Intel Ironlake (Arrandale) Mobile. Some interesting observations:

* On both GPUs replacing "gl_FragColor = alpha * frag" with "gl_FragColor = frag" has no performance effect. That is, for our simple shader, the muplication is performed for "free". Just this indicates that there is no point in trying to improve the shader.

* On Ironlake using the if clause version reduces the performance noticeably, whereas on Mali-400 MP it has no effect at all.

If you want to experiment further, I have found the glmark2 "effects2d" benchmark to be good starting point. Just edit the data/shaders/effect-2d-convolution.frag shader to your liking (don't forget to remove the $convolution$ replacement variable) and adjust the source code accordingly if you introduce any "uniform" variables. There is also the --off-screen option which renders to an offs-creen surface if the on-screen results are too unstable.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK, we don't have any evidence that the current shader is any slower than a simpler shader. It's possibly not.

Although I would also suggest we need to do some long term power usage comparison too. Because maybe we won't notice the performance difference but those extra millions of vector multiplications per frame could impact power draw. Sounds plausible.

Changed in mir:
milestone: 0.8.0 → none
status: In Progress → Opinion
status: Opinion → Incomplete
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Actually, I would be astonished if a simpler shader did not significantly help power usage or performance. Consider the numbers:

1920x1200 @ 60Hz, 32bpp = 552960000 alpha multiplications per second.

That 553 MFLOPS to be saved from the poor overworked GPU! :)

Changed in mir:
assignee: Mirco Müller (macslow) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Mir because there has been no activity for 60 days.]

Changed in mir:
status: Incomplete → Expired
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I did some power usage testing on a laptop a couple of weeks ago and found no benefit to using a simpler shader. But that's only one data point. As soon as we find a single system that a simpler shader helps (maybe software rendering?) then it will become important.

Changed in mir:
status: Expired → Triaged
importance: High → Medium
Changed in mir:
assignee: nobody → Daniel van Vugt (vanvugt)
milestone: none → 0.10.0
status: Triaged → In Progress
Changed in mir:
milestone: 0.10.0 → 0.11.0
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.11.0

Changed in mir:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (3.2 KiB)

This bug was fixed in the package mir - 0.11.0+15.04.20150209.1-0ubuntu1

---------------
mir (0.11.0+15.04.20150209.1-0ubuntu1) vivid; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.11.0 (https://launchpad.net/mir/+milestone/0.11.0)
    - Enhancements:
      . Lots more major plumbing in the Android code, on the path to
        supporting external displays.
      . Add support for clang 3.6.
      . Major redesign of server classes in mir::shell,scene and friends
        (still in progress).
      . Added client API for creating dialogs and tooltips.
      . Added new surface states: mir_surface_state_hidden and
        mir_surface_state_horizmaximized.
      . Performance: Use optimally efficient fragment shading when possible.
      . Performance: (Desktop) Composite using double buffering instead of
        triple to reduce visible lag.
      . mir_proving_server: Can now resize windows from any edge or corner
        using the existing Alt+middlebuttondrag.
      . mir_proving_server: Added some demo custom shaders (negative and
        high contrast modes: Super+N/C).
      . mir_proving_server: Can now close clients politely via Alt+F4.
      . Added MirPointerInputEvent (part of the new input API, the old
        MirMotionEvent is still supported also for now).
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 3
      . Mirplatform ABI bumped to 6
      . Mirserver ABI bumped to 29
    - Bug fixes:
      . [regression] mir_demo_server exits immediately with boost
        bad_any_cast exception (LP: #1414630)
      . need way to position menus and tooltips (relative positioning to
        parent) (LP: #1324101)
      . GLibMainLoopTest failure seen in CI (LP: #1413748)
      . Clang builds fail in CI (LP: #1416317)
      . segfault in mir::compositor::GLProgramFamily::Shader::init()
        (LP: #1416482)
      . GLRenderer: The default fragment shader is sub-optimal for alpha=1.0
        (LP: #1350674)
      . mesa::DisplayBuffer::post_update is triple buffered - more laggy than
        it needs to be (LP: #1350725)
      . Cannot connect to nested server when started from a differen vt
        (LP: #1379266)
      . [testfail] AsioMainLoopAlarmTest fails in CI (LP: #1392256)
      . Compositor report inconsistently reports frame time during bypass,
        and render time otherwise (LP: #1408906)
      . [regression] mir_demo_client_fingerpaint doesn't paint anything any
        more (with the mouse) (LP: #1413139)
      . Hardware cursor is always slightly ahead of the composited image
        (LP: #1274408)
      . integration tests are outputting (too many) DisplayServer log
        messages (LP: #1408231)
      . [regression] deploy-and-test.sh doesn't work any more (unless you
        have umockdev installed already) (LP: #1413479)
      . Color Inverse on display. Toggle Negative Image (LP: #1400580)
      . mir-ubuntu-vivid-armhf-ci fails consistently (LP: #1407863)
      . Double-buffered surfaces may lag or freeze if event driven and not
        constantly redrawing (LP: #1395581)
      . Pointer motion and crossing events...

Read more...

Changed in mir (Ubuntu):
status: New → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.