[regression][DRI] SubBuffer rendering is much slower in compiz 0.9.8.0 than it was in 0.9.7

Bug #1037411 reported by Daniel van Vugt
This bug affects 4 people
Affects            Status        Importance  Assigned to      Milestone
Compiz             Fix Released  Medium      Daniel van Vugt
Mesa               Won't Fix     Medium
compiz (Ubuntu)    Fix Released  Medium      Unassigned
  Precise          Invalid       Undecided   Unassigned
  Quantal          Fix Released  Medium      Unassigned
mesa (Ubuntu)      Invalid       Undecided   Unassigned
  Precise          Invalid       Undecided   Unassigned
  Quantal          Won't Fix     Undecided   Unassigned

Bug Description

(This is no longer the primary performance regression bug for compiz 0.9.8.0. Look at bug 1024304 instead)

TEST CASE:
1. CCSM > OpenGL >
     framebuffer_object = OFF
     vertex_buffer_object = OFF
     always_swap_buffers = OFF
2. Run graphics benchmarks.

Expected: Similar results to compiz 0.9.7
Observed: Much lower results than compiz 0.9.7

ORIGINAL DESCRIPTION:
Comparing graphics performance in a two-monitor configuration, I find the gles2 branch is 25-40% slower than trunk.

This would not normally be surprising. However, the slowdown REMAINS even when I turn off the new rendering features in the gles2 branch: framebuffer_object, vertex_buffer_object, always_swap_buffers.

So both branches should be using the same code path. But gles2 is still dramatically slower than trunk using two monitors.

The good news is that this bug only affects benchmark results and seemingly a little lag. The physical frame rate achieved with two monitors still seems to be higher using the gles2 branch, meaning it drops from 60Hz to 30Hz much less often than trunk does.

NOTE 1: This regression was allowed in compiz 0.9.8.0 because it is generally only visible in benchmark results. Meanwhile, physical compiz rendering performance (as reported by the compiz Benchmark plugin) is higher in compiz 0.9.8.0 than previous versions, in most cases.

NOTE 2: If you're just worried about fullscreen game performance, then you don't need to wait for this bug to be resolved. You can get optimal graphics performance with unredirect mode. But see bug 980663 first.

Daniel van Vugt (vanvugt) wrote :

I'm guessing there's a logic change somewhere in the gles2 code, where something previously only done per output is now done on the whole screen.

framebuffer_object comes to mind, but it's not a factor here, as mentioned above.

Maybe it's because we use triangles instead of quads now, so the number of vertices etc. is 50% higher in gles2.

Daniel van Vugt (vanvugt) wrote : Re: [GLES] Graphics performance is 25-40% lower with gles2 than trunk

Confirmed with a single monitor too. Even if I turn off all the new gles2 features in the opengl plugin, the gles2 branch gets 15-20% lower scores from glmark2 and glxgears (env vblank_mode=0).

summary: - [GLES] Multimonitor graphics performance is 25-40% lower with gles2 than
- trunk
+ [GLES] Graphics performance is 25-40% lower with gles2 than trunk
Daniel van Vugt (vanvugt) wrote :

This might be due to the fact that gles2 has to render everything as triangles instead of quads. So we're now pushing around 2x the polygons and 1.5x the number of vertices that trunk does.
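The 2x and 1.5x figures follow from simple counting, assuming each rectangle is drawn as two non-indexed triangles rather than one quad. A minimal sketch of that arithmetic (the type and function names here are hypothetical, not compiz code):

```cpp
#include <cstddef>

// Geometry submitted for a batch of rectangles (e.g. window textures).
struct GeometryCount
{
    std::size_t primitives; // polygons handed to the GPU
    std::size_t vertices;   // vertices handed to the GPU
};

// Quads: one primitive and four vertices per rectangle.
constexpr GeometryCount asQuads (std::size_t rects)
{
    return { rects, rects * 4 };
}

// Non-indexed triangle list (an assumption): two primitives and six
// vertices per rectangle, because both triangles repeat the two
// vertices on the shared diagonal.
constexpr GeometryCount asTriangles (std::size_t rects)
{
    return { rects * 2, rects * 6 };
}
```

For 100 rectangles this gives 100 polygons and 400 vertices as quads, versus 200 polygons and 600 vertices as triangles. Indexed triangles or triangle strips would change the vertex count, so the 1.5x figure only holds under the non-indexed assumption.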

Changed in compiz:
assignee: nobody → Daniel van Vugt (vanvugt)
summary: - [GLES] Graphics performance is 25-40% lower with gles2 than trunk
+ [regression][GLES] Graphics performance is 25-40% lower with gles2 than
+ trunk
Changed in compiz:
milestone: 0.9.8.0 → 0.9.8.1
summary: - [regression][GLES] Graphics performance is 25-40% lower with gles2 than
- trunk
+ [regression][GLES] Benchmark results are 15-40% lower with the gles2
+ code
Daniel van Vugt (vanvugt) wrote : Re: [regression][GLES] Benchmark results are 15-40% lower with the gles2 code

OK, the problem is not our use of triangles instead of quads. I just tested quads and the slowdown remains.

Changed in compiz (Ubuntu):
milestone: none → ubuntu-12.10-beta-2
importance: Undecided → Medium
status: New → Triaged
Changed in compiz:
assignee: Daniel van Vugt (vanvugt) → nobody
summary: [regression][GLES] Benchmark results are 15-40% lower with the gles2
- code
+ code (compiz 0.9.8.0)
description: updated
description: updated
Changed in compiz:
milestone: 0.9.8.2 → 0.9.8.4
Changed in compiz:
assignee: nobody → Daniel van Vugt (vanvugt)
status: Triaged → In Progress
tags: added: performance
Daniel van Vugt (vanvugt) wrote : Re: [regression][GLES] Benchmark results are 15-40% lower with the gles2 code (compiz 0.9.8.0)
Daniel van Vugt (vanvugt) wrote :

I'm pretty sure revision 3255 came from smspillaz. Though it's not obvious.

Sam Spilsbury (smspillaz) wrote :

Ah okay, it's the fact that we draw from an FBO to the backbuffer. Seems fair.

I'll have a look into using glBlitFramebuffer.

Sam Spilsbury (smspillaz) wrote :

Okay, I have a half implementation of using glBlitFramebuffer, and have some ideas as to how we might be able to preserve vsync while not using FBOs at all (still necessary for nvidia).

In progress.

Changed in compiz:
assignee: Daniel van Vugt (vanvugt) → Sam Spilsbury (smspillaz)
Changed in compiz:
assignee: Sam Spilsbury (smspillaz) → Daniel van Vugt (vanvugt)
Daniel van Vugt (vanvugt) wrote :

I suspect the cause is either glClear (very expensive) being called too much, or this:
- gScreen->glPaintCompositedOutput (tmpRegion, scratchFbo, mask);
+ gScreen->glPaintCompositedOutput (screen->region (), scratchFbo, mask);

Sam Spilsbury (smspillaz) wrote :

I'm a little confused by that.

Changing from tmpRegion to screen->region () effectively meant that the entire backing framebuffer object is painted to the backbuffer, rather than a small portion of it. This is necessary because we are always using glXSwapBuffers and the backbuffer becomes invalidated upon swap.
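To put rough numbers on the difference between painting only the damaged region and painting screen->region (), here is a small sketch that compares pixel counts. The Rect type, the helper names, and the example sizes are assumptions for illustration, not compiz's CompRegion API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical rectangle type standing in for one box of a region.
struct Rect { int x, y, width, height; };

// Total pixels covered by a list of non-overlapping rectangles.
std::int64_t pixelArea (const std::vector<Rect> &region)
{
    std::int64_t total = 0;
    for (const Rect &r : region)
        total += static_cast<std::int64_t> (r.width) * r.height;
    return total;
}

// Fraction of the screen that has to be copied when only the damage
// is painted instead of the whole screen region.
double damagedFraction (const std::vector<Rect> &damage, const Rect &screen)
{
    assert (screen.width > 0 && screen.height > 0);
    const double screenArea =
        static_cast<double> (screen.width) * screen.height;
    return static_cast<double> (pixelArea (damage)) / screenArea;
}
```

With a single 200x100 damaged rectangle on a 1920x1080 screen, the damaged area is 20000 pixels, a little under 1% of the 2073600 pixels that a full-screen paint has to touch.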

Why would this change have anything to do with the slowdown if the slowdown still occurs when painting from a framebuffer object is turned off as you put in the description?

Daniel van Vugt (vanvugt) wrote :

No comment, until I actually test my theories and maybe come up with more.

Sam Spilsbury (smspillaz) wrote :

You could be right about glClear though. I notice that clearBuffers is not set to false again anywhere in screen.cpp

Daniel van Vugt (vanvugt) wrote :

The point is that forcing the glXCopySubBufferMESA code path is still slow after r3255. I was surprised too.

Daniel van Vugt (vanvugt) wrote :

Sorry, my mistake.

The slowdown in r3255 is only due to using the FBO on every frame. If I modify r3255 to use regional redraws then it's still fast. So the regression in regional redraw performance happened after r3255.

Daniel van Vugt (vanvugt) wrote :

This is a little bit mind-bending, but I've found the problem, at least on my precise desktop using Intel Sandy Bridge graphics...

1. I disable all the fancy new opengl rendering options so compiz should use regional redraws, so useFbo is always false.
2. On startup, I get a couple of frames with (mask & COMPOSITE_SCREEN_DAMAGE_ALL_MASK) so doubleBuffer.render is called with fullscreen==true. This calls: GLXDoubleBuffer::swap() --> copyFrontToBack() because useFbo was false --> glCopyPixels().
3. After startup, all frames use GLXDoubleBuffer::blit() --> GL::copySubBuffer(), which is now slow.

To make rendering fast again, all I have to do is break the sequence in #2. Either:
  (a) Force fullscreen=false; or
  (b) Comment out copyFrontToBack(); or
  (c) Comment out glCopyPixels()

So the problem, on my desktop where I've done all my performance comparisons, is that calling glCopyPixels just once or twice on startup will forever make GL::copySubBuffer take a slow rendering path thereafter. If I ensure glCopyPixels is never touched on startup then Mesa stays in fast mode for GL::copySubBuffer.
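The sequence in steps 2 and 3 can be condensed into a small model of which GL entry points each frame reaches when useFbo is false. This is only a sketch under that assumption. The enum and function names are hypothetical and do not mirror the real GLXDoubleBuffer code:

```cpp
#include <cassert>
#include <vector>

// GL entry points a frame ends up in (useFbo == false throughout).
enum class GlCall { SwapBuffers, CopyPixels, CopySubBuffer };

// A fullscreen frame swaps and then copies front to back, which reaches
// glCopyPixels. Any other frame only blits the damaged sub-buffer.
std::vector<GlCall> callsForFrame (bool fullscreen)
{
    if (fullscreen)
        return { GlCall::SwapBuffers, GlCall::CopyPixels };
    return { GlCall::CopySubBuffer };
}

// True if any frame in the session reaches glCopyPixels.
bool sessionTouchesCopyPixels (const std::vector<bool> &fullscreenPerFrame)
{
    assert (!fullscreenPerFrame.empty ()); // a session has at least one frame
    for (bool fullscreen : fullscreenPerFrame)
        for (GlCall call : callsForFrame (fullscreen))
            if (call == GlCall::CopyPixels)
                return true;
    return false;
}
```

A session that starts with two fullscreen frames, such as {true, true, false, false}, reaches glCopyPixels. Workaround (a) corresponds to a session of all false values, which never does.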

In , Daniel van Vugt (vanvugt) wrote :

I call glCopyPixels a couple of times briefly on startup and then never again. The problem is that doing this makes all subsequent rendering much slower. If I never call glCopyPixels on startup then rendering remains fast thereafter.

It seems glCopyPixels is modifying the context in a way that permanently cripples later operations. The only possible cause I can see so far is:

src/mesa/main/drawpix.c: _mesa_CopyPixels:
   /* We're not using the current vertex program, and the driver may install
    * it's own. Note: this may dirty some state.
    */
   _mesa_set_vp_override(ctx, GL_TRUE);

This seems to set a flag in the ctx which is never cleared.

Using Mesa 8.0.2 in Ubuntu 12.04

summary: - [regression][GLES] Benchmark results are 15-40% lower with the gles2
- code (compiz 0.9.8.0)
+ [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0
+ than it was in 0.9.7
Daniel van Vugt (vanvugt) wrote : Re: [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0 than it was in 0.9.7

All,

I have modified this bug to just be about a very specific problem I find on my machine. If you're concerned about general performance of compiz 0.9.8.0 with the default settings then please look at bug 1024304 instead.

description: updated
In , Daniel van Vugt (vanvugt) wrote :

I am using Intel Sandy Bridge HD 2000 by the way. I believe that means i965.

summary: - [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0
+ [regression][DRI] SubBuffer rendering is much slower in compiz 0.9.8.0
than it was in 0.9.7
In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #2)
> I call glCopyPixels a couple of times briefly on startup and then never again.
> The problem is that doing this makes all subsequent rendering much slower. If I
> never call glCopyPixels on startup then rendering remains fast thereafter.

What are the read and draw buffers for glCopyPixels? If either of them is GL_FRONT*, that will cause a DRI2 fake front buffer to be allocated and thereafter kept up to date wrt the real front buffer.
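A toy model of that behaviour, to show why one early call can affect every later frame: once a copy names the front buffer, the drawable keeps a fake front buffer for good and sub-buffer copies cost more from then on. The class, the cost units, and the 2:1 ratio are invented for illustration. None of this is Mesa or X server code:

```cpp
#include <cassert>

enum class Buffer { Front, Back };

// Hypothetical drawable that remembers whether a fake front buffer exists.
class ToyDrawable
{
public:
    // Models a copy with the given read and draw buffers.
    void copyPixels (Buffer read, Buffer draw)
    {
        if (read == Buffer::Front || draw == Buffer::Front)
            fakeFront_ = true; // allocated once, never released in this model
    }

    // Relative cost of one sub-buffer copy, in made-up units: the extra
    // unit stands for keeping the fake front buffer up to date.
    int copySubBufferCost () const { return fakeFront_ ? 2 : 1; }

    bool hasFakeFront () const { return fakeFront_; }

private:
    bool fakeFront_ = false;
};

// Total cost of a run of regional-redraw frames on this drawable.
int sessionCost (const ToyDrawable &drawable, int frames)
{
    assert (frames >= 0);
    return frames * drawable.copySubBufferCost ();
}
```

In this model a single copyPixels (Buffer::Front, Buffer::Back) at startup doubles the cost of every later frame, while copies that only involve the back buffer leave the cost unchanged.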

> This seems to set a flag in the ctx which is never cleared.

It is cleared:

end:
   _mesa_set_vp_override(ctx, GL_FALSE);

In , Daniel van Vugt (vanvugt) wrote :

Yes, the read buffer is GL_FRONT in this case. So I guess the slow-down is by design in Mesa. I'm going to work around it in compiz anyway. glCopyPixels should never be touched at all really.

P.S. _mesa_set_vp_override(ctx, GL_FALSE) does not clear NewState, which is what I was concerned about:

void
_mesa_set_vp_override(struct gl_context *ctx, GLboolean flag)
{
   if (ctx->VertexProgram._Overriden != flag) {
      ctx->VertexProgram._Overriden = flag;

      /* Set one of the bits which will trigger fragment program
       * regeneration:
       */
      ctx->NewState |= _NEW_PROGRAM;
   }
}

Changed in compiz (Ubuntu Precise):
status: New → Invalid
Changed in mesa (Ubuntu Precise):
status: New → Invalid
Changed in mesa (Ubuntu Quantal):
status: New → Triaged
In , Marek Olšák (maraeo) wrote :

Don't worry about NewState. It's cleared after every draw operation.
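The interaction between the override flag and the dirty bits can be sketched with a stand-in context. The struct, the bit value, and the draw function below are assumptions for illustration. Only the shape of setVpOverride follows the function quoted earlier in this thread:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value; only the name echoes Mesa's _NEW_PROGRAM.
constexpr std::uint32_t NEW_PROGRAM = 1u << 0;

struct ToyContext
{
    bool overridden = false;     // stands in for VertexProgram._Overriden
    std::uint32_t newState = 0;  // stands in for NewState
};

// Any change of the flag marks program state dirty, whether the flag is
// being set or cleared.
void setVpOverride (ToyContext &ctx, bool flag)
{
    if (ctx.overridden != flag)
    {
        ctx.overridden = flag;
        ctx.newState |= NEW_PROGRAM;
    }
}

// Copy operation: override on, do the work, override off again.
void copyPixels (ToyContext &ctx)
{
    setVpOverride (ctx, true);
    assert (ctx.overridden); // the copy itself would run here
    setVpOverride (ctx, false);
}

// Draw operation: dirty state is consumed and the bits are reset.
void draw (ToyContext &ctx)
{
    ctx.newState = 0;
}
```

After copyPixels the override flag is back to false but the dirty bit is still set. The next draw clears it, so in this model nothing from the copy persists past one draw.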

description: updated
In , Chris Forbes (chrisf-ijw) wrote :

Would it be reasonable to put a performance note in
ARB_debug_output/KHR_debug when mesa falls into this slow state?

In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #3)
> Yes, the read buffer is GL_FRONT in this case. So I guess the slow-down is by
> design in Mesa.

Rather the X server / DRI2 protocol. It *might* be possible to make xserver not enforce the fake front buffer for the Composite Overlay Window, not sure.

> I'm going to work around it in compiz anyway. glCopyPixels
> should never be touched at all really.

Out of curiosity, what are you using it for?

In , Daniel van Vugt (vanvugt) wrote :

It is a fallback used for maintaining a persistent backbuffer if FBOs are not available. However it's not an important one because all drivers provide FBOs now.

http://bazaar.launchpad.net/~compiz-team/compiz/0.9.8/view/head:/plugins/opengl/src/screen.cpp#L1726

Changed in compiz:
status: In Progress → Fix Committed
Changed in mesa:
importance: Unknown → Medium
status: Unknown → Confirmed
Daniel van Vugt (vanvugt) wrote :

Fix committed into lp:compiz at revision 3370

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package compiz - 1:0.9.8.2+bzr3377-0ubuntu1

---------------
compiz (1:0.9.8.2+bzr3377-0ubuntu1) quantal-proposed; urgency=low

  [ Sam Spilsbury ]
  * debian/python-compizconfig.install
    - Install compizconfig-python.pc
  * debian/patches/100_expo_layout.patch
    - re-add the expo layout that used to be in precise (LP: #1047067)
    - add some testcases

  [ Timo Jyrinki ]
  * New upstream snapshot.
    - Fix multiple window placement bugs (LP: #974242) (LP: #976032)
    - Don't waste CPU looping through and looking at all the windows if you're
      rendering an output that has no damage on it. (LP: #1014986)
    - Updated convert files to fix some typos in the key names. (LP: #1041631)
    - Fix crash when imgsvg is loaded, due to missing symbol
      (decor_apply_gravity from libdecoration). (LP: #956986)
    - Treat unresolved symbols at link time as an error, rather than letting
      them through and cause strange crashes later. (LP: #1043143)
    - Refactors a little bit of the upgrade code and gets it under test to
      prepare to fix (LP: #1042537)
    - Updated AUTHORS from the full bzr log, and re-sort the list.
      (LP: #1042095)
    - Fixes FTBFS for kde4-window-decorator (LP: #1041310)
    - Fix obvious omissions from the introduction of unminimize_*,
      which were causing the unminimize animation settings to be ignored
      (LP: #1040455)
    - resize plugin: don't crash if resize wasn't initiated externally
      (LP: #1045191)
    - Clean up capitalization (LP: #1045652)
    - Avoid division by zero, if plugins try to deform a window down to size
      zero. (LP: #1045235)
    - Make "Unredirect Fullscreen Windows" more reliable. This fixes the
      problem with unredirection failing to engage at all (LP: #1041066) when
      gtk-window-decorator creates offscreen windows that are stacked on top.
      This also fixes the problem with unredirect hiding all windows,
      because it thinks the desktop window should be stacked on top
      (LP: #980663).
    - Ensure unredirected windows don't stay unredirected if they're no longer
      on top. (LP: #1041047)
    - Fix launching terminal functionality and make show-hud default key
      visible. Update the defaults to org.compiz.integrated to reflect the
      actual gnome values pre-gnome-3. (LP: #1040081) (LP: #1046199)
      (LP: #1046190)
    - Fix show-hud, bump COMPIZ_GNOME_INTEGRATED_SETTINGS_LIST_SIZE.
      (LP: #1046212)
    - Fixed: Windows with an alpha-channel, like gnome-terminal, were not
      being considered as possibly covering fullscreen windows. But they most
      certainly can. This ensures such RGBA windows are visible if they're
      stacked above a fullscreen window. (LP: #1046661)
    - Remove ListToStringList (LP: #1046184)
    - Fix typo causing CMake Error (LP: #1045665)
    - Transitions gtk-window-decorator over to use GSettings. Add a testing
      framework for the options code. (LP: #1042323)
    - Also need kdeworkspace since kdecorationbridge.h is there
      (LP: #1046770)
    - Implements some cleanup that was suggested on the merge for the original
      port to gsettings. Other issues fixed as wel...


Changed in compiz (Ubuntu Quantal):
status: Triaged → Fix Released
Changed in compiz:
status: Fix Committed → Fix Released
Rolf Leggewie (r0lf) wrote :

quantal has seen the end of its life and is no longer receiving any updates. Marking the quantal task for this ticket as "Won't Fix".

Changed in mesa (Ubuntu Quantal):
status: Triaged → Won't Fix
Changed in mesa:
status: Confirmed → Won't Fix
Timo Aaltonen (tjaalton)
Changed in mesa (Ubuntu):
status: Triaged → Invalid