Bug #1037411 “[regression][DRI] SubBuffer rendering is much slower in compiz 0.9.8.0 than it was in 0.9.7” : Bugs : Compiz

Daniel van Vugt (vanvugt) wrote on 2012-08-16:

#1

I'm guessing there's a logic change somewhere in the gles2 code, where something previously only done per output is now done on the whole screen.

framebuffer_object comes to mind, but it's not a factor here, as mentioned above.

Maybe it's because we use triangles instead of quads now. So the number of vertices etc is 50% higher in gles2.

Daniel van Vugt (vanvugt) wrote on 2012-08-21: Re: [GLES] Graphics performance is 25-40% lower with gles2 than trunk

#2

Confirmed with a single monitor too. Even if I turn off all the new gles2 features in the opengl plugin, the gles2 branch gets 15-20% lower scores from glmark2 and glxgears (env vblank_mode=0).

summary:

- [GLES] Multimonitor graphics performance is 25-40% lower with gles2 than
- trunk
+ [GLES] Graphics performance is 25-40% lower with gles2 than trunk

Daniel van Vugt (vanvugt) wrote on 2012-08-21:

#3

This might be due to the fact that gles2 has to render everything as triangles instead of quads. So we're now pushing around 2x the polygons and 1.5x the number of vertices that trunk does.
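
The 2x and 1.5x figures can be checked with a small sketch. It assumes each rectangle is drawn either as one quad (4 vertices) or as two independent, non-indexed triangles (6 vertices); the type and function names are illustrative and not taken from compiz.

```cpp
#include <cassert>
#include <cstddef>

// Per-rectangle geometry cost for the two primitive types discussed above.
struct GeometryCost
{
    std::size_t polygons;
    std::size_t vertices;
};

// One quad per rectangle: 1 polygon, 4 vertices.
constexpr GeometryCost quadCost (std::size_t rects)
{
    return { rects, rects * 4 };
}

// Two independent triangles per rectangle (assumption: no index buffer
// and no triangle strips): 2 polygons, 6 vertices.
constexpr GeometryCost triangleCost (std::size_t rects)
{
    return { rects * 2, rects * 6 };
}

static_assert (triangleCost (100).polygons == 2 * quadCost (100).polygons,
               "triangles: 2x the polygons");
static_assert (2 * triangleCost (100).vertices == 3 * quadCost (100).vertices,
               "triangles: 1.5x the vertices");
```

With indexed drawing or triangle strips the vertex overhead would be smaller, so these ratios are an upper bound under the stated assumption.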

Changed in compiz:
assignee:	nobody → Daniel van Vugt (vanvugt)

Daniel van Vugt (vanvugt) on 2012-08-22

summary:

- [GLES] Graphics performance is 25-40% lower with gles2 than trunk
+ [regression][GLES] Graphics performance is 25-40% lower with gles2 than
+ trunk

Daniel van Vugt (vanvugt) on 2012-08-23

Changed in compiz:
milestone:	0.9.8.0 → 0.9.8.1
summary:

- [regression][GLES] Graphics performance is 25-40% lower with gles2 than
- trunk
+ [regression][GLES] Benchmark results are 15-40% lower with the gles2
+ code

Daniel van Vugt (vanvugt) wrote on 2012-08-23: Re: [regression][GLES] Benchmark results are 15-40% lower with the gles2 code

#4

OK, the problem is not our use of triangles instead of quads. I just tested quads and the slowdown remains.

Sebastien Bacher (seb128) on 2012-08-30

Changed in compiz (Ubuntu):
milestone:	none → ubuntu-12.10-beta-2
importance:	Undecided → Medium
status:	New → Triaged

Daniel van Vugt (vanvugt) on 2012-08-31

Changed in compiz:
assignee:	Daniel van Vugt (vanvugt) → nobody

Daniel van Vugt (vanvugt) on 2012-09-05

summary:

[regression][GLES] Benchmark results are 15-40% lower with the gles2
- code
+ code (compiz 0.9.8.0)

Daniel van Vugt (vanvugt) on 2012-09-05

description:	updated
description:	updated

Sam Spilsbury (smspillaz) on 2012-09-10

Changed in compiz:
milestone:	0.9.8.2 → 0.9.8.4

Daniel van Vugt (vanvugt) on 2012-09-10

Changed in compiz:
assignee:	nobody → Daniel van Vugt (vanvugt)
status:	Triaged → In Progress
tags:	added: performance

Daniel van Vugt (vanvugt) wrote on 2012-09-10: Re: [regression][GLES] Benchmark results are 15-40% lower with the gles2 code (compiz 0.9.8.0)

#5

Bisected. The major cause of this regression is:
http://bazaar.launchpad.net/~compiz-linaro-team/compiz/gles2/revision/3255

Daniel van Vugt (vanvugt) wrote on 2012-09-10:

#6

I'm pretty sure revision 3255 came from smspillaz. Though it's not obvious.

Sam Spilsbury (smspillaz) wrote on 2012-09-10:

#7

Ah okay, it's the fact that we draw from an FBO to the backbuffer. Seems fair.

I'll have a look into using glBlitFramebuffer.

Sam Spilsbury (smspillaz) wrote on 2012-09-10:

#8

Okay, I have a half implementation of using glBlitFramebuffer, and have some ideas as to how we might be able to preserve vsync while not using FBOs at all (still necessary for nvidia).

In progress.

Changed in compiz:
assignee:	Daniel van Vugt (vanvugt) → Sam Spilsbury (smspillaz)

Daniel van Vugt (vanvugt) on 2012-09-11

Changed in compiz:
assignee:	Sam Spilsbury (smspillaz) → Daniel van Vugt (vanvugt)

Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#9

I suspect the cause is either glClear (very expensive) being called too much, or this:
- gScreen->glPaintCompositedOutput (tmpRegion, scratchFbo, mask);
+ gScreen->glPaintCompositedOutput (screen->region (), scratchFbo, mask);

Sam Spilsbury (smspillaz) wrote on 2012-09-11:

#10

I'm a little confused by that.

Changing from tmpRegion to screen->region () effectively meant that the entire backing framebuffer object is painted to the backbuffer, rather than a small portion of it. This is necessary because we are always using glXSwapBuffers and the backbuffer becomes invalidated upon swap.

Why would this change have anything to do with the slowdown if the slowdown still occurs when painting from a framebuffer object is turned off, as you put in the description?
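
To put a rough number on the difference between painting tmpRegion and painting screen->region (), here is a minimal sketch that only compares pixel counts. The `Rect` type and the sizes in the usage note are made up for illustration; real damage regions are lists of rectangles, not a single one.

```cpp
#include <cassert>

// Hypothetical axis-aligned rectangle; stands in for a damage region.
struct Rect
{
    int width;
    int height;
};

// Pixels that must be copied from the FBO to the backbuffer for one frame.
constexpr long pixelCount (Rect r)
{
    return static_cast<long> (r.width) * r.height;
}

// Total pixels copied over a number of frames that all paint the same area.
inline long pixelsPainted (Rect perFrame, int frames)
{
    assert (frames >= 0);
    return pixelCount (perFrame) * frames;
}
```

For example, a 1920x1080 screen is 2,073,600 pixels per frame, while a 200x100 damaged area is 20,000, so painting the whole screen copies roughly 100 times as many pixels in that case.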

Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#11

No comment, until I actually test my theories and maybe come up with more.

Sam Spilsbury (smspillaz) wrote on 2012-09-11:

#12

You could be right about glClear though. I notice that clearBuffers is not set to false again anywhere in screen.cpp
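
A minimal sketch of that concern, not the actual screen.cpp logic: the `Painter` type, its members, and `clearsFor` are hypothetical, and only the flag name clearBuffers comes from the comment above. It shows how a "clear requested" flag that is never reset turns a one-off clear into one clear per frame.

```cpp
#include <cassert>

// Hypothetical stand-in for the paint loop; not compiz code.
struct Painter
{
    bool clearBuffers = true;   // a clear is requested for the first frame
    bool resetAfterClear;       // whether the request is consumed once used
    int  clearsIssued = 0;      // counts where glClear would be called

    explicit Painter (bool reset) : resetAfterClear (reset) {}

    void paintFrame ()
    {
        if (clearBuffers)
        {
            ++clearsIssued;     // a real implementation would call glClear here
            if (resetAfterClear)
                clearBuffers = false;
        }
    }
};

// Paints `frames` frames and reports how many clears were issued.
inline int clearsFor (bool resetAfterClear, int frames)
{
    assert (frames >= 0);
    Painter p (resetAfterClear);
    for (int i = 0; i < frames; ++i)
        p.paintFrame ();
    return p.clearsIssued;
}
```

With the flag left set, 60 frames issue 60 clears; with the reset, the same 60 frames issue one.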

Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#13

The point is that forcing the glXCopySubBufferMESA code path is still slow after r3255. I was surprised too.

Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#14

Sorry, my mistake.

The slowdown in r3255 is only due to using the FBO on every frame. If I modify r3255 to use regional redraws then it's still fast. So the regression in regional redraw performance happened after r3255.

Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#15

This is a little bit mind-bending, but I've found the problem, at least on my Precise desktop using Intel Sandy Bridge graphics...

1. I disable all the fancy new opengl rendering options so compiz should use regional redraws, so useFbo is always false.
2. On startup, I get a couple of frames with (mask & COMPOSITE_SCREEN_DAMAGE_ALL_MASK) so doubleBuffer.render is called with fullscreen==true. This calls: GLXDoubleBuffer::swap() --> copyFrontToBack() because useFbo was false --> glCopyPixels().
3. After startup, all frames use GLXDoubleBuffer::blit() --> GL::copySubBuffer(), which is now slow.

To make rendering fast again, all I have to do is break the sequence in #2. Either:
  (a) Force fullscreen=false; or
  (b) Comment out copyFrontToBack(); or
  (c) Comment out glCopyPixels()

So the problem, on my desktop where I've done all my performance comparisons, is that calling glCopyPixels just once or twice on startup will forever make GL::copySubBuffer take a slow rendering path thereafter. If I ensure glCopyPixels is never touched on startup then Mesa stays in fast mode for GL::copySubBuffer.
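
The path selection in steps 2 and 3 can be sketched as follows. This is a hypothetical model that only records which calls a frame would make; it performs no GL or GLX work, the order of calls within the fullscreen path is an assumption, and the `allowFrontToBackCopy` parameter is invented here to represent workaround (b).

```cpp
#include <cassert>
#include <vector>

// Names of the calls described in steps 2 and 3.
enum class Call { Swap, CopyFrontToBack, GlCopyPixels, CopySubBuffer };

// Hypothetical model of the double-buffer path selection.
// allowFrontToBackCopy == false corresponds to workaround (b).
inline std::vector<Call> renderFrame (bool fullscreen, bool useFbo,
                                      bool allowFrontToBackCopy = true)
{
    std::vector<Call> calls;

    if (fullscreen)
    {
        calls.push_back (Call::Swap);
        if (!useFbo && allowFrontToBackCopy)
        {
            calls.push_back (Call::CopyFrontToBack);
            calls.push_back (Call::GlCopyPixels);
        }
    }
    else
    {
        calls.push_back (Call::CopySubBuffer);   // regional redraw
    }

    return calls;
}

// True if the frame reached glCopyPixels.
inline bool touchesCopyPixels (const std::vector<Call> &calls)
{
    for (Call c : calls)
        if (c == Call::GlCopyPixels)
            return true;
    return false;
}
```

In this model a fullscreen frame with useFbo == false reaches glCopyPixels, and either forcing fullscreen to false (workaround (a)) or disabling the front-to-back copy keeps it out of the trace.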

In freedesktop.org Bugzilla #54763, Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#17

I call glCopyPixels a couple of times briefly on startup and then never again. The problem is that doing this makes all subsequent rendering much slower. If I never call glCopyPixels on startup then rendering remains fast thereafter.

It seems glCopyPixels is modifying the context in a way that permanently cripples later operations. The only possible cause I can see so far is:

src/mesa/main/drawpix.c: _mesa_CopyPixels:
   /* We're not using the current vertex program, and the driver may install
    * it's own. Note: this may dirty some state.
    */
   _mesa_set_vp_override(ctx, GL_TRUE);

This seems to set a flag in the ctx which is never cleared.

Using Mesa 8.0.2 in Ubuntu 12.04

summary:

- [regression][GLES] Benchmark results are 15-40% lower with the gles2
- code (compiz 0.9.8.0)
+ [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0
+ than it was in 0.9.7

Daniel van Vugt (vanvugt) wrote on 2012-09-11: Re: [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0 than it was in 0.9.7

#16

All,

I have modified this bug to just be about a very specific problem I find on my machine. If you're concerned about general performance of compiz 0.9.8.0 with the default settings then please look at bug 1024304 instead.

description:

updated

In freedesktop.org Bugzilla #54763, Daniel van Vugt (vanvugt) wrote on 2012-09-11:

#18

I am using Intel Sandy Bridge HD 2000 by the way. I believe that means i965.

Daniel van Vugt (vanvugt) on 2012-09-11

summary:

- [regression][GLES] SubBuffer rendering is much slower in compiz 0.9.8.0
+ [regression][DRI] SubBuffer rendering is much slower in compiz 0.9.8.0
than it was in 0.9.7

In freedesktop.org Bugzilla #54763, Michel Dänzer (michel-daenzer) wrote on 2012-09-11:

#19

(In reply to comment #2)
> I call glCopyPixels a couple of times briefly on startup and then never again.
> The problem is that doing this makes all subsequent rendering much slower. If I
> never call glCopyPixels on startup then rendering remains fast thereafter.

What are the read and draw buffers for glCopyPixels? If either of them is GL_FRONT*, that will cause a DRI2 fake front buffer to be allocated and thereafter kept up to date wrt the real front buffer.
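
A minimal sketch of why a single call can have a lasting effect, assuming the behaviour described above: once a front buffer is named as a read or draw buffer, a fake front buffer exists and has to be kept in sync from then on. `DrawableModel` and its members are hypothetical, and modelling the cost as one extra sync per sub-buffer copy is an assumption, not a description of the actual Mesa or X server code.

```cpp
#include <cassert>

enum class Buffer { Front, Back };

// Hypothetical model of one drawable; not Mesa or X server code.
struct DrawableModel
{
    bool fakeFrontAllocated = false;
    int  fakeFrontSyncs = 0;   // extra work after each front-buffer update

    // Models glCopyPixels with the given read and draw buffers.
    void copyPixels (Buffer read, Buffer draw)
    {
        if (read == Buffer::Front || draw == Buffer::Front)
            fakeFrontAllocated = true;   // sticky: never released in this model
    }

    // Models a sub-buffer copy that updates the real front buffer.
    void copySubBuffer ()
    {
        if (fakeFrontAllocated)
            ++fakeFrontSyncs;   // keep the fake front up to date
    }
};
```

A drawable that never touches the front buffer accumulates no syncs, whereas one early copyPixels (Buffer::Front, Buffer::Back) makes every later copySubBuffer pay the extra cost.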

> This seems to set a flag in the ctx which is never cleared.

It is cleared:

end:
   _mesa_set_vp_override(ctx, GL_FALSE);

In freedesktop.org Bugzilla #54763, Daniel van Vugt (vanvugt) wrote on 2012-09-12:

#20

Yes, the read buffer is GL_FRONT in this case. So I guess the slow-down is by design in Mesa. I'm going to work around it in compiz anyway. glCopyPixels should never be touched at all really.

P.S. _mesa_set_vp_override(ctx, GL_FALSE) does not clear NewState, which is what I was concerned about:

void
_mesa_set_vp_override(struct gl_context *ctx, GLboolean flag)
{
   if (ctx->VertexProgram._Overriden != flag) {
      ctx->VertexProgram._Overriden = flag;

      /* Set one of the bits which will trigger fragment program
       * regeneration:
       */
      ctx->NewState |= _NEW_PROGRAM;
   }
}

Daniel van Vugt (vanvugt) on 2012-09-12

Changed in compiz (Ubuntu Precise):
status:	New → Invalid
Changed in mesa (Ubuntu Precise):
status:	New → Invalid
Changed in mesa (Ubuntu Quantal):
status:	New → Triaged

In freedesktop.org Bugzilla #54763, Marek Olšák (maraeo) wrote on 2012-09-12:

#21

Don't worry about NewState. It's cleared after every draw operation.
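
The dirty-flag pattern under discussion can be sketched like this. It is a simplified model, not Mesa code: `ContextModel`, `NEW_PROGRAM`, and the bit value are hypothetical stand-ins, and `draw ()` only imitates the "cleared after every draw operation" behaviour stated above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative dirty bit; the value is not Mesa's.
constexpr std::uint32_t NEW_PROGRAM = 1u << 0;

// Hypothetical model of the relevant context state.
struct ContextModel
{
    bool          vpOverridden = false;
    std::uint32_t newState = 0;

    // Mirrors the shape of the quoted function: the dirty bit is set
    // only when the flag actually changes.
    void setVpOverride (bool flag)
    {
        if (vpOverridden != flag)
        {
            vpOverridden = flag;
            newState |= NEW_PROGRAM;
        }
    }

    // A draw revalidates derived state and then clears all dirty bits.
    void draw ()
    {
        newState = 0;
    }
};
```

Setting the override to true and back to false leaves NEW_PROGRAM set, but only until the next draw, so in this model the flag cannot explain a slowdown that persists across many frames.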

Daniel van Vugt (vanvugt) on 2012-09-12

description:

updated

In freedesktop.org Bugzilla #54763, Chris Forbes (chrisf-ijw) wrote on 2012-09-12:

#22

Would it be reasonable to put a performance note in
ARB_debug_output/KHR_debug when mesa falls into this slow state?

In freedesktop.org Bugzilla #54763, Michel Dänzer (michel-daenzer) wrote on 2012-09-12:

#23

(In reply to comment #3)
> Yes, the read buffer is GL_FRONT in this case. So I guess the slow-down is by
> design in Mesa.

Rather the X server / DRI2 protocol. It *might* be possible to make xserver not enforce the fake front buffer for the Composite Overlay Window, not sure.

> I'm going to work around it in compiz anyway. glCopyPixels
> should never be touched at all really.

Out of curiosity, what are you using it for?

In freedesktop.org Bugzilla #54763, Daniel van Vugt (vanvugt) wrote on 2012-09-12:

#24

It is a fallback used for maintaining a persistent backbuffer if FBOs are not available. However it's not an important one because all drivers provide FBOs now.

http://bazaar.launchpad.net/~compiz-team/compiz/0.9.8/view/head:/plugins/opengl/src/screen.cpp#L1726

Unity Merger (unity-merger) on 2012-09-13

Changed in compiz:
status:	In Progress → Fix Committed

Bug Watch Updater (bug-watch-updater) on 2012-09-14

Changed in mesa:
importance:	Unknown → Medium
status:	Unknown → Confirmed

Daniel van Vugt (vanvugt) wrote on 2012-09-14:

#25

Fix committed into lp:compiz at revision 3370

Launchpad Janitor (janitor) wrote on 2012-09-21:

#26

This bug was fixed in the package compiz - 1:0.9.8.2+bzr3377-0ubuntu1

---------------
compiz (1:0.9.8.2+bzr3377-0ubuntu1) quantal-proposed; urgency=low

  [ Sam Spilsbury ]
  * debian/python-compizconfig.install
    - Install compizconfig-python.pc
  * debian/patches/100_expo_layout.patch
    - re-add the expo layout that used to be in precise (LP: #1047067)
    - add some testcases

  [ Timo Jyrinki ]
  * New upstream snapshot.
    - Fix multiple window placement bugs (LP: #974242) (LP: #976032)
    - Don't waste CPU looping through and looking at all the windows if you're
      rendering an output that has no damage on it. (LP: #1014986)
    - Updated convert files to fix some typos in the key names. (LP: #1041631)
    - Fix crash when imgsvg is loaded, due to missing symbol
      (decor_apply_gravity from libdecoration). (LP: #956986)
    - Treat unresolved symbols at link time as an error, rather than letting
      them through and cause strange crashes later. (LP: #1043143)
    - Refactors a little bit of the upgrade code and gets it under test to
      prepare to fix (LP: #1042537)
    - Updated AUTHORS from the full bzr log, and re-sort the list.
      (LP: #1042095)
    - Fixes FTBFS for kde4-window-decorator (LP: #1041310)
    - Fix obvious omissions from the introduction of unminimize_*,
      which were causing the unminimize animation settings to be ignored
      (LP: #1040455)
    - resize plugin: don't crash if resize wasn't initiated externally
      (LP: #1045191)
    - Clean up capitalization (LP: #1045652)
    - Avoid division by zero, if plugins try to deform a window down to size
      zero. (LP: #1045235)
    - Make "Unredirect Fullscreen Windows" more reliable. This fixes the
      problem with unredirection failing to engage at all (LP: #1041066) when
      gtk-window-decorator creates offscreen windows that are stacked on top.
      This also fixes the problem with unredirect hiding all windows,
      because it thinks the desktop window should be stacked on top
      (LP: #980663).
    - Ensure unredirected windows don't stay unredirected if they're no longer
      on top. (LP: #1041047)
    - Fix launching terminal functionality and make show-hud default key
      visible. Update the defaults to org.compiz.integrated to reflect the
      actual gnome values pre-gnome-3. (LP: #1040081) (LP: #1046199)
      (LP: #1046190)
    - Fix show-hud, bump COMPIZ_GNOME_INTEGRATED_SETTINGS_LIST_SIZE.
      (LP: #1046212)
    - Fixed: Windows with an alpha-channel, like gnome-terminal, were not
      being considered as possibly covering fullscreen windows. But they most
      certainly can. This ensures such RGBA windows are visible if they're
      stacked above a fullscreen window. (LP: #1046661)
    - Remove ListToStringList (LP: #1046184)
    - Fix typo causing CMake Error (LP: #1045665)
    - Transitions gtk-window-decorator over to use GSettings. Add a testing
      framework for the options code. (LP: #1042323)
    - Also need kdeworkspace since kdecorationbridge.h is there
      (LP: #1046770)
    - Implements some cleanup that was suggested on the merge for the original
      port to gsettings. Other issues fixed as well. (LP: #1042323)
    - Fix the case where a new gsettings schema got added for building but
      it wasn't recompiled locally (LP: #1046701)
    - Scale: select the window under the pointer, when the scale animation
      is over. (LP: #1045127)
    - Fixes some "Use of uninitialised value" warnings reported by
      valgrind (LP: #1004336)
    - Check if org.gnome.mutter is available before using it (LP: #1048551)
    - We don't need to map our style windows, prevent them from cluttering
      up the paint queue (LP: #1042552)
    - Migrate profile independent keys separately from the profile
      dependent keys (LP: #1046190)
    - Don't ever enter the subdir of a plugin that is disabled. (LP: #1049100)
    - Workaround SubBuffer performance regression (LP: #1037411)
    - Changed the default placement of the benchmark window from 0,0 to
      100,50. (LP: #1039406)
    - Ensure window decorations always get rendered after the window, not
      before. This is how it was in compiz 0.9.7, and is required in order
      to resolve unity panel shadow (LP: #1050704)
    - Fix CMakeLists.txt to bring an xslt file back to compiz-dev
    - Avoid a NULL dereference and give a useful error message instead
      (LP: #944653)
    - Fix (LP: #1050752)
    - Check that pixmaps which aren't managed by us actually exist before
      binding. (LP: #927168)
    - Fix flickering and performance problems with using Unredirect Fullscreen
      Windows with multiple monitors. (LP: #1050749) (LP: #1051885)
  * debian/compiz-dev.install
    - Remove compizconfig-python.pc, now in python-compizconfig.install
  * Drop dependency on libgconf2-dev, add gconf2 dependency to the
    transitional package for migrations
  * Add -DUSE_GCONF=OFF to debian/rules
  * debian/libdecoration0.symbols
    - Add decor_shadow_options_cmp
  * Cherry-pick fixes from trunk:
    - Fix FTBFS with BUILD_GLES
  * Restore 'Glide 2' unminimize animation via override

  [ Didier Roche ]
  * debian/libdecoration0.symbols:
    - update the symbols file
  * add and cherry-pick missing ABI breakage bump
 -- Didier Roche <didrocks@ubuntu.com>   Thu, 20 Sep 2012 17:39:05 +0200

Changed in compiz (Ubuntu Quantal):
status:	Triaged → Fix Released

Daniel van Vugt (vanvugt) on 2012-09-28

Changed in compiz:
status:	Fix Committed → Fix Released

Rolf Leggewie (r0lf) wrote on 2014-12-05:

#27

quantal has seen the end of its life and is no longer receiving any updates. Marking the quantal task for this ticket as "Won't Fix".

Changed in mesa (Ubuntu Quantal):
status:	Triaged → Won't Fix

Bug Watch Updater (bug-watch-updater) on 2018-03-02

Changed in mesa:
status:	Confirmed → Won't Fix

Timo Aaltonen (tjaalton) on 2020-05-28

Changed in mesa (Ubuntu):
status:	Triaged → Invalid

Compiz

[regression][DRI] SubBuffer rendering is much slower in compiz 0.9.8.0 than it was in 0.9.7

Affects            Status        Importance  Assigned to              Milestone
Compiz             Fix Released  Medium      Daniel van Vugt          Compiz 0.9.8.4
Mesa               Won't Fix     Medium      freedesktop-bugs #54763
compiz (Ubuntu)    Fix Released  Medium      Unassigned               Ubuntu ubuntu-12.10-beta-2
  Precise          Invalid       Undecided   Unassigned
  Quantal          Fix Released  Medium      Unassigned               Ubuntu ubuntu-12.10-beta-2
mesa (Ubuntu)      Invalid       Undecided   Unassigned
  Precise          Invalid       Undecided   Unassigned
  Quantal          Won't Fix     Undecided   Unassigned