Comment 10 for bug 1638695

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello,

I have been working to track down the origin of the performance penalty exposed by this bug.

All the tests I am performing are made on top of a locally compiled Python 2.7.12 (from upstream sources, with no Ubuntu patches applied),
built with two different GCC versions from the Ubuntu archives: 5.3.1 (current) and 4.8.0.

As I mentioned in my previous comments (check the full comparison stats), I can see significant performance differences just by
switching the GCC version. I decided to focus my investigation on the pickle module, since it seems to be the most affected one,
being approximately 1.17x slower between the two GCC versions.

Due to the number of changes introduced between 4.8.0 and 5.3.1, I decided not to pursue a bisection
of the changes to identify an offending commit yet, until we can identify which optimization or compile-time change
is causing the regression and focus our investigation on that specific area.

My understanding is that the performance penalty caused by the compiler might be related
to two factors: an important change in the linked libc, or an optimization made by the compiler in the resulting object.

Since the resulting objects are linked against the same glibc version (2.23), I will exclude that factor from the analysis
and instead focus on the performance of the objects generated by each compiler.

Following this approach, I ran the pyperformance suite under valgrind, excluding all benchmarks except the pickle module,
using the default suppressions to avoid missing any reference in the Python runtime, with the following arguments:

valgrind --tool=callgrind --instr-atstart=no --trace-children=yes venv/cpython2.7-6ed9b6df9cd4/bin/python -m performance run --python /usr/local/bin/python2.7 -b pickle --inside-venv

I ran this process multiple times with both GCC 4.8.0 and 5.3.1 to produce a large set of callgrind files to analyze. These callgrind files contain the full execution tree,
including all relocations, jumps, and calls into libc and the Python runtime itself, as well as the time spent in each function and the number of calls made to it.

I cleaned up the resulting callgrind files, removing those smaller than 100 KB and those that were not loading the cPickle
extension (https://pastebin.canonical.com/175951/).
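The cleanup step above can be sketched roughly as follows. This is a hypothetical reimplementation, not the script actually used: the `callgrind.out` file-name prefix, the `cPickle` marker string, and the helper names are assumptions for illustration.

```python
import os


def keep_callgrind_file(path, min_size=100 * 1024, marker=b"cPickle"):
    """Keep a callgrind output file only if it is at least min_size bytes
    and references the cPickle extension somewhere in its contents."""
    if os.path.getsize(path) < min_size:
        return False
    with open(path, "rb") as f:
        return marker in f.read()


def clean_directory(directory):
    """Delete callgrind output files that fail the filter above."""
    for name in os.listdir(directory):
        if not name.startswith("callgrind.out"):
            continue
        path = os.path.join(directory, name)
        if not keep_callgrind_file(path):
            os.remove(path)
```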

Over that set of files I ran callgrind_annotate to generate per-function stats, ordered by each function's exclusive cost.
Then, with this script (http://paste.ubuntu.com/23795048/), I summed the costs per function for each GCC version (4.8.0 and 5.3.1) and calculated the variance in cost between them.
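The per-function aggregation done by that script can be sketched as below. This is an assumption-laden sketch, not the original script: the `callgrind_annotate` line format (`cost  file.c:function`, with comma-separated thousands) is inferred from its typical output, and the function names are invented here.

```python
from collections import defaultdict


def parse_annotate_lines(lines):
    """Yield (function_name, cost) pairs from callgrind_annotate output
    lines assumed to look like '1,234,567  file.c:function'."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and ":" in parts[1]:
            try:
                cost = float(parts[0].replace(",", ""))
            except ValueError:
                continue  # not a cost line (header, separator, ...)
            yield parts[1], cost


def sum_costs(lines):
    """Accumulate the total exclusive cost per function across the
    annotate output of many callgrind files."""
    totals = defaultdict(float)
    for func, cost in parse_annotate_lines(lines):
        totals[func] += cost
    return totals
```

Running `sum_costs` once over the GCC 4.8.0 annotate output and once over the 5.3.1 output gives the two per-function cost totals that the variance is then computed from.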

The resulting file contains one tuple per function, with the following format:

function name - gcc 4.8 cost - gcc 5.3.1 cost - variance in percent

As an example:

/home/ubuntu/python/cpython/Objects/tupleobject.c:tupleiter_dealloc 258068.000000 445009.000000 (variance: 0.724387)
/home/ubuntu/python/cpython/Objects/object.c:try_3way_compare 984860.000000 1676351.000000 (variance: 0.702121)
/home/ubuntu/python/cpython/Python/marshal.c:r_object 183524.000000 27742.000000 (variance: -0.848837)
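The variance column in the example lines above is the relative change in cost, (gcc5 - gcc4.8) / gcc4.8. A minimal check against the first example line:

```python
def variance(old_cost, new_cost):
    """Relative change in exclusive cost between compiler builds."""
    return (new_cost - old_cost) / old_cost


# tupleiter_dealloc: GCC 4.8 cost 258068, GCC 5.3.1 cost 445009
print(round(variance(258068.0, 445009.0), 6))  # 0.724387
```

A positive value means the function got more expensive under GCC 5.3.1; a negative value (as for `r_object`) means it got cheaper.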

The full results, sorted by variance in descending order, can be found here: http://paste.ubuntu.com/23795023/

With these results, we can move forward by comparing the generated code for the functions with the largest variance
and tracking down which GCC optimization might be altering the resulting objects.

I will update this case after further investigation.