Python 2.7.12 performance regression

Bug #1638695 reported by Major Hayden
This bug affects 22 people
Affects              Status        Importance  Assigned to  Milestone
python2.7 (Ubuntu)   Fix Released  High        Unassigned
Xenial               Fix Released  High        Unassigned
Zesty                Won't Fix     Undecided   Unassigned

Bug Description

SRU: It looks like only the _math.o build without -fPIC makes it into the SRU. There shouldn't be any regression potential when building the static interpreter without -fPIC. The acceptance criterion is that running the benchmarks shows no performance regressions.

I work on the OpenStack-Ansible project and we've noticed that testing jobs on 16.04 take quite a bit longer to complete than on 14.04. They complete within an hour on 14.04 but they normally take 90 minutes or more on 16.04. We use the same version of Ansible with both versions of Ubuntu.

After more digging, I tested Python performance (using the 'performance' module) on 14.04 (2.7.6) and on 16.04 (2.7.12). There is a significant performance difference between the two versions of Python, detailed in a spreadsheet[0].
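
For reference, the comparison can be reproduced roughly like this (a sketch; the 'performance' invocation matches the one used later in this thread, and the output file names are just illustrative):

$ pip install --user performance
$ python2.7 -m performance run -o trusty-2.7.6.json     # on 14.04 (python 2.7.6)
$ python2.7 -m performance run -o xenial-2.7.12.json    # on 16.04 (python 2.7.12)
$ pyperformance compare trusty-2.7.6.json xenial-2.7.12.json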

I began using perf to dig into the differences when running the python performance module and when using Ansible playbooks. CPU migrations (as measured by perf) are doubled in Ubuntu 16.04 when running the same python workloads.
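
For context, CPU migrations are one of perf's software counters; the kind of measurement meant here looks roughly like this (the benchmark invocation is illustrative):

$ perf stat -e task-clock,context-switches,cpu-migrations \
      python2.7 -m performance run -b call_method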

I tried changing some of the kernel.sched sysctl tunables, but they had very little effect on the results.
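
(The tunables in question are of this kind; the specific knobs and values below are illustrative assumptions, not necessarily the exact ones tried.)

$ sudo sysctl -w kernel.sched_migration_cost_ns=5000000
$ sudo sysctl -w kernel.sched_autogroup_enabled=0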

I compiled python 2.7.12 from source on 14.04 and found the performance to be unchanged there. I'm not entirely sure where the problem might be now.

We also have a bug open in OpenStack-Ansible[1] that provides additional detail. Thanks in advance for any help you can provide!

[0] https://docs.google.com/spreadsheets/d/18MmptS_DAd1YP3OhHWQqLYVA9spC3xLt4PS3STI6tds/edit?usp=sharing
[1] https://bugs.launchpad.net/openstack-ansible/+bug/1637494

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in python2.7 (Ubuntu):
status: New → Confirmed
Revision history for this message
Matthias Klose (doko) wrote :

> I compiled python 2.7.12 from source on 14.04
> and found the performance to be unchanged there.

unchanged compared to what? the python binaries in 14.04, or 16.04?

could you check with a python version in between? e.g.
https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/ppa

Revision history for this message
Major Hayden (rackerhacker) wrote :

Hello Matthias,

I'm sorry for the confusion there. What I meant is that I compiled 2.7.12 on 14.04 and found that it had the same performance as 2.7.6 (from the default Ubuntu python package) on 14.04. I also loaded Xenial's kernel on the 14.04 installation and found no performance difference either.

The problem seems to be unique to 2.7.12 on 16.04.

Revision history for this message
Matthias Klose (doko) wrote :

please try to build using gcc-4.8 on 16.04 LTS (it's still available in the archive)

Revision history for this message
Major Hayden (rackerhacker) wrote :

I can try that. Just to be clear, you're suggesting to do the following:

1) Install gcc-4.8 on 16.04
2) Compile 2.7.12 with gcc-4.8 on 16.04
3) Re-run tests

Did I get that right?

Revision history for this message
Matthias Klose (doko) wrote :

yes, setting CC=gcc-4.8 CXX=g++-4.8 ./configure ...
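
Spelled out, the suggested build is roughly the following (install prefix and parallelism are illustrative):

$ CC=gcc-4.8 CXX=g++-4.8 ./configure --prefix=/opt/python2.7-gcc48
$ make -j"$(nproc)"
$ sudo make install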

Revision history for this message
Major Hayden (rackerhacker) wrote :

Thanks for confirming that, Matthias. Testing with GCC 4.8 seemed to yield (mostly) better results. I put the data into a Google Sheet:

  https://goo.gl/9gW82j

Out of the 10 pyperformance tests:

  * 3 tests were actually faster with python compiled w/gcc-4.8
  * 4 tests were slightly slower (but within 5%)
  * 3 tests were ~ 20-25% slower

Overall, these numbers look quite a bit better.

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello,

I am in the process of verifying this performance regression on a 16.04 Xenial machine using the kernel Linux-4.4.0-38.

I ran a locally compiled Python 2.7.12 built with two different GCC versions, 5.3.1 (current) and 4.8.0, both from the Ubuntu archives.

The benchmark suite I am using is pyperformance (https://github.com/python/performance). I am running the full test suite with the following command:

$ pyperformance run --python=python2 -o xxx.json

According to the latest test run I did using Python 2.7.12/GCC 4.8 (with GCC 5.3.1 as the baseline), 50% of the tests (32/64) show a significant variance in performance, of which 19/32 are slower (by roughly 5-15%).

Just for information, I am comparing results using the following command:

$ pyperformance compare python-2.7.12-gcc-5.3.1.json python-2.7.12-gcc-4.8.0.json

I am attaching here the current comparison results for analysis.

Revision history for this message
Matthew Thode (prometheanfire) wrote :

This may not be the best comparison, as I don't have gcc 4.8.0 (I could test with gcc 4.8.5, though). I am also using a different toolchain, with glibc 2.22. Here are my outputs, attached, showing 4.9.3 and 5.4.0.

Changed in python2.7 (Ubuntu):
importance: Undecided → High
assignee: nobody → Jorge Niedbalski (niedbalski)
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello,

I have been working to track down the origin of the performance penalty exposed by this bug.

All the tests I am performing are run against a locally compiled Python 2.7.12 (from upstream sources, with no Ubuntu patches applied), built with two different GCC versions, 5.3.1 (current) and 4.8.0, both from the Ubuntu archives.

As I mentioned in my previous comments (see the full comparison stats), I can see significant performance differences just by switching the GCC version. I decided to focus my investigation on the pickle module, since it seems to be the most affected one, being approximately 1.17x slower between the two GCC versions.

Given the number of changes introduced between 4.8.0 and 5.3.1, I decided not to pursue a bisection to identify an offending commit yet, until we can pin down which compile-time optimization or change is causing the regression and focus the investigation on that specific area.

My understanding is that the performance penalty caused by the compiler might be related to two factors: an important change in the linked libc, or an optimization made by the compiler in the resulting objects.

Since the resulting objects are linked against the same glibc version (2.23), I will not consider that factor in the analysis; instead I will focus on analyzing the performance of the objects generated by each compiler.

To follow this approach I ran the pyperformance suite under a valgrind session, excluding all modules except the pickle module and using the default suppressions to avoid missing any reference in the Python runtime, with the following arguments:

valgrind --tool=callgrind --instr-atstart=no --trace-children=yes venv/cpython2.7-6ed9b6df9cd4/bin/python -m performance run --python /usr/local/bin/python2.7 -b pickle --inside-venv

I ran this process multiple times with both GCC 4.8.0 and 5.3.1 to produce a large set of callgrind files to analyze. Those callgrind files contain the full execution tree, including all the relocations, jumps, and calls into libc and the Python runtime itself, and of course the time spent per function and the number of calls made to it.

I cleaned up the resulting callgrind files, removing those smaller than 100k and those that were not loading the cPickle extension (https://pastebin.canonical.com/175951/).

Over that set of files I ran callgrind_annotate to generate the per-function stats, ordered by each function's exclusive cost. Then, with this script (http://paste.ubuntu.com/23795048/), I summed the costs per function for each GCC version (4.8 and 5.3.1) and calculated the variance in cost between them.
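
Since that paste is not reachable for everyone, here is a hedged sketch of the aggregation step (the directory layout, the callgrind_annotate line format and the variance formula new/old - 1 are assumptions inferred from the sample output below):

# sum exclusive costs per function from a set of callgrind_annotate outputs
sum_costs() {
    grep -hE '^ *[0-9][0-9,]* ' "$@" | tr -d ',' \
        | awk '$2 ~ /:/ { cost[$2] += $1 } END { for (f in cost) print f, cost[f] }' \
        | sort
}
sum_costs gcc-4.8/*.annotate > costs-gcc-4.8.txt
sum_costs gcc-5.3/*.annotate > costs-gcc-5.3.txt
# variance = (gcc 5.3.1 cost / gcc 4.8 cost) - 1
join costs-gcc-4.8.txt costs-gcc-5.3.txt \
    | awk '{ printf "%s %f %f (variance: %f)\n", $1, $2, $3, $3 / $2 - 1 }'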

The resulting file contains one tuple per function in the following format:

function name - gcc 4.8 cost - gcc 5.3.1 cost - variance in percent

As an example:

/home/ubuntu/python/cpython/Objects/tupleobject.c:tupleiter_dealloc 258068.000000 445009.000000 (variance: 0.724387)
/home/ubuntu/python/cpython/Objects/object.c:try_3way_compare 984860.000000 1676351.000000 (variance: 0.702121)
/home/ubuntu/python/cpython/Python/marshal.c:r_object 183524.000000 2...


Revision history for this message
Dominique Poulain (dominique-poulain) wrote :

Just a small clarification about Jorge's last comment above (<https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1638695/comments/10>):

"I cleaned out all the resulting callgrind files removing the files smaller than 100k and the ones that were not loading the cPickle extension (https://pastebin.canonical.com/175951/)."

That URL is not publicly accessible; here are the commands Jorge ran:

find . -type f -size -100k -exec rm {} \;
for f in $(ack-grep -i pickle | grep callgrind.out | cut -d":" -f1 | uniq); do mv $f $f.pick; done
for f in $(ls *.pick); do callgrind_annotate $f > $f.annotate; done

Revision history for this message
Louis Bouchard (louis) wrote :

Hello,

Just to clarify something I have just realized:

$ pyperformance run -p={some python}

means that {some python} will be used to run pyperformance itself, not to run the benchmarks! So changing -p to point at different builds of Python will not give a proper comparison of those builds.

Revision history for this message
Louis Bouchard (louis) wrote :

Here are the results of the comparative tests I ran:

https://docs.google.com/spreadsheets/d/1MyNBPVZlBeic1OLqVKe_bcPk2deO_pQs9trIfOFefM0/edit#gid=2034603487

It confirms the assumptions, but unfortunately rebuilding 2.7.12 without -fstack-protector-strong leads to worse performance than the stock 2.7.12 build. I'm continuing my investigation.

Revision history for this message
Carlos L. Torres (carlos-torres) wrote :

Where are these tests being executed? Are these virtual machines or bare-metal instances? If they are VMs, what hypervisor is being used?

Revision history for this message
Major Hayden (rackerhacker) wrote :

My testing was done on Xen virtual machines, KVM virtual machines, and bare metal.

Revision history for this message
Louis Bouchard (louis) wrote :

Hello,

The tests are run in LXC containers on a bare-metal server with two physical CPUs, each with 6 cores and 2 threads per core (Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz).

Following doko's advice, I have built two new versions: one with LTO optimisation disabled and the other with PGO optimisation disabled. In both cases, it makes things worse.

Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Louis Bouchard (louis) wrote :

Hello,

Following doko's advice, I ran a set of test with PGO & LTO optimization disabled.

Here are the results : https://docs.google.com/spreadsheets/d/1tTlEOvMypwKwi99XHjvuQFE14_jpBBLy0-Mk6bjkvL0/edit#gid=1169944329

This may shed more light on the investigation, as it appears that with LTO & PGO optimisation disabled on Trusty, the Trusty version becomes slower than the Xenial stock version. Disabling the optimisations on Xenial makes little difference, though.

So maybe the PGO & LTO optimisation on Trusty is more efficient than on Xenial and leads to better results, hence better performance. Just a thought.

Louis Bouchard (louis)
Changed in python2.7 (Ubuntu):
assignee: Jorge Niedbalski (niedbalski) → Louis Bouchard (louis-bouchard)
Revision history for this message
Louis Bouchard (louis) wrote :

Following the results of the previous comparison, I've used Jorge's profiling example on the 'call_method' benchmark for Trusty stock, no LTO, no PGO, and Xenial stock, no PGO and no LTO. Here are the results. Notice the difference between Trusty stock & Trusty nopgo, as opposed to the execution profiles of the other tests.

Trusty Stock
============
callgrind_annotate:
Profiled target: /home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python /home/ubuntu/venv/cpython2.7-d0d7712d4e1d/lib/python2.7/site-packages/performance/benchmarks/bm_call_method.py --worker --pipe 4 --worker-task=0 --samples 3 --warmups 1 --loops 1 --min-time 0.1 (PID 25150, part 1)
4,918,198,605 ???:PyEval_EvalFrameEx'2 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
1,180,507,700 ???:0x00000000005368f0 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
1,109,707,368 ???:PyObject_GetAttr [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
  823,065,734 ???:PyFrame_New [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
  552,755,137 ???:0x00000000004a5c90 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
  525,836,692 ???:0x00000000004bedf0 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
   12,732,711 ???:PyParser_AddToken [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    6,120,934 ???:PyDict_GetItem [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    4,700,333 ???:0x00000000004bc0e0 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    4,647,564 ???:0x00000000004afe90'2 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    3,724,240 ???:PyObject_Free [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    3,526,112 ???:PyDict_SetItem [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    3,407,575 ???:0x0000000000571fd0 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    3,304,198 ???:PyObject_Hash [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    3,055,436 ???:0x00000000005495a0 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]
    2,796,306 ???:0x0000000000535070 [/home/ubuntu/venv/cpython2.7-d0d7712d4e1d/bin/python2.7]

Trusty nopgo:
=============
Profiled target: /home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python /home/ubuntu/venv/cpython2.7-d217262e7ee7/lib/python2.7/site-packages/performance/benchmarks/bm_call_method.py --worker --pipe 4 --worker-task=0 --samples 3 --warmups 1 --loops 1 --min-time 0.1 (PID 28073, part 1)
5,362,602,828 ???:PyEval_EvalFrameEx'2 [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
1,250,195,637 ???:0x0000000000585e90 [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  890,479,191 ???:PyFrame_New [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  836,574,419 ???:PyObject_GenericGetAttr [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  552,808,267 ???:0x000000000049ef00 [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  539,318,922 ???:0x0000000000493710 [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  488,401,927 ???:_PyType_Lookup [/home/ubuntu/venv/cpython2.7-d217262e7ee7/bin/python2.7]
  258,028,053 ???:PyObject_GetAttr [/home/ubunt...

Revision history for this message
Louis Bouchard (louis) wrote :

Here is the pastebin for better readability : http://paste.ubuntu.com/24078834/

Louis Bouchard (louis)
Changed in python2.7 (Ubuntu):
assignee: Louis Bouchard (louis) → nobody
Revision history for this message
Joe Gordon (jogo) wrote :

Any updates on this? Are there plans to release a faster python build for Xenial?

Revision history for this message
Elvis Pranskevichus (elprans) wrote :

After much testing I found what is causing the regression in 16.04 and later. There are several distinct causes, attributable to choices made in debian/rules and to changes in GCC.

Cause #1: the decision to compile `Modules/_math.c` with `-fPIC` *and* link it statically into the python executable [1]. This causes the majority of the slowdown. It may be a bug in GCC or simply a constraint; I didn't find anything specific on this topic, although there are a lot of old bug reports regarding the interaction of -fPIC with -flto.

Cause #2: the enablement of `fpectl` [2], specifically passing `--with-fpectl` to `configure`. fpectl is disabled in python.org builds by default and its use is discouraged. Yet Debian builds enable it unconditionally, and it seems to cause a significant performance degradation. It's much less noticeable on 14.04 with GCC 4.8.0, but on more recent releases the performance difference seems to be larger.

Plausible cause #3: stronger stack-smashing protection in 16.04, which uses -fstack-protector-strong, whereas 14.04 and earlier used -fstack-protector (with lower performance overhead).
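
In build terms, the remedies these three causes point at look roughly like this (a sketch against the upstream source tree; it is not the exact debian/rules change, and the flags and paths are illustrative):

# cause #2: simply do not pass --with-fpectl (the upstream default is off)
$ ./configure
# cause #1: compile _math.c without -fPIC for the statically linked interpreter
$ gcc -c -O2 -fno-PIC -IInclude -I. Modules/_math.c -o Modules/_math.o
# cause #3: the weaker stack protector (a security trade-off)
$ make CFLAGS="-O2 -fstack-protector"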

Also, debian/rules limits the scope of PGO's PROFILE_TASK to 377 test suites vs upstream's 397, which affects performance somewhat negatively, but this is not definitive. What are the reasons behind the trimming of the tests used for PGO?

Without fpectl, and without -fPIC on _math.c, 2.7.12 built on 16.04 is slower than stock 2.7.6 on 14.04 by about 0.9% in my pyperformance runs [3]. This is in contrast to a whopping 7.95% slowdown when comparing stock versions.

Finally, a vanilla Python 2.7.12 build using GCC 5.4.0, default CFLAGS, default PROFILE_TASK and default Modules/Setup.local consistently runs faster in benchmarks than 2.7.6 (by about 0.7%), but I was not able to pinpoint the exact reason for that.

Note: the percentages above are the relative change in the geometric mean of pyperformance benchmark results.
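
For reference, the metric above, a geometric mean over per-benchmark ratios, can be computed like this (a generic sketch over a plain one-ratio-per-line text file, not pyperformance's native JSON output):

$ awk '{ n++; s += log($1) } END { if (n) printf "geometric mean: %.4f\n", exp(s / n) }' ratios.txt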

[1] https://git.launchpad.net/~usd-import-team/ubuntu/+source/python2.7/tree/debian/rules?h=ubuntu/xenial-updates#n421

[2] https://git.launchpad.net/~usd-import-team/ubuntu/+source/python2.7/tree/debian/rules?h=ubuntu/xenial-updates#n117

[3] https://docs.google.com/spreadsheets/d/1L3_gxe-AOYJsXFwGZgFko8jaChB0dFPjK5oMO5T5vj4/edit?usp=sharing

Revision history for this message
Major Hayden (rackerhacker) wrote :

Thanks for the deep dive, Elvis! :) Is it possible to adjust some of these settings in the Ubuntu packages, or is this just the way it will be going forward?

Revision history for this message
Elvis Pranskevichus (elprans) wrote :

We'll need the package maintainers to chime in on this. Attached is a patch that disables harmful settings.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "0001-Disable-fpectl-and-fPIC-on-Modules-_math.c.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Matthias Klose (doko) wrote :

thanks for the detailed analysis.

 - #1: I'm going to stop building the _fpectl module for the upcoming
   17.10 release. I'm hesitant to disable it for 16.04.

 - #2: 2.7.11-6: That's a fix done a year ago, I can't remember
   why I changed that. I'll try to remember ...
   _math.c is mentioned twice as a source file, same as
   timemodule.c

 - #3: if the above change is necessary, then yes, it should only
   be done for the shared builds, not the static ones.

   but starting with 17.04 we are building with -fPIE by default,
   which turns on PIC for everything again. So it is likely that
   you will see a decrease in performance again, unless the
   compiler gets a little bit better in newer Ubuntu releases.

I'll look at #2 and try to come up with a non-invasive approach.

Matthias Klose (doko)
Changed in python2.7 (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python2.7 - 2.7.14~rc1-3ubuntu1

---------------
python2.7 (2.7.14~rc1-3ubuntu1) artful; urgency=medium

  * Regenerate the _PyFPE breaks list for Ubuntu.

 -- Matthias Klose <email address hidden> Tue, 05 Sep 2017 20:19:52 +0200

Changed in python2.7 (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Matthias, I have run a series of pyperformance benchmarks [1] to compare the influence of the factors listed by Elvis on Xenial and Artful; the results are in a spreadsheet [2]. All runs were done on the same bare-metal machine with a fresh Ubuntu cloud image.

My observations confirm that both changes, disabling fpectl and dropping -fPIC for the _math.c module, bring a significant improvement over the corresponding versions without them. I also replaced -fstack-protector-strong with -fstack-protector and observed even better results in the benchmark, although in the examined scope its impact is not as significant as that of the former two factors.

The combination of all three factors makes the results close to what we can observe on Trusty.

I believe backporting the fpectl and _math.c changes to Xenial is also worth considering.
The -fstack-protector setting brings a performance improvement, but it also raises some security concerns.

[1] http://pyperformance.readthedocs.io
[2] https://docs.google.com/spreadsheets/d/1R83NQ7xzIfzFMVdbrh-zqK_iBuPcuhWa6KdTYPibFmE/edit?usp=sharing

Revision history for this message
Matthias Klose (doko) wrote :

thanks for doing that!

more interesting numbers would be:

 - artful with -fno-PIE -no-pie for the static build
 - xenial with just no_fpic

the reason I'm asking for the latter is that you'll break a lot of packages, which would then need to be rebuilt:

  $ wc -l debian/pyfpe-breaks.Debian
  70 debian/pyfpe-breaks.Debian

plus you would make every extension in PPAs and third-party repositories unusable.

list of breaking packages (version numbers not updated for xenial):

cython (<< 0.26-2.1),
epigrass (<= 2.4.7-1),
invesalius-bin (<= 3.1.1-1),
macs (<= 2.1.1.20160309-1),
printrun (<= 0~20150310-5),
pycorrfit (<= 1.0.0+dfsg-1),
pyscanfcs (<= 0.2.3-3),
python-acora (<= 2.0-2+b1),
python-adios (<= 1.12.0-3),
python-astroml-addons (<= 0.2.2-4),
python-astropy (<= 2.0.1-2),
python-astroscrappy (<= 1.0.5-1+b1),
python-bcolz (<= 1.1.0+ds1-4+b1),
python-breezy (<= 3.0.0~bzr6772-1),
python-bzrlib (<= 2.7.0+bzr6622-7),
python-cartopy (<= 0.14.2+dfsg1-2+b1),
python-cogent (<= 1.9-11),
python-cutadapt (<= 1.13-1+b1),
python-cypari2 (<= 1.0.0-3),
python-dipy-lib (<= 0.12.0-1),
python-djvu (<= 0.8-2),
python-fabio (<= 0.4.0+dfsg-2+b1),
python-falcon (<= 1.0.0-2+b1),
python-fiona (<= 1.7.9-1),
python-fpylll (<= 0.2.4+ds-3),
python-grib (<= 2.0.2-2),
python-gssapi (<= 1.2.0-1+b1),
python-h5py (<= 2.7.0-1+b1),
python-healpy (<= 1.10.3-2+b1),
python-htseq (<= 0.6.1p1-4),
python-imobiledevice (<= 1.2.0+dfsg-3.1),
python-kivy (<= 1.9.1-1+b1),
python-libdiscid (<= 1.0-1+b1),
python-liblo (<= 0.10.0-3+b1),
python-llfuse (<= 1.2+dfsg-1+b1),
python-lxml (<< 3.8.0-2),
python-meliae (<= 0.4.0+bzr199-3),
python-netcdf4 (<= 1.2.9-1+b1),
python-nipy-lib (<= 0.4.1-1),
python-numpy (<< 1:1.12.1-3.1),
python-pandas-lib (<= 0.20.3-1),
python-petsc4py (<= 3.7.0-3+b1),
python-pybloomfiltermmap (<= 0.3.15-0.1+b1),
python-pyfai (<= 0.13.0+dfsg-1+b1),
python-pygame-sdl2 (<= 6.99.12.4-1),
python-pygpu (<= 0.6.9-2),
python-pymca5 (<= 5.1.3+dfsg-1+b1),
python-pymssql (<= 2.1.3+dfsg-1+b1),
python-pyresample (<= 1.5.0-3+b1),
python-pysam (<= 0.11.2.2+ds-3),
python-pysph (<= 0~20160514.git91867dc-4),
python-pywt (<= 0.5.1-1.1+b1),
python-rasterio (<= 0.36.0-2+b2),
python-renpy (<= 6.99.12.4+dfsg-1),
python-scipy (<< 0.18.1-2.1),
python-sfepy (<= 2016.2-2),
python-sfml (<= 2.2~git20150611.196c88+dfsg-4),
python-shapely (<= 1.6.1-1),
python-skimage-lib (<= 0.12.3-9+b1),
python-sklearn-lib (<= 0.19.0-1),
python-specutils (<= 0.2.2-1+b1),
python-statsmodels-lib (<= 0.8.0-3),
python-stemmer (<= 1.3.0+dfsg-1+b7),
python-tables-lib (<= 3.3.0-5+b1),
python-tinycss (<= 0.4-1+b1),
python-tk (<< 2.7.14~rc1-1~),
python-wheezy.template (<= 0.1.167-1.1+b1),
python-yt (<= 3.3.3-2+b1),
sagemath (<= 8.0-5),
xpra (<= 0.17.6+dfsg-1),

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Thanks for the explanation, Matthias. I have added the Xenial variant you asked for to the spreadsheet.

The Artful variant will follow once I'm back from a couple of days out.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I have managed to prepare the static build without PIE on top of the latest artful version [1].

I have added the results to the same spreadsheet. What I found particularly interesting are the results of the python_startup & python_startup_no_site tests. In subsequent runs (the result in the spreadsheet is from the second run of the test suite) the improvement is really significant.

I believe this is thanks to the fact that the relative addresses don't need to be patched (relocated) before the binary runs.
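
One quick way to tell a PIE build from a non-PIE one (a hedged check with standard binutils; the path is illustrative):

$ readelf -h /usr/bin/python2.7 | grep 'Type:'    # DYN for a PIE build, EXEC for a non-PIE build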

As for the rest of the tests: there are some small improvements as well as some minor performance decreases.

[1] ppa:dgadomski/pyperf (python2.7 - 2.7.14-2ubuntu3~lp1638695~4)

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Hello Matthias. Is there any progress on applying those changes to Xenial? Please let me know if you need any testing to be done.

Revision history for this message
Tyler Hicks (tyhicks) wrote :

I don't feel that the change from -fstack-protector-strong to -fstack-protector should be made. The performance testing results in the spreadsheet don't suggest that the change positively impacts performance in a meaningful way. -fstack-protector-strong slightly outperforms -fstack-protector in some situations and slightly underperforms in others, suggesting that the difference is within the noise threshold. I'd strongly prefer that we continue to use -fstack-protector-strong.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

How long did the benchmarks actually take? The sum of the runtimes appears to be about 11 seconds. Is that correct? Is that long enough to draw useful conclusions from the results?

Thanks

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Seth: those values were somehow calculated from a number of runs. A single pyperformance benchmark run took ~20 minutes and I repeated each of them 3 times.

I still have the 'raw' outputs of pyperformance if needed. From those I can see that there are at least 3 values for each test, and there is also a 'warmup' time for each test.

Attaching one of them as example.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Xenial pyperformance results with -fstack-protector-strong changed to -fstack-protector.

Matthias Klose (doko)
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Major, or anyone else affected,

Accepted python2.7 into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python2.7/2.7.13-2ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in python2.7 (Ubuntu Zesty):
status: New → Fix Committed
tags: added: verification-needed verification-needed-zesty
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Major, or anyone else affected,

Accepted python2.7 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python2.7/2.7.12-1ubuntu0~16.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in python2.7 (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed-xenial
Revision history for this message
Matthias Klose (doko) wrote :

_math.o is now built without -fPIC for the static builds.

tags: added: verification-done-zesty
removed: verification-needed-zesty
tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python2.7 - 2.7.12-1ubuntu0~16.04.3

---------------
python2.7 (2.7.12-1ubuntu0~16.04.3) xenial-proposed; urgency=medium

  * Some performance improvements: LP: #1638695.
    - Build the _math.o object file without -fPIC for static builds.
  * Rename md5_* functions to _Py_md5_*. Closes: #868366. LP: #1734109.
  * Explicitly use the system python for byte compilation in postinst scripts.
    LP: #1682934.
  * Fix issue #22636: Avoid shell injection problems with
    ctypes.util.find_library(). LP: #1512068.

 -- Matthias Klose <email address hidden> Mon, 04 Dec 2017 15:50:18 +0100

Changed in python2.7 (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for python2.7 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Matthias Klose (doko) wrote :

zesty is EOL

Changed in python2.7 (Ubuntu Zesty):
status: Fix Committed → Won't Fix