MadGraph5_aMC@NLO

Merge lp:~maddevelopers/mg5amcnlo/smart_zeros into lp:~maddevelopers/mg5amcnlo/2.3.4


Proposed by Valentin Hirschi on 2016-03-02

Status:	Merged
Merged at revision:	393
Proposed branch:	lp:~maddevelopers/mg5amcnlo/smart_zeros
Merge into:	lp:~maddevelopers/mg5amcnlo/2.3.4
Diff against target:	782 lines (+306/-115) (has conflicts), 13 files modified
	Template/NLO/SubProcesses/makefile_loop.inc (+13/-2)
	Template/loop_material/StandAlone/SubProcesses/makefile (+13/-2)
	madgraph/iolibs/export_v4.py (+4/-4)
	madgraph/iolibs/file_writers.py (+1/-0)
	madgraph/iolibs/template_files/loop_optimized/helas_calls_split.inc (+5/-3)
	madgraph/iolibs/template_files/loop_optimized/loop_matrix_standalone.inc (+7/-5)
	madgraph/iolibs/template_files/loop_optimized/mp_compute_loop_coefs.inc (+9/-3)
	madgraph/iolibs/template_files/loop_optimized/mp_helas_calls_split.inc (+3/-4)
	madgraph/iolibs/template_files/loop_optimized/polynomial.inc (+6/-8)
	madgraph/loop/loop_exporters.py (+46/-21)
	madgraph/various/process_checks.py (+4/-0)
	madgraph/various/q_polynomial.py (+194/-63)
	tests/time_db (+1/-0)
	Text conflict in madgraph/various/process_checks.py
To merge this branch:	bzr merge lp:~maddevelopers/mg5amcnlo/smart_zeros

Reviewer	Review Type	Date Requested	Status
Hua-Sheng Shao		2016-03-02	Approve on 2016-04-05
Review via email: mp+287741@code.launchpad.net

Description of the change

This branch brings a small but significant improvement to the computation of loop polynomial coefficients in MadLoop.

Profiling shows that the large majority of the computation time of these coefficients comes from the 'update_wl' functions that implement the tensorial product of the 'loop wavefunction polynomials' with the 'vertex polynomial updaters' provided by aloha.

Basically, I identified that within the current framework there are three ways of implementing this tensorial product:

a) Loop over only the wavefunction/updater indices and write the expanded tensor product explicitly (so expand explicitly over the loop wf and updater coefs).
This was the only technique used before this branch.

b) Perform all 5 do-loops (3 over loop wf + updater indices and 2 over polynomial coefficients), but loop first over the updater indices and skip the updater coefficients which are zero.

c) Same as b), so all 5 do-loops, but this time loop first over the loop wavefunction coefficients and skip the loop wf coefs which are zero.
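
As a rough illustration of strategy b), here is a sketch in plain Python rather than in the generated Fortran. Everything in it is an assumption made for the example: the function names, the array layout (wl[k][a][i] for the loop wavefunction, updater[j][b][k] for the vertex updater), and the ordering of monomials within a given rank, which need not match MadLoop's. The position map plays the role that COMB_COEF_POS plays in this branch.

```python
from itertools import combinations_with_replacement

DIM = 4  # number of components of the loop momentum q


def monomials(max_rank):
    """Exponent tuples of all monomials in q up to max_rank, ordered by rank."""
    result = []
    for rank in range(max_rank + 1):
        for combo in combinations_with_replacement(range(DIM), rank):
            result.append(tuple(combo.count(mu) for mu in range(DIM)))
    return result


def product_position_map(loop_rank, updater_rank):
    """pos[a][b] is the position of (monomial a) * (monomial b)."""
    index = {m: i for i, m in enumerate(monomials(loop_rank + updater_rank))}
    return [[index[tuple(x + y for x, y in zip(ma, mb))]
             for mb in monomials(updater_rank)]
            for ma in monomials(loop_rank)]


def update_wl_filtered(wl, updater, pos, n_out_coefs):
    """Strategy b): loop over the updater entries first and skip the zeros."""
    lcut_size = len(wl[0][0])
    out = [[[0j] * lcut_size for _ in range(n_out_coefs)]
           for _ in range(len(updater))]
    for j, block in enumerate(updater):
        for b, row in enumerate(block):
            for k, u in enumerate(row):
                if u == 0:
                    continue  # the filter: nothing to add for a zero coef
                for a in range(len(wl[k])):
                    c = pos[a][b]
                    for i in range(lcut_size):
                        out[j][c][i] += wl[k][a][i] * u
    return out


def update_wl_dense(wl, updater, pos, n_out_coefs):
    """Reference version: same five loops, no filtering."""
    lcut_size = len(wl[0][0])
    out = [[[0j] * lcut_size for _ in range(n_out_coefs)]
           for _ in range(len(updater))]
    for i in range(lcut_size):
        for j in range(len(updater)):
            for k in range(len(wl)):
                for a in range(len(wl[k])):
                    for b in range(len(updater[j])):
                        out[j][pos[a][b]][i] += wl[k][a][i] * updater[j][b][k]
    return out
```

Both functions compute the same tensor product; the filtered one simply does no work for updater entries that vanish, which is where the gain of strategy b) is expected to come from.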

So a gain can be obtained by wisely choosing which strategy to use for the different cases of loop wavefunction rank and updater rank.
The following choice is made, based on several empirical profiling runs.

-------------------------------

if ( loop_wf rank == 0 ) or (updater rank == 0 ) or (loop_wf rank == updater rank == 1)
-> Keep the original strategy a), which seems to be faster

else if ( loop_wf rank ) >= ( updater rank )
-> Use the strategy b) which exploits the fact that the loop polynomial is high rank in comparison to the vertex polynomial.

else
-> Use the strategy c) which exploits the fact that the vertex polynomial is high rank in comparison to the loop wf polynomial.
This is typically not a very relevant change as it is basically used only for the combination
(loop wf rank =1 , updater rank =2) in effective theories.

-------------------------------
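
The selection rule above can be written as a small function. This is only a sketch: the function name and the 'a'/'b'/'c' return labels are made up for the example, while the conditions mirror the rule stated in this description.

```python
def choose_update_strategy(loop_rank, updater_rank):
    """Return 'a', 'b' or 'c' following the rule stated in the description."""
    if loop_rank == 0 or updater_rank == 0 or (loop_rank == 1 and updater_rank == 1):
        # Fully expanded product, left to the compiler to optimize.
        return "a"
    if loop_rank >= updater_rank:
        # Loop over (and filter on) the updater coefficients first.
        return "b"
    # Updater rank larger than loop wf rank: filter on the loop wf coefs.
    return "c"
```

For instance, the combination (loop wf rank = 1, updater rank = 2) mentioned above falls through to strategy c), while (2, 1) and (2, 2) both select strategy b).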

So the introduction of strategy b) is really what brings the improvement. However, for this improvement to be large, it is necessary that polynomial.f be compiled without '-fbounds-check' and preferably with '-O3'.
I have therefore slightly altered the makefiles so that this source file in particular enforces the above, irrespective of what is in make_opts.
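
To make the effect of the new makefile rule concrete, the following Python sketch mimics what it does to the compiler flags for polynomial.f: drop '-fbounds-check' and any '-O...' word from FFLAGS, then append the two POLYNOMIAL_* variables. The function name and the sample flags are illustrative assumptions, and the word matching only approximates GNU make's subst/patsubst behaviour.

```python
def polynomial_compile_flags(fflags, optimization="-O3", bounds_check=""):
    """Approximate the flag handling of the polynomial.o makefile rule."""
    # Like $(subst -fbounds-check,,...): remove the substring wherever it occurs.
    stripped = fflags.replace("-fbounds-check", "")
    # Like $(patsubst -O%,,...): drop every word that starts with -O.
    kept = [word for word in stripped.split() if not word.startswith("-O")]
    # Append POLYNOMIAL_OPTIMIZATION and POLYNOMIAL_BOUNDS_CHECK when non-empty.
    extra = [word for word in (optimization, bounds_check) if word]
    return " ".join(kept + extra)


# Sample FFLAGS chosen for illustration only.
print(polynomial_compile_flags("-O2 -fPIC -ffixed-line-length-132 -fbounds-check"))
```

Passing bounds_check="-fbounds-check" corresponds to setting POLYNOMIAL_BOUNDS_CHECK in the makefile when a checked build is wanted for debugging.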

The improvements obtained are not a game-changer, but still welcome. Note that even though the implementation is such that the gain should be larger for more complicated processes, this is not guaranteed, as it also depends on the sparsity of the updater polynomial coefficients.
The gains below are relative to the loop polynomial coefficient computation only (i.e. not relative to the timing including loop reduction):

u d~ > e+ ve -> No gain, code completely identical in this case
g g > t t~ -> -25%
g g > t t~ g -> -18%
g g > t t~ g g -> -9%
u u~ > d d~ s s~ -> -27%
u u~ > d d~ s s~ g -> -23%
g g > x0 g (HEFT process) -> -40%
g g > x0 g g (HEFT process) -> -19%
g g > y2 g (y2 = massive spin-2 boson) -> -43%
g g > y2 g g (y2 = massive spin-2 boson) -> -31%
g g > h h -> -58%
g g > h h h -> -60%
g g > h h h h -> -64%
g g > z z -> -41%
g g > z z z -> -43%
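
The dependence on the sparsity of the updater coefficients can be seen with a crude operation count. The sketch below is a toy model only: the function names and the updater[j][b][k] layout are assumptions, and it ignores the cost of the zero test itself as well as anything the compiler does.

```python
def multiplications_with_filter(updater, n_loop_coefs, lcut_size):
    """Multiply-adds when zero updater entries are skipped (strategy b)."""
    nonzero = sum(1 for block in updater for row in block for u in row if u != 0)
    return nonzero * n_loop_coefs * lcut_size


def multiplications_without_filter(updater, n_loop_coefs, lcut_size):
    """Multiply-adds for the same five loops with no filtering at all."""
    total = sum(len(row) for block in updater for row in block)
    return total * n_loop_coefs * lcut_size
```

In this model the saving is simply the fraction of updater entries that vanish, so a more complicated process with denser updaters can gain less than a simpler one with sparse updaters.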

The improvement is better for loop-induced processes, but unfortunately this is also where we are already dominated by the reduction time anyway, so it doesn't matter much :(.

Finally, one crazy idea would be to dynamically choose the optimal method among the three above for each UPDATE_WL call, with a training session. But OK, let's not go there...
And of course another avenue of optimization is to properly choose the l-cut location of each loop so as to maximize the number of loop wavefunctions recycled.
But this was my one improvement of the year on the loop polynomial computation; anything more will wait for 2017 at least.

(Originally the hope was to get even larger gains by having aloha specify which coefficients can be zero and keeping track overall of the list of non-zero coefficients.
But after 2 long days of testing and trying hard, the full-fledged tracking of zero coefs seems more expensive than what it saves, except when done with a partial filtering like above. Well... at least I tried.)

Anyway, as far as the review goes (I picked you, Olivier, because we already discussed this a bit, but others who are reading this are welcome to give their opinion), there is not much to be done here:
Just make sure that things go smoothly for a couple of runs, double-check a couple of the timing improvements above (with the check timing -reuse command), and give me a green light (pretty please).

lp:~maddevelopers/mg5amcnlo/smart_zeros updated on 2016-03-02

350. By Valentin Hirschi on 2016-03-02: 1. Returned 'make_it_quick' internal option of check timing to the default false.
351. By Valentin Hirschi on 2016-03-02: 1. Removed some useless comments in the code

Rikkert Frederix (frederix) wrote on 2016-03-02:

Hi Valentin,

If it doesn't compile with the '-fbounds-check' doesn't that signify that some arrays go out of bound, which might lead to compiler dependent problems?

Cheers,
Rik

Valentin Hirschi (valentin-hirschi) wrote on 2016-03-02:

Hi Rik,

> Hi Valentin,
>
> If it doesn't compile with the '-fbounds-check' doesn't that signify that some
> arrays go out of bound, which might lead to compiler dependent problems?

Well first of all, it is only the file 'polynomial.f' for which I vetoed the compiler flag '-fbounds-check' and forced the optimization '-O3'.

The bounds check can still be turned on, and the optimization level changed, by editing the following variables in the MadLoop makefile:

POLYNOMIAL_OPTIMIZATION = -O3
POLYNOMIAL_BOUNDS_CHECK =

(defined as above by default)

Anyway, the fact that '-fbounds-check' is absent by default means, indeed, that if an array goes out of bounds the behavior can be anything, ranging from incorrect values retrieved from memory to segmentation faults (system- and compiler-dependent behavior indeed).
But that's OK because it should not happen in this very generic file 'polynomial.f', which I have now tested for many processes *with* the bounds checks.
It is the usual concept: one should test with bounds checks but then remove them for production.
We don't necessarily do this for other files because the gain is marginal, but this is not the case in polynomial.f anymore.

Also, if some out-of-bounds access happens, then there is no way MadLoop's answer will be stable (no matter what happens with the memory access). So I think it is safe to assume that if incorrect bounds are present in polynomial.f, we would immediately realize it in any use of MadLoop that monitors its stability (as it should).

Cheers,

> Cheers,
> Rik

lp:~maddevelopers/mg5amcnlo/smart_zeros updated on 2016-03-03

352. By Valentin Hirschi on 2016-03-03: 1. Removed all comments and unnecessary now-dummy code.

Olivier Mattelaer (olivier-mattelaer) wrote on 2016-04-04:

Hi,

Is this merged in 2.3.4?
I plan to release 2.3.4 extremely soon.
So what is the status of this branch?
Should it wait for the next version? If not, please finish this review asap.

Cheers,

Olivier

Valentin Hirschi (valentin-hirschi) wrote on 2016-04-04:

Not yet. Huasheng, could you accept the merge? I'll carry on with it then.

Olivier, please wait until at least Wednesday for the release, as we should have the arXiv number for Ninja by then.

Cheers

Hua-Sheng Shao (erdissshaw) on 2016-04-05:

review: Approve

Preview Diff

 === modified file 'Template/NLO/SubProcesses/makefile_loop.inc'
 --- Template/NLO/SubProcesses/makefile_loop.inc	2014-09-30 06:56:29 +0000
 +++ Template/NLO/SubProcesses/makefile_loop.inc	2016-03-03 23:14:51 +0000
@@ -3,11 +3,16 @@
  LIBDIR = ../../../lib/
  LOOPLIB= libMadLoop.a
++# For the compilation of the MadLoop file polynomial.f it makes a big difference to use -O3 and
++# to turn off the bounds check. These can however be modified here if really necessary.
++POLYNOMIAL_OPTIMIZATION = -O3
++POLYNOMIAL_BOUNDS_CHECK =
++
  LINKLIBS =  -L$(LIBDIR) -lcts -ldhelas -lmodel %(link_tir_libs)s
  LIBS = $(LIBDIR)libcts.$(libext) $(LIBDIR)libdhelas.$(libext)	\
  	$(LIBDIR)libmodel.$(libext) %(tir_libs)s
--PROCESS= loop_matrix.o improve_ps.o born_matrix.o loop_num.o CT_interface.o	MadLoopCommons.o \
--		 $(patsubst %(dotf)s,%(doto)s,$(wildcard polynomial.f)) \
++PROCESS= $(patsubst %(dotf)s,%(doto)s,$(wildcard polynomial.f)) \
++         loop_matrix.o improve_ps.o born_matrix.o loop_num.o CT_interface.o	MadLoopCommons.o \
  		 $(patsubst %(dotf)s,%(doto)s,$(wildcard MadLoopParamReader.f)) \
  		 $(patsubst %(dotf)s,%(doto)s,$(wildcard helas_calls*.f)) \
  		 $(patsubst %(dotf)s,%(doto)s,$(wildcard jamp?_calls_*.f)) \
@@ -21,6 +26,12 @@
  		 $(patsubst %(dotf)s,%(doto)s,$(wildcard GOLEM_interface.f)) \
  		 $(patsubst %(dotf)s,%(doto)s,$(wildcard compute_color_flows.f))
++# This is the core of madloop computationally wise, so make sure to turn optimizations on and bound checks off.
++# We use %%olynomial.o and not directly polynomial.o because we want it to match when both doing make check here
++# or make OLP one directory above
++%%olynomial.o : %%olynomial.f
++	$(FC) $(patsubst -O%%,, $(subst -fbounds-check,,$(FFLAGS))) $(POLYNOMIAL_OPTIMIZATION) $(POLYNOMIAL_BOUNDS_CHECK) -c $< -o $@ $(LOOP_INCLUDE)
++
  %(doto)s : %(dotf)s
  	$(FC) $(FFLAGS) -c $< %(tir_include)s
 === modified file 'Template/loop_material/StandAlone/SubProcesses/makefile'
 --- Template/loop_material/StandAlone/SubProcesses/makefile	2016-02-12 01:48:16 +0000
 +++ Template/loop_material/StandAlone/SubProcesses/makefile	2016-03-03 23:14:51 +0000
@@ -9,6 +9,11 @@
      ROOT = ..
  endif
++# For the compilation of the MadLoop file polynomial.f it makes a big difference to use -O3 and
++# to turn off the bounds check. These can however be modified here if really necessary.
++POLYNOMIAL_OPTIMIZATION = -O3
++POLYNOMIAL_BOUNDS_CHECK =
++
  include $(ROOT)/Source/make_opts
  include $(ROOT)/SubProcesses/MadLoop_makefile_definitions
  SHELL = /bin/bash
@@ -23,12 +28,12 @@
  LIBS =  $(LIBDIR)libdhelas.$(libext) $(LIBDIR)libmodel.$(libext) $(LOOP_LIBS)
  PROCESS= MadLoopParamReader.o MadLoopCommons.o \
++ $(patsubst $(DOTF),$(DOTO),$(wildcard polynomial.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard loop_matrix.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard improve_ps.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard born_matrix.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard CT_interface.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard loop_num.f)) \
-- $(patsubst $(DOTF),$(DOTO),$(wildcard polynomial.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard helas_calls*.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard jamp?_calls_*.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard mp_born_amps_and_wfs.f)) \
@@ -42,12 +47,12 @@
   $(patsubst $(DOTF),$(DOTO),$(wildcard compute_color_flows.f))
  OLP_PROCESS= MadLoopParamReader.o MadLoopCommons.o \
++ $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/polynomial.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/loop_matrix.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/improve_ps.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/born_matrix.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/CT_interface.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/loop_num.f)) \
-- $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/polynomial.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/helas_calls*.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/jamp?_calls_*.f)) \
   $(patsubst $(DOTF),$(DOTO),$(wildcard $(LOOP_PREFIX)*/mp_born_amps_and_wfs.f)) \
@@ -70,6 +75,12 @@
  $(CHECK_SA_BORN_SPLITORDERS):  check_sa_born_splitOrders.o $(patsubst $(DOTF),$(DOTO),$(wildcard *born_matrix.f)) makefile $(LIBDIR)libdhelas.$(libext) $(LIBDIR)libmodel.$(libext)
  	$(FC) $(FFLAGS) -o $(CHECK_SA_BORN_SPLITORDERS) check_sa_born_splitOrders.o $(patsubst $(DOTF),$(DOTO),$(wildcard *born_matrix.f)) -L$(LIBDIR) -ldhelas -lmodel
++# This is the core of madloop computationally wise, so make sure to turn optimizations on and bound checks off.
++# We use %olynomial.o and not directly polynomial.o because we want it to match when both doing make check here
++# or make OLP one directory above
++%olynomial.o : %olynomial.f
++	$(FC) $(patsubst -O%,, $(subst -fbounds-check,,$(FFLAGS))) $(POLYNOMIAL_OPTIMIZATION) $(POLYNOMIAL_BOUNDS_CHECK) -c $< -o $@ $(LOOP_INCLUDE)
++
  $(DOTO) : $(DOTF)
  	$(FC) $(FFLAGS) -c $< -o $@ $(LOOP_INCLUDE)
 === modified file 'madgraph/iolibs/export_v4.py'
 --- madgraph/iolibs/export_v4.py	2016-03-02 04:03:52 +0000
 +++ madgraph/iolibs/export_v4.py	2016-03-03 23:14:51 +0000
@@ -786,10 +786,12 @@
          #copy Helas Template
          cp(MG5DIR + '/aloha/template_files/Makefile_F', write_dir+'/makefile')
          if any([any(['L' in tag for tag in d[1]]) for d in wanted_lorentz]):
--            cp(MG5DIR + '/aloha/template_files/aloha_functions_loop.f', write_dir+'/aloha_functions.f')
++            cp(MG5DIR + '/aloha/template_files/aloha_functions_loop.f',
++                                                 write_dir+'/aloha_functions.f')
              aloha_model.loop_mode = False
          else:
--            cp(MG5DIR + '/aloha/template_files/aloha_functions.f', write_dir+'/aloha_functions.f')
++            cp(MG5DIR + '/aloha/template_files/aloha_functions.f',
++                                                 write_dir+'/aloha_functions.f')
          create_aloha.write_aloha_file_inc(write_dir, '.f', '.o')
          # Make final link in the Process
@@ -5217,7 +5219,6 @@
          if self.opt['mp']:
              self.create_intparam_def(dp=False,mp=True)
--
          # definition of the coupling.
          self.create_actualize_mp_ext_param_inc()
          self.create_coupl_inc()
@@ -5245,7 +5246,6 @@
      def copy_standard_file(self):
          """Copy the standard files for the fortran model."""
--
          #copy the library files
          file_to_link = ['formats.inc','printout.f', \
 === modified file 'madgraph/iolibs/file_writers.py'
 --- madgraph/iolibs/file_writers.py	2015-10-01 16:00:08 +0000
 +++ madgraph/iolibs/file_writers.py	2016-03-03 23:14:51 +0000
@@ -177,6 +177,7 @@
                       '^type(?!\s*\()\s*.+\s*$': ('^endtype', 2),
                       '^do(?!\s+\d+)\s+': ('^enddo\s*$', 2),
                       '^subroutine': ('^end\s*$', 0),
++                     '^module': ('^end\s*$', 0),
                       'function': ('^end\s*$', 0)}
      single_indents = {'^else\s*$':-2,
                        '^else\s*if.+then\s*$':-2}
 === modified file 'madgraph/iolibs/template_files/loop_optimized/helas_calls_split.inc'
 --- madgraph/iolibs/template_files/loop_optimized/helas_calls_split.inc	2016-02-23 19:44:10 +0000
 +++ madgraph/iolibs/template_files/loop_optimized/helas_calls_split.inc	2016-03-03 23:14:51 +0000
@@ -1,5 +1,9 @@
        SUBROUTINE %(proc_prefix)s%(bunch_name)s_%(bunch_number)d(P,NHEL,H,IC)
--C
++C
++C Modules
++C
++      use %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++C
        IMPLICIT NONE
+ C
  C CONSTANTS
@@ -18,8 +22,6 @@
  	  PARAMETER (NLOOPAMPS=%(nloopamps)d)
        INTEGER    NWAVEFUNCS,NLOOPWAVEFUNCS
        PARAMETER (NWAVEFUNCS=%(nwavefuncs)d,NLOOPWAVEFUNCS=%(nloopwavefuncs)d)
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
        %(real_dp_format)s     ZERO
        PARAMETER (ZERO=0D0)
  	  %(real_mp_format)s     MP__ZERO
 === modified file 'madgraph/iolibs/template_files/loop_optimized/loop_matrix_standalone.inc'
 --- madgraph/iolibs/template_files/loop_optimized/loop_matrix_standalone.inc	2016-02-23 19:44:10 +0000
 +++ madgraph/iolibs/template_files/loop_optimized/loop_matrix_standalone.inc	2016-03-03 23:14:51 +0000
@@ -11,7 +11,11 @@
  c and external lines W(0:6,NEXTERNAL)
+ C
  %(process_lines)s
--C
++C
++C Modules
++C
++      use %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++C
        IMPLICIT NONE
+ C
  C USER CUSTOMIZABLE OPTIONS
@@ -74,8 +78,6 @@
        PARAMETER (NEXTERNAL=%(nexternal)d)
        INTEGER    NWAVEFUNCS,NLOOPWAVEFUNCS
        PARAMETER (NWAVEFUNCS=%(nwavefuncs)d,NLOOPWAVEFUNCS=%(nloopwavefuncs)d)
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
  	  INTEGER    NCOMB
        PARAMETER (NCOMB=%(ncomb)d)
        %(real_dp_format)s     ZERO
@@ -189,7 +191,7 @@
  ## if(ComputeColorFlows) {
  	  %(real_dp_format)s BUFFRES(0:3,0:NSQUAREDSO)
  ## }
--	  %(complex_dp_format)s COEFS(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
++      %(complex_dp_format)s COEFS(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
        %(complex_dp_format)s CFTOT
  	  LOGICAL FOUNDHELFILTER,FOUNDLOOPFILTER
  	  DATA FOUNDHELFILTER/.TRUE./
@@ -561,7 +563,7 @@
  C SETUP OF THE COMMON STARTING EXTERNAL LOOP WAVEFUNCTION
  C IT IS ALSO PS POINT INDEPENDENT, SO IT CAN BE DONE HERE.
    DO I=0,3
--    PL(I,0)=(0.0d0,0.0d0)
++    PL(I,0)=DCMPLX(0.0d0,0.0d0)
    ENDDO
    DO I=1,MAXLWFSIZE
      DO J=0,LOOPMAXCOEFS-1
 === modified file 'madgraph/iolibs/template_files/loop_optimized/mp_compute_loop_coefs.inc'
 --- madgraph/iolibs/template_files/loop_optimized/mp_compute_loop_coefs.inc	2016-02-25 19:31:10 +0000
 +++ madgraph/iolibs/template_files/loop_optimized/mp_compute_loop_coefs.inc	2016-03-03 23:14:51 +0000
@@ -8,6 +8,10 @@
+ C
  %(process_lines)s
+ C
++C Modules
++C
++      use %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++C
        IMPLICIT NONE
+ C
  C CONSTANTS
@@ -28,8 +32,6 @@
        PARAMETER (NEXTERNAL=%(nexternal)d)
        INTEGER    NWAVEFUNCS,NLOOPWAVEFUNCS
        PARAMETER (NWAVEFUNCS=%(nwavefuncs)d,NLOOPWAVEFUNCS=%(nloopwavefuncs)d)
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
  	  INTEGER    NCOMB
        PARAMETER (NCOMB=%(ncomb)d)
  	  %(real_mp_format)s    ZERO
@@ -206,8 +208,12 @@
  ENDDO
  DO I=0,3
--  PL(I,0)=(ZERO,ZERO)
++  PL(I,0)=CMPLX(ZERO,ZERO,KIND=16)
++  IF (.NOT.COMPUTE_INTEGRAND_IN_QP) THEN
++    DP_PL(I,0)=DCMPLX(0.0d0,0.0d0)
++  ENDIF
  ENDDO
++
  ## if(AmplitudeReduction){
  IF (.NOT.SKIP_LOOPNUM_COEFS_CONSTRUCTION) THEN
  ## }
 === modified file 'madgraph/iolibs/template_files/loop_optimized/mp_helas_calls_split.inc'
 --- madgraph/iolibs/template_files/loop_optimized/mp_helas_calls_split.inc	2016-02-23 19:44:10 +0000
 +++ madgraph/iolibs/template_files/loop_optimized/mp_helas_calls_split.inc	2016-03-03 23:14:51 +0000
@@ -1,5 +1,6 @@
        SUBROUTINE %(proc_prefix)s%(bunch_name)s_%(bunch_number)d(P,NHEL,H,IC)
--C
++C
++      use %(proc_prefix)sPOLYNOMIAL_CONSTANTS
        IMPLICIT NONE
+ C
  C CONSTANTS
@@ -19,8 +20,6 @@
  	  PARAMETER (NLOOPAMPS=%(nloopamps)d)
        INTEGER    NWAVEFUNCS,NLOOPWAVEFUNCS
        PARAMETER (NWAVEFUNCS=%(nwavefuncs)d,NLOOPWAVEFUNCS=%(nloopwavefuncs)d)
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
  	  %(real_mp_format)s     ZERO
        PARAMETER (ZERO=0.0e0_16)
  	  %(complex_mp_format)s     IZERO
@@ -38,7 +37,7 @@
  C LOCAL VARIABLES
+ C
  	  INTEGER I,J,K
--	  %(complex_mp_format)s COEFS(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
++	  %(complex_mp_format)s COEFS(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
+ C
  C GLOBAL VARIABLES
+ C
 === modified file 'madgraph/iolibs/template_files/loop_optimized/polynomial.inc'
 --- madgraph/iolibs/template_files/loop_optimized/polynomial.inc	2016-02-23 19:44:10 +0000
 +++ madgraph/iolibs/template_files/loop_optimized/polynomial.inc	2016-03-03 23:14:51 +0000
@@ -3,6 +3,7 @@
  C MULTIPLY BY THE BORN
        SUBROUTINE %(mp_prefix)s%(proc_prefix)sCREATE_LOOP_COEFS(LOOP_WF,RANK,LCUT_SIZE,LOOP_GROUP_NUMBER,SYMFACT,MULTIPLIER,COLOR_ID,HELCONFIG)
++	  USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
  	  implicit none
+ C
  C CONSTANTS
@@ -17,8 +18,6 @@
        PARAMETER (IMAG1=(ZERO,ONE))
  	  %(complex_format)s CMPLX_ZERO
  	  PARAMETER (CMPLX_ZERO=(ZERO,ZERO))
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
        INTEGER    NCOLORROWS
  	  PARAMETER (NCOLORROWS=%(nloopamps)d)
  	  INTEGER    NLOOPGROUPS
@@ -97,6 +96,7 @@
  C amplitude level so that no multiplication is performed.
        SUBROUTINE %(mp_prefix)s%(proc_prefix)sCREATE_LOOP_COEFS(LOOP_WF,RANK,LCUT_SIZE,LOOP_GROUP_NUMBER,SYMFACT,MULTIPLIER)
++	  USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
  	  implicit none
+ C
  C CONSTANTS
@@ -107,8 +107,6 @@
        PARAMETER (IMAG1=(ZERO,ONE))
  	  %(complex_format)s CMPLX_ZERO
  	  PARAMETER (CMPLX_ZERO=(ZERO,ZERO))
--      include 'loop_max_coefs.inc'
--      include 'coef_specs.inc'
  	  INTEGER    NLOOPGROUPS
        PARAMETER (NLOOPGROUPS=%(nloop_groups)d)
  	  INTEGER    NCOMB
@@ -143,16 +141,13 @@
  C       Just a handy subroutine to modify the coefficients for the
  C       tranformation q_loop -> -q_loop
  C       It is only used for the NINJA interface
++        USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
          IMPLICIT NONE
--      	include 'loop_max_coefs.inc'
          INTEGER I, NCOEFS
  		%(complex_format)s POLYNOMIAL(0:NCOEFS-1)
--        INTEGER COEFTORANK_MAP(0:LOOPMAXCOEFS-1)
--		%(coef_to_rank_map_definition)s
--
          DO I=0,NCOEFS-1
            IF (MOD(COEFTORANK_MAP(I),2).eq.1) then
              POLYNOMIAL(I)=-POLYNOMIAL(I)
@@ -161,3 +156,6 @@
        END
  ## }
++
++C Now the routines to update the wavefunctions
++
 === modified file 'madgraph/loop/loop_exporters.py'
 --- madgraph/loop/loop_exporters.py	2016-02-24 13:54:17 +0000
 +++ madgraph/loop/loop_exporters.py	2016-03-03 23:14:51 +0000
@@ -2070,18 +2070,7 @@
          # Start from the routine in the template
          replace_dict = copy.copy(matrix_element.rep_dict)
--
--        # Write the definition of the coef_to_rank_map
--        coef_to_rank_map_definition = []
--        for rank in range(replace_dict['maxrank']+1):
--            start = q_polynomial.get_number_of_coefs_for_rank(rank-1)
--            end   = q_polynomial.get_number_of_coefs_for_rank(rank)-1
--            coef_to_rank_map_definition.append(
--'DATA (COEFTORANK_MAP(I),I=%(start)d,%(end)d)/%(n_entries)d*%(rank)d/'%
--{'start': start,'end': end,'n_entries': end-start+1,'rank': rank})
--        replace_dict['coef_to_rank_map_definition']=\
--                                          '\n'.join(coef_to_rank_map_definition)
--
++
          dp_routine = open(os.path.join(self.template_dir,'polynomial.inc')).read()
          mp_routine = open(os.path.join(self.template_dir,'polynomial.inc')).read()
          # The double precision version of the basic polynomial routines, such as
@@ -2106,11 +2095,19 @@
          # Initialize the polynomial routine writer
          poly_writer=q_polynomial.FortranPolynomialRoutines(
--                                         matrix_element.get_max_loop_rank(),
--                                         sub_prefix=replace_dict['proc_prefix'])
++            matrix_element.get_max_loop_rank(),
++            updater_max_rank = matrix_element.get_max_loop_vertex_rank(),
++            sub_prefix=replace_dict['proc_prefix'],
++            proc_prefix=replace_dict['proc_prefix'],
++            mp_prefix='')
++        # Write the polynomial constant module common to all
++        writer.writelines(poly_writer.write_polynomial_constant_module()+'\n')
++
          mp_poly_writer=q_polynomial.FortranPolynomialRoutines(
--                    matrix_element.get_max_loop_rank(),coef_format='complex*32',
--                                   sub_prefix='MP_'+replace_dict['proc_prefix'])
++            matrix_element.get_max_loop_rank(),
++            updater_max_rank = matrix_element.get_max_loop_vertex_rank(),
++            coef_format='complex*32', sub_prefix='MP_'+replace_dict['proc_prefix'],
++            proc_prefix=replace_dict['proc_prefix'], mp_prefix='MP_')
          # The eval subroutine
          subroutines.append(poly_writer.write_polynomial_evaluator())
          subroutines.append(mp_poly_writer.write_polynomial_evaluator())
@@ -2120,12 +2117,40 @@
          # The merging one for creating the loop coefficients
          subroutines.append(poly_writer.write_wl_merger())
          subroutines.append(mp_poly_writer.write_wl_merger())
--        # Now the udpate subroutines
          for wl_update in matrix_element.get_used_wl_updates():
--            subroutines.append(poly_writer.write_wl_updater(\
--                                                     wl_update[0],wl_update[1]))
--            subroutines.append(mp_poly_writer.write_wl_updater(\
--                                                     wl_update[0],wl_update[1]))
++            # We pick here the most appropriate way of computing the
++            # tensor product depending on the rank of the two tensors.
++            # The various choices below come out from a careful comparison of
++            # the different methods using the valgrind profiler
++            if wl_update[0]==wl_update[1]==1 or wl_update[0]==0 or wl_update[1]==0:
++                # If any of the rank is 0, or if they are both equal to 1,
++                # then we are better off using the full expanded polynomial,
++                # and let the compiler optimize it.
++                subroutines.append(poly_writer.write_expanded_wl_updater(\
++                                                     wl_update[0],wl_update[1]))
++                subroutines.append(mp_poly_writer.write_expanded_wl_updater(\
++                                                     wl_update[0],wl_update[1]))
++            elif wl_update[0] >= wl_update[1]:
++                # If the loop polynomial is larger then we will filter and loop
++                # over the vertex coefficients first. The smallest product for
++                # which the routines below could be used is then
++                # loop_rank_2 x vertex_rank_1
++                subroutines.append(poly_writer.write_compact_wl_updater(\
++                  wl_update[0],wl_update[1],loop_over_vertex_coefs_first=True))
++                subroutines.append(mp_poly_writer.write_compact_wl_updater(\
++                  wl_update[0],wl_update[1],loop_over_vertex_coefs_first=True))
++            else:
++                # This happens only when the rank of the updater (vertex coef)
++                # is larger than the one of the loop coef and none of them is
++                # zero. This never happens in renormalizable theories but it
++                # can happen in the HEFT ones or other effective ones. In this
++                # case the typicaly use of this routine if for the product
++                # loop_rank_1 x vertex_rank_2
++                subroutines.append(poly_writer.write_compact_wl_updater(\
++                  wl_update[0],wl_update[1],loop_over_vertex_coefs_first=False))
++                subroutines.append(mp_poly_writer.write_compact_wl_updater(\
++                  wl_update[0],wl_update[1],loop_over_vertex_coefs_first=False))
++
          writer.writelines('\n\n'.join(subroutines),
                                         context=self.get_context(matrix_element))
 === modified file 'madgraph/various/process_checks.py'
 --- madgraph/various/process_checks.py	2016-03-02 05:56:15 +0000
 +++ madgraph/various/process_checks.py	2016-03-03 23:14:51 +0000
@@ -1315,7 +1315,11 @@
          if not make_it_quick:
              target_pspoints_number = max(int(30.0/time_per_ps_estimate)+1,50)
          else:
++<<<<<<< TREE
              target_pspoints_number = 10
++=======
++            target_pspoints_number = 1000
++>>>>>>> MERGE-SOURCE
          logger.info("Checking timing for process %s "%proc_name+\
                                      "with %d PS points."%target_pspoints_number)
 === modified file 'madgraph/various/q_polynomial.py'
 --- madgraph/various/q_polynomial.py	2016-02-23 19:44:10 +0000
 +++ madgraph/various/q_polynomial.py	2016-03-03 23:14:51 +0000
@@ -118,11 +118,23 @@
  class PolynomialRoutines(object):
      """ The mother class to output the polynomial subroutines """
--    def __init__(self, max_rank, coef_format='complex*16', sub_prefix=''
--                                                                ,line_split=30):
++    def __init__(self, max_rank, updater_max_rank=None,
++                        coef_format='complex*16', sub_prefix='',
++                        proc_prefix='',mp_prefix='',
++                        line_split=30):
          self.coef_format=coef_format
          self.sub_prefix=sub_prefix
++        self.proc_prefix=proc_prefix
++        self.mp_prefix=mp_prefix
++        if updater_max_rank is None:
++            self.updater_max_rank = max_rank
++        else:
++            if updater_max_rank > max_rank:
++                raise PolynomialError, "The updater max rank must be at most"+\
++                                                " equal to the overall max rank"
++            else:
++                self.updater_max_rank = updater_max_rank
          if coef_format=='complex*16':
              self.rzero='0.0d0'
              self.czero='(0.0d0,0.0d0)'
@@ -138,10 +150,70 @@
                              "The rank of a q-polynomial should be 0 or positive"
          self.max_rank=max_rank
          self.pq=Polynomial(max_rank)
++
++        # A useful replacement dictionary
++        self.rep_dict = {'sub_prefix':self.sub_prefix,
++                         'proc_prefix':self.proc_prefix,
++                         'mp_prefix':self.mp_prefix,
++                         'coef_format':self.coef_format}
  class FortranPolynomialRoutines(PolynomialRoutines):
      """ A daughter class to output the subroutine in the fortran format"""
++    def write_polynomial_constant_module(self):
++        """ Writes a fortran90 module that defined polynomial constants objects."""
++
++        # Start with the polynomial constants module header
++        polynomial_constant_lines = []
++        polynomial_constant_lines.append(
++"""MODULE %sPOLYNOMIAL_CONSTANTS
++implicit none
++include 'coef_specs.inc'
++include 'loop_max_coefs.inc'
++"""%self.sub_prefix)
++        # Add the N coef for rank
++        polynomial_constant_lines.append(
++'C Map associating a rank to each coefficient position')
++        polynomial_constant_lines.append(
++                                     'INTEGER COEFTORANK_MAP(0:LOOPMAXCOEFS-1)')
++        for rank in range(self.max_rank+1):
++            start = get_number_of_coefs_for_rank(rank-1)
++            end   = get_number_of_coefs_for_rank(rank)-1
++            polynomial_constant_lines.append(
++'DATA COEFTORANK_MAP(%(start)d:%(end)d)/%(n_entries)d*%(rank)d/'%
++{'start': start,'end': end,'n_entries': end-start+1,'rank': rank})
++
++        polynomial_constant_lines.append(
++'\nC Map defining the number of coefficients for a symmetric tensor of a given rank')
++        polynomial_constant_lines.append(
++"""INTEGER NCOEF_R(0:%(max_rank)d)
++DATA NCOEF_R/%(ranks)s/"""%{'max_rank':self.max_rank,'ranks':','.join([
++      str(get_number_of_coefs_for_rank(r)) for r in range(0,self.max_rank+1)])})
++        polynomial_constant_lines.append(
++'\nC Map defining the coef position resulting from the multiplication of two lower rank coefs.')
++        mult_matrix = [[
++          self.pq.get_coef_position(self.pq.get_coef_at_position(coef_a)+
++                                    self.pq.get_coef_at_position(coef_b))
++            for coef_b in range(0,get_number_of_coefs_for_rank(self.updater_max_rank))]
++              for coef_a in range(0,get_number_of_coefs_for_rank(self.max_rank))]
++
++        polynomial_constant_lines.append(
++'INTEGER COMB_COEF_POS(0:LOOPMAXCOEFS-1,0:%(max_updater_rank)d)'\
++%{'max_updater_rank':(get_number_of_coefs_for_rank(self.updater_max_rank)-1)})
++
++        for j, line in enumerate(mult_matrix):
++            chunk_size = 20
++            for k in xrange(0, len(line), chunk_size):
++                polynomial_constant_lines.append(
++                "DATA COMB_COEF_POS(%3r,%3r:%3r) /%s/" % \
++                (j, k, min(k + chunk_size, len(line))-1,
++                    ','.join(["%3r" % i for i in line[k:k + chunk_size]])))
++
++        polynomial_constant_lines.append(
++                            "\nEND MODULE %sPOLYNOMIAL_CONSTANTS\n"%self.sub_prefix)
++
++        return '\n'.join(polynomial_constant_lines)
++
      def write_pjfry_mapping(self):
          """ Returns a fortran subroutine which fills in the array of integral reduction
@@ -383,34 +455,112 @@
              subroutines.append('\n'.join(lines+['end']))
          return '\n\n'.join(subroutines)
--
--    def write_wl_updater(self,r_1,r_2):
--        """ Give out the subroutine to update a polynomial of rank r_1 with
--        one of rank r_2 """
--
--        # The update is basically given by
--        # OUT(j,coef,i) = A(k,*,i) x B(j,*,k)
--        # with k a summed index and the 'x' operation is equivalent to
--        # putting together two regular polynomial in q with scalar coefficients
--        # The complexity of this subroutine is therefore
--        # MAXLWFSIZE**3 * NCoef(r_1) * NCoef(r_2)
--        # Which is for example 22'400 when updating a rank 4 loop wavefunction
--        # with a rank 1 updater.
--
--        lines=[]
--
--        # Start by writing out the header:
--        lines.append(
--          """SUBROUTINE %(sub_prefix)sUPDATE_WL_%(r_1)d_%(r_2)d(A,LCUT_SIZE,B,IN_SIZE,OUT_SIZE,OUT)
--                        include 'coef_specs.inc'
--                        include 'loop_max_coefs.inc'
--                        INTEGER I,J,K
--                        %(coef_format)s A(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
--                        %(coef_format)s B(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
--                        %(coef_format)s OUT(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
--                        INTEGER LCUT_SIZE,IN_SIZE,OUT_SIZE
--                        """%{'sub_prefix':self.sub_prefix,'r_1':r_1,'r_2':r_2,
--                                                'coef_format':self.coef_format})
++
++    def write_compact_wl_updater(self,r_1,r_2,loop_over_vertex_coefs_first=True):
++        """ Give out the subroutine to update a polynomial of rank r_1 with
++        one of rank r_2 """
++
++        # The update is basically given by
++        # OUT(j,coef,i) = A(k,*,i) x B(j,*,k)
++        # with k a summed index and the 'x' operation is equivalent to
++        # putting together two regular polynomial in q with scalar coefficients
++        # The complexity of this subroutine is therefore
++        # MAXLWFSIZE**3 * NCoef(r_1) * NCoef(r_2)
++        # Which is for example 22'400 when updating a rank 4 loop wavefunction
++        # with a rank 1 updater.
++        # The situation is slightly improved by a smarter handling of the
++        # coefficients equal to zero
++
++        lines=[]
++
++        # Start by writing out the header:
++        lines.append(
++          """SUBROUTINE %(sub_prefix)sUPDATE_WL_%(r_1)d_%(r_2)d(A,LCUT_SIZE,B,IN_SIZE,OUT_SIZE,OUT)
++  USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++  implicit none
++  INTEGER I,J,K,L,M
++  %(coef_format)s A(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
++  %(coef_format)s B(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
++  %(coef_format)s OUT(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
++  INTEGER LCUT_SIZE,IN_SIZE,OUT_SIZE
++  INTEGER NEW_POSITION
++  %(coef_format)s UPDATER_COEF
++"""%{'sub_prefix':self.sub_prefix,'proc_prefix':self.proc_prefix,
++                           'r_1':r_1,'r_2':r_2,'coef_format':self.coef_format})
++
++        # Start the loop on the elements i,j of the vector OUT(i,coef,j)
++        lines.append("C Welcome to the computational heart of MadLoop...")
++        if loop_over_vertex_coefs_first:
++            lines.append("OUT(:,:,:)=%s"%self.czero)
++            lines.append(
++    """DO J=1,OUT_SIZE
++      DO M=0,%d
++        DO K=1,IN_SIZE
++          UPDATER_COEF = B(J,M,K)
++          IF (UPDATER_COEF.EQ.%s) CYCLE
++          DO L=0,%d
++            NEW_POSITION = COMB_COEF_POS(L,M)
++            DO I=1,LCUT_SIZE
++              OUT(J,NEW_POSITION,I)=OUT(J,NEW_POSITION,I) + A(K,L,I)*UPDATER_COEF
++            ENDDO
++          ENDDO
++        ENDDO
++      ENDDO
++    ENDDO
++    """%(get_number_of_coefs_for_rank(r_2)-1,
++         self.czero,
++         get_number_of_coefs_for_rank(r_1)-1))
++        else:
++            lines.append("OUT(:,:,:)=%s"%self.czero)
++            lines.append(
++    """DO I=1,LCUT_SIZE
++      DO L=0,%d
++        DO K=1,IN_SIZE
++          UPDATER_COEF = A(K,L,I)
++          IF (UPDATER_COEF.EQ.%s) CYCLE
++          DO M=0,%d
++            NEW_POSITION = COMB_COEF_POS(L,M)
++            DO J=1,OUT_SIZE
++              OUT(J,NEW_POSITION,I)=OUT(J,NEW_POSITION,I) + UPDATER_COEF*B(J,M,K)
++            ENDDO
++          ENDDO
++        ENDDO
++      ENDDO
++    ENDDO
++    """%(get_number_of_coefs_for_rank(r_1)-1,
++         self.czero,
++         get_number_of_coefs_for_rank(r_2)-1))
++
++        lines.append("END")
++        # return the subroutine
++        return '\n'.join(lines)
++
++    def write_expanded_wl_updater(self,r_1,r_2):
++        """ Give out the subroutine to update a polynomial of rank r_1 with
++        one of rank r_2 """
++
++        # The update is basically given by
++        # OUT(j,coef,i) = A(k,*,i) x B(j,*,k)
++        # with k a summed index and the 'x' operation is equivalent to
++        # putting together two regular polynomial in q with scalar coefficients
++        # The complexity of this subroutine is therefore
++        # MAXLWFSIZE**3 * NCoef(r_1) * NCoef(r_2)
++        # Which is for example 22'400 when updating a rank 4 loop wavefunction
++        # with a rank 1 updater.
++
++        lines=[]
++
++        # Start by writing out the header:
++        lines.append(
++          """SUBROUTINE %(sub_prefix)sUPDATE_WL_%(r_1)d_%(r_2)d(A,LCUT_SIZE,B,IN_SIZE,OUT_SIZE,OUT)
++  USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++  INTEGER I,J,K
++  %(coef_format)s A(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
++  %(coef_format)s B(MAXLWFSIZE,0:VERTEXMAXCOEFS-1,MAXLWFSIZE)
++  %(coef_format)s OUT(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
++  INTEGER LCUT_SIZE,IN_SIZE,OUT_SIZE
++"""%{'sub_prefix':self.sub_prefix,'proc_prefix':self.proc_prefix,
++                            'r_1':r_1,'r_2':r_2,'coef_format':self.coef_format})
          # Start the loop on the elements i,j of the vector OUT(i,coef,j)
          lines.append("DO I=1,LCUT_SIZE")
@@ -460,14 +610,12 @@
          # Start by writing out the header:
          lines.append("""SUBROUTINE %(sub_prefix)sEVAL_POLY(C,R,Q,OUT)
--                        include 'coef_specs.inc'
--                        include 'loop_max_coefs.inc'
++                        USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
                          %(coef_format)s C(0:LOOPMAXCOEFS-1)
                          INTEGER R
                          %(coef_format)s Q(0:3)
                          %(coef_format)s OUT
--                        """%{'sub_prefix':self.sub_prefix,
--                             'coef_format':self.coef_format})
++                        """%self.rep_dict)
          # Start by the trivial coefficient of order 0.
          lines.append("OUT=C(0)")
@@ -497,28 +645,20 @@
          lines=[]
          # Start by writing out the header:
--        lines.append("""SUBROUTINE %(sub_prefix)sMERGE_WL(WL,R,LCUT_SIZE,CONST,OUT)
--                        include 'coef_specs.inc'
--                        include 'loop_max_coefs.inc'
--                        INTEGER I,J
--                        %(coef_format)s WL(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
--                        INTEGER R,LCUT_SIZE
--                        %(coef_format)s CONST
--                        %(coef_format)s OUT(0:LOOPMAXCOEFS-1)
--                        """%{'sub_prefix':self.sub_prefix,
--                             'coef_format':self.coef_format})
--
--        # Add an array specifying how many coefs there are for given ranks
--        lines.append("""INTEGER NCOEF_R(0:%(max_rank)d)
--                        DATA NCOEF_R/%(ranks)s/
--                        """%{'max_rank':self.max_rank,'ranks':','.join([
--                            str(get_number_of_coefs_for_rank(r)) for r in
--                                                    range(0,self.max_rank+1)])})
++        lines.append(
++"""SUBROUTINE %(sub_prefix)sMERGE_WL(WL,R,LCUT_SIZE,CONST,OUT)
++  USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
++  INTEGER I,J
++  %(coef_format)s WL(MAXLWFSIZE,0:LOOPMAXCOEFS-1,MAXLWFSIZE)
++  INTEGER R,LCUT_SIZE
++  %(coef_format)s CONST
++  %(coef_format)s OUT(0:LOOPMAXCOEFS-1)
++"""%self.rep_dict)
          # Now scan them all progressively
          lines.append("DO I=1,LCUT_SIZE")
          lines.append("  DO J=0,NCOEF_R(R)-1")
--        lines.append("    OUT(J)=OUT(J)+WL(I,J,I)*CONST")
++        lines.append("      OUT(J)=OUT(J)+WL(I,J,I)*CONST")
          lines.append("  ENDDO")
          lines.append("ENDDO")
          lines.append("END")
@@ -533,21 +673,12 @@
          # Start by writing out the header:
          lines.append("""SUBROUTINE %(sub_prefix)sADD_COEFS(A,RA,B,RB)
--                        include 'coef_specs.inc'
--                        include 'loop_max_coefs.inc'
++                        USE %(proc_prefix)sPOLYNOMIAL_CONSTANTS
                          INTEGER I
                          %(coef_format)s A(0:LOOPMAXCOEFS-1),B(0:LOOPMAXCOEFS-1)
                          INTEGER RA,RB
--                        """%{'sub_prefix':self.sub_prefix,
--                             'coef_format':self.coef_format})
++                        """%self.rep_dict)
--        # Add an array specifying how many coefs there are for given ranks
--        lines.append("""INTEGER NCOEF_R(0:%(max_rank)d)
--                        DATA NCOEF_R/%(ranks)s/
--                        """%{'max_rank':self.max_rank,'ranks':','.join([
--                            str(get_number_of_coefs_for_rank(r)) for r in
--                                                    range(0,self.max_rank+1)])})
--
          # Now scan them all progressively
          lines.append("DO I=0,NCOEF_R(RB)-1")
          lines.append("  A(I)=A(I)+B(I)")
 === modified file 'tests/time_db'
 --- tests/time_db	2016-03-03 16:04:04 +0000
 +++ tests/time_db	2016-03-03 23:14:51 +0000
@@ -52,6 +52,7 @@
  <__main__.TestSuiteModified tests=[<tests.unit_tests.iolibs.test_export_v4.FullHelasOutputTest testMethod=test_four_fermion_vertex_normal_fermion_flow>]> 0.0366899967194
  <__main__.TestSuiteModified tests=[<tests.unit_tests.iolibs.test_export_v4.FullHelasOutputTest testMethod=test_generate_helas_diagrams_epem_elpelmepem>]> 0.0915629863739
  <__main__.TestSuiteModified tests=[<tests.unit_tests.various.test_aloha.test_aloha_creation testMethod=test_aloha_FFVC>]> 0.0635468959808
++<__main__.TestSuiteModified tests=[<tests.unit_tests.various.test_usermod.Test_ADDON_UFO testMethod=test_identify_particle>]> 0.0015971660614
  <__main__.TestSuiteModified tests=[<tests.unit_tests.various.test_decay.Test_DecayAmplitude testMethod=test_group_channels2amplitudes>]> 0.346367835999
  <__main__.TestSuiteModified tests=[<tests.unit_tests.various.test_import_ufo.TestRestrictModel testMethod=test_detect_special_parameters>]> 0.0848360061646
  <__main__.TestSuiteModified tests=[<tests.unit_tests.iolibs.test_file_writers.FortranWriterTest testMethod=test_write_fortran_error>]> 0.000140190124512
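
For readers of the diff above, here is a minimal Python 3 sketch (not MadLoop code) of what the generated compact UPDATE_WL routine computes when loop_over_vertex_coefs_first is true. It assumes that get_number_of_coefs_for_rank(r) counts all monomials in the four components of q up to degree r, which is consistent with the 22'400 figure quoted in the comments (4**3 * 70 * 5). The helper names and the monomial ordering (by degree, then lexicographically on exponent tuples) are illustrative assumptions; the real ordering is defined by the Polynomial class in q_polynomial.py and may differ.

```python
from itertools import product
from math import comb


def n_coefs_up_to_rank(rank):
    """Assumed count of monomials in 4 variables with total degree <= rank."""
    return comb(rank + 4, 4) if rank >= 0 else 0


def monomials_up_to_rank(rank):
    """Exponent tuples (e0,e1,e2,e3), sorted by degree then lexicographically.

    This ordering is a hypothetical stand-in for the one used by MadLoop.
    """
    monos = [e for e in product(range(rank + 1), repeat=4) if sum(e) <= rank]
    return sorted(monos, key=lambda e: (sum(e), e))


def build_comb_coef_pos(r_1, r_2):
    """Analogue of COMB_COEF_POS: position of the product of two monomials."""
    monos = monomials_up_to_rank(r_1 + r_2)
    position = {m: i for i, m in enumerate(monos)}
    return [[position[tuple(x + y for x, y in zip(monos[a], monos[b]))]
             for b in range(n_coefs_up_to_rank(r_2))]
            for a in range(n_coefs_up_to_rank(r_1))]


def update_wl(a, b, r_1, r_2, comb_pos):
    """Compute out[j][pos][i] += a[k][l][i] * b[j][m][k], skipping zeros in b.

    Index layout mirrors the Fortran arrays A(K,L,I), B(J,M,K), OUT(J,POS,I).
    """
    in_size, lcut_size, out_size = len(a), len(a[0][0]), len(b)
    n_out = n_coefs_up_to_rank(r_1 + r_2)
    out = [[[0j] * lcut_size for _ in range(n_out)] for _ in range(out_size)]
    for j in range(out_size):
        for m in range(n_coefs_up_to_rank(r_2)):
            for k in range(in_size):
                updater_coef = b[j][m][k]
                if updater_coef == 0:
                    continue  # the "smart zero": skip the whole inner block
                for l in range(n_coefs_up_to_rank(r_1)):
                    new_position = comb_pos[l][m]
                    for i in range(lcut_size):
                        out[j][new_position][i] += a[k][l][i] * updater_coef
    return out


# Scalar example (all wavefunction sizes equal to 1), both inputs of rank 1:
# (1 + 2*q3) * (3 + 4*q3), with q3 sitting at coefficient position 1.
table = build_comb_coef_pos(1, 1)
a = [[[1 + 0j], [2 + 0j], [0j], [0j], [0j]]]      # a[k][l][i]
b = [[[3 + 0j], [4 + 0j], [0j], [0j], [0j]]]      # b[j][m][k]
result = update_wl(a, b, 1, 1, table)
```

In this sketch three of the five updater coefficients are zero, so the two innermost loops run for only two values of m. This is the saving the branch targets, since vertex updaters are typically sparse.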
