Uninitialized variable giving SIGFPE

Bug #1685169 reported by Pedro Brandimarte
This bug affects 1 person
Affects      Status         Importance   Assigned to    Milestone
Siesta       Fix Released   Low          Nick Papior
  4.0        Fix Released   Low          Nick Papior
  4.1        Fix Released   Low          Nick Papior

Bug Description

When compiling the trunk version with the following options:

FC = mpif90
FFLAGS = -g -O0 -m64 -fPIC -fno-second-underscore -fbacktrace \
         -ffpe-trap=invalid,zero,overflow,underflow,denormal \
         -fbounds-check -Wall

the following error occurs right at the start of execution:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

----
Backtrace for this error:
#0 0x2ae199681a8f in ???
#1 0x4ecfb8 in __interpolation_MOD_polint
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/interpolation.f90:627
#2 0x6f82c3 in comlocal
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/atom.F:2460
#3 0x70f0d3 in __atom_MOD_atom_main
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/atom.F:519
#4 0x51a58e in initatom_
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/initatom.f:148
#5 0x64b65f in __m_siesta_init_MOD_siesta_init
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta_init.F:364
#6 0x118efcb in siesta
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta.F:55
#7 0x118f0a5 in main
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta.F:10
----

This does not happen with a standard compilation, but the error can be fixed by initializing 'dydx = 0' in the 'polint' subroutine of the 'interpolation' module.
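
Why zero is the natural starting value can be sketched outside SIESTA. The 'polint' source is not reproduced in this report, so the Python below is a hypothetical stand-alone illustration of Neville's polynomial interpolation extended to return the derivative as well; the function name and argument layout are assumptions, not SIESTA's interface. Each zeroth-order interpolant is a constant, so its derivative starts at exactly zero. A derivative variable that is never initialized instead starts from whatever bits happen to be in memory, which is exactly what the strict -ffpe-trap flags catch.

```python
def neville_with_derivative(xa, ya, x):
    """Evaluate the polynomial through the points (xa[i], ya[i]) and its
    first derivative at x using Neville's recurrence (hypothetical sketch)."""
    n = len(xa)
    if n == 0 or n != len(ya):
        raise ValueError("xa and ya must be non-empty and of equal length")
    p = [float(v) for v in ya]  # P_{i..i}(x) = ya[i]
    dp = [0.0] * n              # derivative of a constant: explicitly zero
    for m in range(1, n):
        # i increases, so p[i + 1] still holds the previous level's value here.
        for i in range(n - m):
            j = i + m
            denom = xa[i] - xa[j]
            # P_{i..j} = ((x - x_j) P_{i..j-1} + (x_i - x) P_{i+1..j}) / (x_i - x_j)
            new_p = ((x - xa[j]) * p[i] + (xa[i] - x) * p[i + 1]) / denom
            # Product rule applied to the recurrence above.
            new_dp = (p[i] + (x - xa[j]) * dp[i]
                      - p[i + 1] + (xa[i] - x) * dp[i + 1]) / denom
            p[i], dp[i] = new_p, new_dp
    return p[0], dp[0]


# Three samples of y = x**2; at x = 1.5 this gives y = 2.25 and dy/dx = 3.0.
y, dydx = neville_with_derivative([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)
print(y, dydx)
```

With a single point the loop never runs and the returned derivative is the initial 0.0, which is the value an uninitialized 'dydx' would otherwise leave undefined.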

Pedro Brandimarte (brandimarte) wrote:

Hi again. The same kind of problem happens in the dhscf module when running in parallel with the new cellxc:

----
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x2aef4de23a8f in ???
#1 0x2aef4d25fcb0 in ???
#2 0x2aef4d1cbb32 in ???
#3 0x2aef4d191d7a in ???
#4 0x2aef4d99c0ba in ???
#5 0x12ae8a8 in __mpi__r8_v_MOD_mpi_allreduce_t
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Obj/MPI/Interfaces.f90:2284
#6 0x4697c6 in __m_dhscf_MOD_dhscf
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/dhscf.F:1714
#7 0x6337e7 in __m_setup_hamiltonian_MOD_setup_hamiltonian
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/setup_hamiltonian.F:236
#8 0x663aeb in __m_siesta_forces_MOD_siesta_forces
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta_forces.F90:276
#9 0x118f01c in siesta
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta.F:80
#10 0x118f0a5 in main
 at /home/pedro/local/src/siesta/lp-siesta/trunk/Src/siesta.F:10
----

One solution would be to initialize 'sbuffer = 0' at line 1704 of dhscf.F.
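
An MPI-free Python sketch of the same pattern, under stated assumptions: the layout of 'sbuffer' in dhscf is not shown in this report, so a sum reduction, the cyclic ownership of entries and both helper names below are hypothetical. In a reduction every entry of every rank's send buffer takes part in the arithmetic, including entries a rank never writes. NaN stands in for leftover memory, since real garbage cannot be reproduced deterministically; depending on the stale bit pattern, the sum can raise one of the exceptions enabled by -ffpe-trap and so end in SIGFPE.

```python
import math


def fill_send_buffer(n, rank, nranks, initial):
    """Build the buffer one hypothetical rank contributes to the reduction.

    Assumes a cyclic distribution: rank r writes only entries r, r + nranks, ...
    Every other entry keeps `initial`, standing in for an uninitialized array.
    """
    buf = [initial] * n
    for k in range(rank, n, nranks):
        buf[k] = float(k + 1)
    return buf


def allreduce_sum(send_buffers):
    """Serial stand-in for MPI_Allreduce with MPI_SUM: element-wise sum over ranks."""
    return [sum(column) for column in zip(*send_buffers)]


nranks, n = 2, 4
clean = allreduce_sum([fill_send_buffer(n, r, nranks, 0.0) for r in range(nranks)])
stale = allreduce_sum([fill_send_buffer(n, r, nranks, math.nan) for r in range(nranks)])
print(clean)  # [1.0, 2.0, 3.0, 4.0]
print(stale)  # [nan, nan, nan, nan]
```

Zero is the identity of the sum, so zero-filling the unwritten entries leaves the reduced result unchanged while removing any dependence on stale memory.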

PS: I'm not trying to be too picky, but I had to fix these small things to be able to debug my own implementation.

Nick Papior (nickpapior)
Changed in siesta:
milestone: none → 4.1-b3
assignee: nobody → Nick Papior (nickpapior)
importance: Undecided → Low
status: New → Triaged
milestone: 4.1-b3 → 4.0.1
Nick Papior (nickpapior) wrote:

Could you please try now?

I have fixed the two instances you reported; however, I cannot get any further on my own machine (it fails in LAPACK due to DLAMCH calls).
If you find more, please report them.

Thanks.

PS: This is now fixed in the 4.0, 4.1 and trunk versions.

Changed in siesta:
status: Triaged → Fix Committed
Pedro Brandimarte (brandimarte) wrote:

Same here... sorry. :-)

Nick Papior (nickpapior) wrote:

For information and for the record.

The BLAS/LAPACK library will always trigger a trap in DLAMCH when computing the machine parameters. To bypass this (in non-MPI builds), apply the attached patch, which removes the offending line in DLAMCH.
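
A small Python check of where such a trap can come from, assuming the bundled DLAMCH follows reference LAPACK 3.x, whose 'S' (safe minimum) branch sets SFMIN = TINY(ZERO) and then computes SMALL = ONE / HUGE(ZERO). Whether this is the exact line the attached patch removes is not stated in this report. In IEEE double precision that quotient is nonzero, below the smallest normal number, and rounded, which raises the IEEE underflow exception; with -ffpe-trap=underflow,denormal the division alone is enough for SIGFPE, even though the following SMALL >= SFMIN test fails and SFMIN stays at TINY(ZERO).

```python
import sys
from fractions import Fraction

huge = sys.float_info.max   # HUGE(ZERO) for double precision
tiny = sys.float_info.min   # TINY(ZERO): smallest positive *normal* double

small = 1.0 / huge          # SMALL = ONE / HUGE(ZERO), as assumed above

# The result is nonzero but below TINY, i.e. subnormal ...
is_subnormal = 0.0 < small < tiny
# ... and it is rounded: the exact quotient 1/huge is not representable.
is_inexact = Fraction(small) != 1 / Fraction(huge)
# Hence the SMALL >= SFMIN branch is skipped and SFMIN stays at TINY(ZERO).
keeps_tiny = not (small >= tiny)

print(small, is_subnormal, is_inexact, keeps_tiny)
```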

With the patch applied, no traps are raised in the serial build (test=si001) of either 4.1-r707 or trunk-r618.

If you want to test the distributed (MPI) version, I guess linking an external ScaLAPACK against the shipped LAPACK+BLAS should work fine. I haven't tried it, however.

Pedro Brandimarte (brandimarte) wrote:

I see, the lapack sfmin is not that safe! ;-)

Thanks!

Nick Papior (nickpapior)
Changed in siesta:
status: Fix Committed → Fix Released