all native ocaml programs segfault on armel

Bug #810402 reported by Stéphane Glondu
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
Linaro Binutils    Invalid       Undecided   Unassigned
binutils (Ubuntu)  Invalid       Undecided   Unassigned
  Oneiric          Invalid       Undecided   Unassigned
ocaml (Ubuntu)     Fix Released  High        Unassigned
  Oneiric          Fix Released  High        Unassigned

Bug Description

Starting from binutils version 2.21.52.20110606-1ubuntu1 (in oneiric), all programs produced by ocamlopt segfault on Ubuntu/armel. The problem disappears after downgrading binutils to 2.21.51.20110421-6ubuntu1.

Steps to reproduce: echo > empty.ml && ocamlopt empty.ml && ./a.out

Using the "-S" option of ocamlopt might be useful, to have a look at the generated assembly code, but I don't know what to look for... I hope someone more familiar with binutils code can help.

I found this while investigating the recent build failures of ocaml stuff on armel. Everything works fine in the armel port of Debian.

tags: added: armel
Matthias Klose (doko) wrote :

- is this seen on the armhf port too?
- what are the options and files passed to as/ld?

Matthias Klose (doko) wrote :

- how is the camlstartup .s file generated, or how can it be saved?
- trying to figure out if it's a linker or an assembler issue, could
  you try to build with new as and old ld, and old as and new ld?

Changed in binutils (Ubuntu):
importance: Undecided → High
milestone: none → ubuntu-11.10-beta-1
status: New → Confirmed
Stéphane Glondu (glondu) wrote :

> is this seen on the armhf port too?
ocamlopt has not been ported to armhf and is not enabled there, so I cannot tell. Should I understand that Ubuntu armel = Debian armhf?

> what are the options and files passed to as/ld?
$ ocamlopt -ccopt -v -dstartup -S -verbose empty.ml
+ as -o 'empty.o' 'empty.s'
+ as -o '/tmp/camlstartupcff7e9.o' 'a.out.startup.s'
+ gcc -o 'a.out' '-L/usr/lib/ocaml' -v '/tmp/camlstartupcff7e9.o' '/usr/lib/ocaml/std_exit.o' 'empty.o' '/usr/lib/ocaml/stdlib.a' '/usr/lib/ocaml/libasmrun.a' -lm -ldl
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.3-3ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.5.3 (Ubuntu/Linaro 4.5.3-3ubuntu1)
COMPILER_PATH=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/:/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/:/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/:/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/:/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/
LIBRARY_PATH=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/:/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../:/lib/:/usr/lib/:/usr/lib/arm-linux-gnueabi/
COLLECT_GCC_OPTIONS='-o' 'a.out' '-L/usr/lib/ocaml' '-v' '-march=armv7-a' '-mfloat-abi=softfp' '-mfpu=vfpv3-d16' '-mthumb'
 /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/collect2 --build-id --no-add-needed --as-needed --eh-frame-hdr -dynamic-linker /lib/ld-linux.so.3 -X --hash-style=gnu -m armelf_linux_eabi -z relro -o a.out /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crt1.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crti.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/crtbegin.o -L/usr/lib/ocaml -L/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3 -L/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../.. -L/usr/lib/arm-linux-gnueabi /tmp/camlstartupcff7e9.o /usr/lib/ocaml/std_exit.o empty.o /usr/lib/ocaml/stdlib.a /usr/lib/ocaml/libasmrun.a -lm -ldl -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/crtend.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../.....


Stéphane Glondu (glondu) wrote :

> how is the camlstartup .s file generated, or how can it be saved?

-dstartup option to ocamlopt. Adding -ccopt -Wl,-v shows the ld call:

/usr/bin/ld --build-id --no-add-needed --as-needed --eh-frame-hdr -dynamic-linker /lib/ld-linux.so.3 -X --hash-style=gnu -m armelf_linux_eabi -z relro -o a.out /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crt1.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crti.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/crtbegin.o -L/usr/lib/ocaml -L/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3 -L/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../.. -L/usr/lib/arm-linux-gnueabi -v /tmp/camlstartup86089c.o /usr/lib/ocaml/std_exit.o empty.o /usr/lib/ocaml/stdlib.a /usr/lib/ocaml/libasmrun.a -lm -ldl -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/crtend.o /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crtn.o

> trying to figure out if it's a linker or an assembler issue, could
> you try to build with new as and old ld, and old as and new ld?

 - new as, old ld: the resulting binary doesn't segfault
 - old as, new ld: the resulting binary segfaults

Matthias Klose (doko) wrote : Re: [Bug 810402] Re: all native ocaml programs segfault on armel

On 07/14/2011 01:58 PM, Stéphane Glondu wrote:
>> is this seen on the armhf port too?
> Should I understand that Ubuntu armel = Debian armhf?

both are built for the armv7-a architecture. Ubuntu armel for the soft-float
ABI, Debian armhf for the hard float ABI.

>> trying to figure out if it's a linker or an assembler issue, could
>> you try to build with new as and old ld, and old as and new ld?
>
> - new as, old ld: the resulting binary doesn't segfault
> - old as, new ld: the resulting binary segfaults

that would suggest a faulty linker; using binutils-gold does indeed work in
oneiric. otoh, the same linker version works ok in unstable.

Matthias Klose (doko) wrote :

On 07/14/2011 02:15 PM, Stéphane Glondu wrote:
>> how is the camlstartup .s file generated, or how can it be saved?
>
> -dstartup option to ocamlopt. Adding -ccopt -Wl,-v shows the ld call:
>
> /usr/bin/ld --build-id --no-add-needed --as-needed --eh-frame-hdr
> -dynamic-linker /lib/ld-linux.so.3 -X --hash-style=gnu -m
> armelf_linux_eabi -z relro -o a.out /usr/lib/arm-linux-gnueabi/gcc/arm-
> linux-gnueabi/4.5.3/../../../crt1.o /usr/lib/arm-linux-gnueabi/gcc/arm-
> linux-gnueabi/4.5.3/../../../crti.o /usr/lib/arm-linux-gnueabi/gcc/arm-
> linux-gnueabi/4.5.3/crtbegin.o -L/usr/lib/ocaml -L/usr/lib/arm-linux-
> gnueabi/gcc/arm-linux-gnueabi/4.5.3 -L/usr/lib/arm-linux-gnueabi/gcc
> /arm-linux-gnueabi/4.5.3/../../.. -L/usr/lib/arm-linux-gnueabi -v
> /tmp/camlstartup86089c.o /usr/lib/ocaml/std_exit.o empty.o
> /usr/lib/ocaml/stdlib.a /usr/lib/ocaml/libasmrun.a -lm -ldl -lgcc --as-
> needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-
> needed /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/crtend.o
> /usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.3/../../../crtn.o

why is gcc-4.5 used? The default in oneiric is 4.6.

Matthias Klose (doko) wrote :

replacing the ocaml .o and .a files with the ones from unstable produces a working binary.

now re-uploading ocaml to build with GCC 4.6.

affects: binutils (Ubuntu Oneiric) → ocaml (Ubuntu Oneiric)
Stéphane Glondu (glondu) wrote :

On 14/07/2011 14:39, Matthias Klose wrote:
> why is gcc-4.5 used? The default in oneiric is 4.6.

I started with a natty chroot, which I upgraded to oneiric. Upgrading
gcc requires upgrading binutils, which I was avoiding.

However, now that I've identified the issue, I can tell you that the bug
occurs in a freshly debootstrapped oneiric chroot, and disappears after
downgrading binutils to 2.21.51.20110421-6ubuntu1.

Stéphane Glondu (glondu) wrote :

On 14/07/2011 15:01, Matthias Klose wrote:
> replacing the ocaml .o and .a files with the ones from unstable produces
> a working binary.
>
> now re-uploading ocaml to build with GCC 4.6.

Does it mean that recompiling all ocaml libraries with GCC 4.6 is expected?

Matthias Klose (doko) wrote :

the build fails now in

./build/ocamlbuild-byte-only.sh
+ ./boot/ocamlrun boot/myocamlbuild byte_stdlib_mixed_mode ocamlc lex/ocamllex ocamlbuild/ocamlbuildlib.cma ocamlbuild/ocamlbuildlightlib.cma ocamlbuild/ocamlbuild.byte ocamlbuild/ocamlbuildlight.byte
../ocamlcomp.sh -g -I stdlib ocamlbuild/ocamlbuild_pack.cmo ocamlbuild/ocamlbuildlight.cmo -o ocamlbuild/ocamlbuildlight.byte
Segmentation fault
Exit code 139 while executing this command:
  ../ocamlcomp.sh -g -I stdlib ocamlbuild/ocamlbuild_pack.cmo ocamlbuild/ocamlbuildlight.cmo -o ocamlbuild/ocamlbuildlight.byte
make[2]: *** [ocamlbuild.byte] Error 139
make[2]: Leaving directory `/build/buildd/ocaml-3.12.0'
make[1]: *** [build-stamp] Error 2

so maybe someone from the ARM team could investigate

tags: added: arm-porting-queue
Matthias Klose (doko) wrote :

doesn't look like a compiler error. checked fsf with and without linaro, on unstable, both armel and armhf, and oneiric.

ocaml does build in unstable on both armhf and armel (armv4t).

ocaml does build with gold, so next check would be to use gold explicitly in ocaml.

Changed in binutils (Ubuntu Oneiric):
status: New → Confirmed
Michael Casadevall (mcasadevall) wrote :

So digging into this problem, I can confirm that reverting binutils resolves the build failure, so we're looking at a definite regression in binutils. Looking at gdb and the source, the problem occurs during ocaml's internal initialization, which is a bunch of hand-written ASM.

During startup, the application branches into caml_start_program, and segfaults while trying to load the address of caml_program into r12:

        .globl caml_start_program
caml_start_program:
        ldr r12, .Lcaml_program

/* Code shared with caml_callback* */
/* Address of Caml code to call is in r12 */
/* Arguments to the Caml code are in r0...r3 */

and caml_program is in the global reference table at the bottom:

.Lcaml_program: .word caml_program

caml_program appears to be the "main" function of the compiled OCaml application, as it's defined in every compiled OCaml binary I examined. For some reason, caml_program is pointing to an invalid address, so when the process tries to load it, it goes boom.

Richard Sandiford (rsandifo) wrote :

Interesting. This feels suspiciously like the glibc start problem
that we had a few weeks ago. Just to check something,
would you mind attaching the .S and .o files for caml_start_program?

Thanks,
Richard

Stéphane Glondu (glondu) wrote :
Stéphane Glondu (glondu) wrote :
Dave Martin (dave-martin-arm) wrote :

$ arm-linux-gnueabi-objdump -dr arm.o
[...]
0000012c <caml_start_program>:
 12c: e59fc138 ldr ip, [pc, #312] ; 26c <caml_ml_array_bound_error
[...]
 188: e1a0e00f mov lr, pc
 18c: e12fff1c bx ip
[...]
                        26c: R_ARM_ABS32 caml_program

I don't see anything wrong there. What is caml_program actually fixed up to during the link?
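
As a cross-check of the disassembly above, here is a minimal Python sketch (an editorial illustration, not part of the original report) that decodes the instruction word e59fc138 and recomputes the literal-pool address it reads. The helper name decode_ldr_literal is made up for this example; the only inputs are the word and the address printed by objdump, and the sketch assumes the word is an ARM-state LDR with an immediate offset from pc.

```python
# Sketch only: decode one ARM-state "LDR Rt, [pc, #imm12]" word and
# compute the address of the literal it loads.

def decode_ldr_literal(word: int, insn_addr: int) -> dict:
    """Decode an ARM (not Thumb) LDR with a 12-bit immediate offset."""
    assert (word >> 26) & 0b11 == 0b01, "not a single-word load/store"
    assert (word >> 25) & 1 == 0, "register-offset form is not handled here"
    assert (word >> 20) & 1 == 1, "this is a store, not a load"
    add = (word >> 23) & 1      # U bit: 1 = add the offset, 0 = subtract it
    rn = (word >> 16) & 0xF     # base register number
    rt = (word >> 12) & 0xF     # destination register number
    imm12 = word & 0xFFF        # unsigned 12-bit offset
    assert rn == 15, "only pc-relative loads are handled here"
    # In ARM state, reading pc yields the instruction address plus 8.
    base = insn_addr + 8
    literal_addr = base + imm12 if add else base - imm12
    return {"rt": rt, "imm12": imm12, "literal_addr": literal_addr}


# Values taken from the objdump listing: word e59fc138 at offset 0x12c.
result = decode_ldr_literal(0xE59FC138, 0x12C)
print("r%d <- [0x%x]" % (result["rt"], result["literal_addr"]))
```

The computed address is 0x26c, the word that carries the R_ARM_ABS32 relocation against caml_program, and the destination register is r12 (ip). That agrees with objdump's own annotation, so the object file itself looks consistent.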

Stéphane Glondu (glondu) wrote :

Attached are all *.s and *.o files generated by ocamlopt on an empty .ml file. I just tried calling ld manually as shown above (replacing the temporary filename by a.out.startup.o) in an up-to-date oneiric chroot, and the resulting executable segfaults.

Dave Martin (dave-martin-arm) wrote :

It looks like ARM code is being executed as Thumb.

The .s file was assembled as ARM because that is always the default when there are no .thumb directives and no -mthumb command-line option. (It would need some minor changes to work in Thumb anyway.) However, the CPSR value when we segfault indicates that the processor is executing in Thumb state ((cpsr & 0x20) == 0x20).

We end up in the wrong state because the call originating from caml_main() in the linked program calls caml_start_program with a "bl" instruction, so there is no switch from Thumb to ARM. As a result, we execute garbage at caml_start_program.

This may indicate a bug in the linker, but since none of the symbols in the startup .s file are marked as function symbols, it might be that the linker isn't absolutely required to perform the correct fixup in this case.

Can you try adding a ".type <function name>, %function" directive for each function in the startup .s file?

If assembling and linking the resulting object results in a working program, this suggests that the linker is sensitive to the symbol type when doing branch fixups.

If so, we can work around it by adding all the required .type directives for code symbols in ocaml, but we should still raise the issue as a possible linker bug.

(gdb) r
Starting program: /mnt/a.out

Program received signal SIGSEGV, Segmentation fault.
0x00014f94 in caml_start_program ()
(gdb) i r
r0 0x0 0
r1 0x0 0
r2 0xb8d7 47319
r3 0x0 0
r4 0xbefff9c4 3204446660
r5 0x4 4
r6 0xbefff8c4 3204446404
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x4001e000 1073864704
r11 0x0 0
r12 0x400b98c1 1074501825
sp 0xbefff748 0xbefff748
lr 0x153b1 86961
pc 0x14f94 0x14f94 <caml_start_program>
cpsr 0x40000030 1073741872
(gdb)
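
To make the register dump above easier to read, here is a small Python sketch (an editorial illustration, not output from the original session) that decodes the cpsr and lr values shown by gdb. describe_cpsr is a hypothetical helper; the bit positions used are the architectural ones (T is bit 5, the mode field is bits 4:0, and N/Z/C/V are bits 31 to 28).

```python
# Sketch only: interpret the cpsr and lr values from the gdb register dump.

T_BIT = 0x20  # CPSR bit 5: set while the core executes Thumb code

# Common values of the 5-bit mode field, CPSR[4:0].
MODES = {0x10: "usr", 0x11: "fiq", 0x12: "irq", 0x13: "svc",
         0x17: "abt", 0x1B: "und", 0x1F: "sys"}


def describe_cpsr(cpsr: int) -> dict:
    """Split a CPSR value into instruction-set state, mode and flags."""
    flags = "".join(name
                    for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
                    if (cpsr >> bit) & 1)
    return {
        "state": "Thumb" if cpsr & T_BIT else "ARM",
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
        "flags": flags,
    }


cpsr = 0x40000030   # from "i r" at the point of the segfault
lr = 0x153B1        # return address left by the caller

info = describe_cpsr(cpsr)
# Bit 0 of a return address is set when the caller runs in Thumb state.
caller_is_thumb = bool(lr & 1)
# Clearing the T bit is what switches the core back to ARM state.
cpsr_arm = cpsr & ~T_BIT

print(info, caller_is_thumb, hex(cpsr_arm))
```

Both observations point the same way: the T bit is set in cpsr, and lr has bit 0 set, so the caller was Thumb code and the core was still in Thumb state on entry to caml_start_program.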

Dave Martin (dave-martin-arm) wrote :

Just for completeness, here's what happens if I manually switch back to ARM when we hit caml_start_program:

# gdb --args /mnt/a.out
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/a.out...(no debugging symbols found)...done.
(gdb) set arm force-mode thumb
(gdb) b *0x14f94
Breakpoint 1 at 0x14f94
(gdb) r
Starting program: /mnt/a.out

Breakpoint 1, 0x00014f94 in caml_start_program ()
(gdb) disable 1
(gdb) set arm force-mode auto
(gdb) set $cpsr &= ~0x20
(gdb) c
Continuing.

Program exited normally.

Richard Sandiford (rsandifo) wrote :

Thanks for the great analysis.

Dave Martin <email address hidden> writes:
> If so, we can work around it by adding all the required .type directives
> for code symbols in ocaml, but we should still raise the issue as a
> possible linker bug.

I think that's more than a workaround. I think it's required by the EABI.
4.6.2 says:

    All code symbols exported from an object file (symbols with binding
    STB_GLOBAL) shall have type STT_FUNC.

so the code is violating that. It also says:

    All extern data objects shall have type STT_OBJECT. No STB_GLOBAL
    data symbol shall have type STT_FUNC.

    The type of an undefined symbol shall be STT_NOTYPE or the type of
    its expected definition.

    The type of any other symbol defined in an executable section can be
    STT_NOTYPE. The linker is only required to provide interworking
    support for symbols of type STT_FUNC (interworking for untyped
    symbols must be encoded directly in the object file).

Like you say, the linker only considers converting BL to BLX for
function symbols, but I think that's deliberate.

Richard
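
The rule discussed above can be sketched as a toy model in Python. This is an editorial illustration only: it is not binutils code, the names SymType, fixup_call and state_after_call are invented here, and it ignores veneers and every other detail of a real linker. It assumes only what the thread states, namely that a BL is considered for conversion to BLX when the target symbol has type STT_FUNC.

```python
# Toy model (not binutils code): when does a BL between ARM and Thumb code
# end up in the right instruction-set state?
from enum import Enum


class SymType(Enum):
    NOTYPE = 0   # STT_NOTYPE: no .type directive was given
    OBJECT = 1   # STT_OBJECT
    FUNC = 2     # STT_FUNC: ".type name, %function"


def fixup_call(caller_state: str, callee_state: str, sym_type: SymType) -> str:
    """Instruction left in place by a linker that follows the quoted rule."""
    needs_switch = caller_state != callee_state
    if needs_switch and sym_type is SymType.FUNC:
        return "blx"   # interworking is provided for function symbols
    return "bl"        # untyped symbols are left alone


def state_after_call(caller_state: str, insn: str) -> str:
    """BL keeps the current state; BLX (immediate) always switches it."""
    if insn == "bl":
        return caller_state
    return "ARM" if caller_state == "Thumb" else "Thumb"


# caml_main() is Thumb code, caml_start_program is ARM code.
for sym_type in (SymType.NOTYPE, SymType.FUNC):
    insn = fixup_call("Thumb", "ARM", sym_type)
    state = state_after_call("Thumb", insn)
    print(sym_type.name, insn, "ok" if state == "ARM" else "wrong state")
```

In this model the untyped symbol keeps the plain bl and arrives in Thumb state, which matches the observed crash, while the typed symbol gets blx and arrives in ARM state.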

Dave Martin (dave-martin-arm) wrote :

Without actually reading the spec myself, I think I agree.

So, the .type directives are required and should be added in ocaml
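
For reference, the requirement can be phrased as a check on ELF symbol table entries. The Python sketch below is an editorial illustration, not an existing tool: violates_code_symbol_rule is a hypothetical helper, and whether a symbol lives in an executable section is passed in as an assumption rather than read from a real object file. The st_info layout (binding in the high nibble, type in the low nibble) and the constant values are the standard ELF ones.

```python
# Sketch only: test one ELF symbol against the quoted rule that exported
# code symbols (binding STB_GLOBAL) shall have type STT_FUNC.

STB_GLOBAL = 1
STT_NOTYPE = 0
STT_OBJECT = 1
STT_FUNC = 2


def split_st_info(st_info: int) -> tuple:
    """Return (binding, type) from an ELF st_info byte."""
    return st_info >> 4, st_info & 0xF


def violates_code_symbol_rule(st_info: int, in_executable_section: bool) -> bool:
    """True if a global symbol defined in code is not typed as a function."""
    binding, sym_type = split_st_info(st_info)
    return (binding == STB_GLOBAL
            and in_executable_section
            and sym_type != STT_FUNC)


# A global code symbol without a .type directive: GLOBAL + NOTYPE.
before_fix = (STB_GLOBAL << 4) | STT_NOTYPE
# The same symbol after adding ".type <name>, %function": GLOBAL + FUNC.
after_fix = (STB_GLOBAL << 4) | STT_FUNC

print(violates_code_symbol_rule(before_fix, True))
print(violates_code_symbol_rule(after_fix, True))
```

The first case is the situation described for the symbols in the startup .s file; the second is what the .type directive produces.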

Stéphane Glondu (glondu) wrote :

On 12/08/2011 16:25, Dave Martin wrote:
> Can you try adding a ".type <function name>, %function" directive for
> each function in the startup .s file?

I've patched ocamlopt so that all function symbols have this directive,
and the empty program works!

I'm afraid this means that everything has to be recompiled with the
patched ocamlopt... this is confirmed by a test on lwt.

What about upgrading to ocaml 3.12.1 at the same time in Ubuntu? The
packaging is ready for Debian (it is currently waiting for ACK from the
release team), and also means recompiling everything. Most packages
recompile with no source changes with ocaml 3.12.1 [1]. The permanent
OCaml transition tracker [2] will be helpful, too.

[1] http://wiki.debian.org/Teams/OCamlTaskForce/OCamlTransitions
[2] http://people.canonical.com/~ubuntu-archive/transitions/ocaml.html

Cheers,

--
Stéphane

Colin Watson (cjwatson) wrote :

We're after feature freeze, so upgrading to 3.12.1 would require a good justification.

Matthias Klose (doko)
Changed in binutils (Ubuntu Oneiric):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ocaml - 3.12.0-7ubuntu1

---------------
ocaml (3.12.0-7ubuntu1) oneiric; urgency=low

  * ocamlopt/arm: Add .type directive for code symbols. LP: #810402.
 -- Matthias Klose <email address hidden> Sat, 13 Aug 2011 08:59:40 +0200

Changed in ocaml (Ubuntu Oneiric):
status: Confirmed → Fix Released
Michael Hope (michaelh1)
Changed in binutils-linaro:
status: New → Invalid