qemu-aarch64-static segfaults

Bug #1285363 reported by dann frazier
This bug affects 1 person
Affects        Status        Importance  Assigned to  Milestone
QEMU           Fix Released  Undecided   Unassigned
qemu (Ubuntu)  Fix Released  High        Unassigned

Bug Description

I've found a couple of conditions that cause qemu-user-static to core dump fairly reliably. The same happens with upstream git, while a binary built from suse's aarch64-1.6 branch seems to work consistently.

Testing suggests they are resolved by the sigprocmask wrapper patches included in suse's tree.

 1) dh_fixperms is a script that commonly runs at the end of a package build.
     It's basically doing a `find | xargs chmod`.
 2) debootstrap --second-stage
     This is used to configure an arm64 chroot that was built using
     debootstrap on a non-native host. It is basically invoking a bunch of
     shell scripts (postinst, etc). When it blows up, the stack consistently
     looks like this:

Core was generated by `/usr/bin/qemu-aarch64-static /bin/sh -e /debootstrap/debootstrap --second-stage'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000060058e55 in memcpy (__len=8, __src=0x7fff62ae34e0, __dest=0x400082c330) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
51 return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) bt
#0 0x0000000060058e55 in memcpy (__len=8, __src=0x7fff62ae34e0, __dest=0x400082c330) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#1 stq_p (v=274886476624, ptr=0x400082c330) at /mnt/qemu.upstream/include/qemu/bswap.h:280
#2 stq_le_p (v=274886476624, ptr=0x400082c330) at /mnt/qemu.upstream/include/qemu/bswap.h:315
#3 target_setup_sigframe (set=0x7fff62ae3530, env=0x62d9c678, sf=0x400082b0d0) at /mnt/qemu.upstream/linux-user/signal.c:1167
#4 target_setup_frame (usig=usig@entry=17, ka=ka@entry=0x604ec1e0 <sigact_table+512>, info=info@entry=0x0, set=set@entry=0x7fff62ae3530, env=env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/signal.c:1286
#5 0x0000000060059f46 in setup_frame (env=0x62d9c678, set=0x7fff62ae3530, ka=0x604ec1e0 <sigact_table+512>, sig=17) at /mnt/qemu.upstream/linux-user/signal.c:1322
#6 process_pending_signals (cpu_env=cpu_env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/signal.c:5747
#7 0x0000000060056e60 in cpu_loop (env=env@entry=0x62d9c678) at /mnt/qemu.upstream/linux-user/main.c:1082
#8 0x0000000060005079 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /mnt/qemu.upstream/linux-user/main.c:4374

Tags: patch hs-arm64
dann frazier (dannf) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "qemu.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Serge Hallyn (serge-hallyn) wrote :

I'm a little nervous about applying the do_sigprocmask change in linux-user/signal.c to all arches, given the comment:

/* Force set state of SIGSEGV, may be best for some apps, maybe not so good
++ * This is not required for qemu to work

Doing this conditionally for arm64 would be more comforting...

I'll go ahead and run some tests with it tomorrow on amd64.

Changed in qemu (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Dann,

can you confirm that you can reproduce this with the upstream git head (or the qemu-2.0~git package)?

Serge Hallyn (serge-hallyn) wrote :

I'm building a candidate package with a modified version of the patchset (to only change behavior for aarch64 targets) in ppa:serge-hallyn/virt.

dann frazier (dannf) wrote : Re: [Bug 1285363] Re: qemu-aarch64-static segfaults

On Thu, Feb 27, 2014 at 3:34 PM, Serge Hallyn
<email address hidden> wrote:
> Dann,
>
> can you confirm that you can reproduce this with the upstream git head
> (or the qemu-2.0~git package)?

I just reverified with upstream git head @
9fbee91a131a05e443d7108d7fbdf3ca91020290.
Note that this appears to only be reproducible on systems with > 1 CPU
(easy to reproduce on 4).

dann frazier (dannf) wrote :

@Serge: I can confirm that this is fixed in 1.7.0+dfsg-3ubuntu5sig1 from your ppa.

Peter Maydell (pmaydell) wrote :

Doing this only for aarch64 targets seems like a bad idea to me -- this isn't an aarch64 specific issue. QEMU needs SIGSEGV to go to its own handler (so we can unprotect pages we've marked as read-only in order to catch guest writes to them so we can throw away invalidated translated code), and that's true for all targets. It probably just happens more often on the aarch64 target than others you've tested because aarch64 has a signal-return trampoline on the stack frame, so we'll often see that page get translated and thrown away again. (Other targets with a trampoline include sparc, cris, openrisc and ppc.)

PS: the comment "this is not required for qemu to work" just means that QEMU will work fine whether we tell the guest a lie about what's going on with SIGSEGV in one way (saying "it's blocked") or the other (saying "it's not blocked").
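To illustrate the mechanism described in this comment, here is a minimal standalone sketch, not QEMU code. It write-protects one page, installs a SIGSEGV handler that unprotects it, and then writes to the page. The names `demo_write_fault`, `on_segv`, `guarded_page`, `guarded_len` and `fault_count` are made up for this example. A real emulator would also discard the translated code for that page in the handler. If SIGSEGV were blocked when the fault happened, Linux would kill the process instead of running the handler, which matches the kind of crash reported above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count; /* write faults caught so far */
static void *guarded_page;                /* stands in for a page holding translated code */
static size_t guarded_len;

/* On a fault inside the guarded page, make the page writable again so the
 * faulting store is retried and succeeds. mprotect() is not on the POSIX
 * async-signal-safe list, but calling it here works on Linux. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;

    (void)sig;
    (void)ctx;
    if (addr >= base && addr < base + guarded_len) {
        fault_count++;
        mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
        return;
    }
    _exit(1); /* a fault outside our page is a genuine crash */
}

/* Returns the number of faults taken while writing to a read-only page,
 * or -1 if setup failed or the write did not land. */
int demo_write_fault(void)
{
    struct sigaction sa, old_sa;
    long page = sysconf(_SC_PAGESIZE);
    int ok;

    if (page <= 0)
        return -1;
    guarded_len = (size_t)page;
    guarded_page = mmap(NULL, guarded_len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    fault_count = 0;
    if (mprotect(guarded_page, guarded_len, PROT_READ) != 0)
        return -1;

    *(volatile char *)guarded_page = 42; /* faults, handler unprotects, store retried */
    ok = (*(volatile char *)guarded_page == 42);

    sigaction(SIGSEGV, &old_sa, NULL);
    munmap(guarded_page, guarded_len);
    return ok ? (int)fault_count : -1;
}
```

Calling `demo_write_fault()` from a small `main` should report a single caught fault on a typical Linux host.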

Serge Hallyn (serge-hallyn) wrote :

Quoting Peter Maydell (<email address hidden>):
> Doing this only for aarch64 targets seems like a bad idea to me -- this
> isn't an aarch64 specific issue. QEMU needs SIGSEGV to go to its own
> handler (so we can unprotect pages we've marked as read-only in order to
> catch guest writes to them so we can throw away invalidated translated
> code), and that's true for all targets. It probably just happens more
> often on the aarch64 target than others you've tested because aarch64
> has a signal-return trampoline on the stack frame, so we'll often see
> that page get translated and thrown away again. (Other targets with a
> trampoline include sparc, cris, openrisc and ppc.)

I see. I've just pushed the customized patch to the archive. We can
switch to the original patchset though. But, I'd also like to see what
ends up hitting upstream.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1.7.0+dfsg-3ubuntu5

---------------
qemu (1.7.0+dfsg-3ubuntu5) trusty; urgency=medium

  [ dann frazier ]
  * Add patches from the susematz tree to avoid intermittent segfaults:
     - ubuntu/signal-added-a-wrapper-for-sigprocmask-function.patch
     - ubuntu/signal-sigsegv-protection-on-do_sigprocmask.patch
     - ubuntu/Don-t-block-SIGSEGV-at-more-places.patch

  [ Serge Hallyn ]
  * Modify do_sigprocmask to only change behavior for aarch64.
    (LP: #1285363)
 -- Serge Hallyn <email address hidden> Thu, 06 Mar 2014 16:15:50 -0600

Changed in qemu (Ubuntu):
status: Confirmed → Fix Released
Peter Maydell (pmaydell) wrote :

We've now overhauled the signal handling code in upstream QEMU, and it has its own implementation of the basic idea in the patch from comment 1 (which is "don't let the guest block SIGSEGV").
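The basic idea ("don't let the guest block SIGSEGV") can be sketched as a thin wrapper around sigprocmask. This is only an illustration under stated assumptions, not the code from the patches in this bug or from upstream QEMU: `guarded_sigprocmask` and `guest_segv_blocked` are hypothetical names, and the sketch is single-threaded (a threaded program would use pthread_sigmask and per-thread state). The wrapper strips SIGSEGV from any mask it applies to the host, but remembers what the guest asked for so it can report that back in `oldset`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* What the guest believes about SIGSEGV. The host mask never blocks it. */
static int guest_segv_blocked;

int guarded_sigprocmask(int how, const sigset_t *set, sigset_t *oldset)
{
    int was_blocked = guest_segv_blocked;
    sigset_t host_set;
    const sigset_t *to_apply = NULL;

    if (set != NULL) {
        int wants_segv = (sigismember(set, SIGSEGV) == 1);

        host_set = *set;
        switch (how) {
        case SIG_BLOCK:
            if (wants_segv)
                guest_segv_blocked = 1;
            break;
        case SIG_UNBLOCK:
            if (wants_segv)
                guest_segv_blocked = 0;
            break;
        case SIG_SETMASK:
            guest_segv_blocked = wants_segv;
            break;
        default:
            /* Invalid 'how': let the real call report the error. */
            return sigprocmask(how, set, oldset);
        }
        /* Unblocking SIGSEGV on the host is harmless; blocking it is not. */
        if (how != SIG_UNBLOCK)
            sigdelset(&host_set, SIGSEGV);
        to_apply = &host_set;
    }

    if (sigprocmask(how, to_apply, oldset) != 0) {
        guest_segv_blocked = was_blocked; /* roll back the guest's view */
        return -1;
    }

    /* Report the guest's view of SIGSEGV, not the host's. */
    if (oldset != NULL) {
        if (was_blocked)
            sigaddset(oldset, SIGSEGV);
        else
            sigdelset(oldset, SIGSEGV);
    }
    return 0;
}
```

With this wrapper, a guest that blocks SIGSEGV and then queries its mask sees SIGSEGV as blocked, while the host process can still take the fault in its own handler.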

Changed in qemu:
status: New → Fix Committed
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released