Comment 3 for bug 507503

Dave Martin (dave-martin-arm) wrote :

Now that I understand this a bit more, it looks like there is no security impact, just a robustness problem.

The kernel does isolate the VFP/NEON state between processes, but a process can still corrupt its own state if VFP or NEON registers are modified inside a signal handler.

I wrote an experimental kernel hack (attached) to trace which processes are doing VFP/NEON in their signal handlers and where. Two main culprits emerged: some random stuff in Xorg which makes use of the caller-save VFP register s15 (which overlaps d7 and the NEON register q3), and libc's setjmp and longjmp implementation.
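To make the s15/d7/q3 overlap concrete, here is a small C sketch of how the VFP/NEON register views alias each other. The helper names are made up for illustration; the mapping itself is just integer halving (two s registers per d register, two d registers per q register), plus the d8-d15 callee-save range from the procedure call standard.

```c
/* Hypothetical helpers illustrating VFP/NEON register aliasing.
 *   s(2n) and s(2n+1) are the two halves of d(n), for s0..s31
 *   d(2n) and d(2n+1) are the two halves of q(n), for d0..d31
 */

/* Return the index of the d register that overlaps s<s>, or -1 if out of range. */
int d_reg_for_s(int s)
{
    return (s >= 0 && s <= 31) ? s / 2 : -1;
}

/* Return the index of the q register that overlaps d<d>, or -1 if out of range. */
int q_reg_for_d(int d)
{
    return (d >= 0 && d <= 31) ? d / 2 : -1;
}

/* d8-d15 must be preserved by a callee; d0-d7 and d16-d31 need not be. */
int d_reg_is_callee_saved(int d)
{
    return d >= 8 && d <= 15;
}
```

So d_reg_for_s(15) gives 7 and q_reg_for_d(7) gives 3, and d7 falls outside the callee-saved range, which is why a function is free to clobber s15 without restoring it.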

I suspect that longjmp is a non-issue because we probably don't expect to restore what was in the registers prior to the signal handler in this case. setjmp may not be occurring during a signal handler at all (my kernel hack has no way of knowing that a signal handler was exited using longjmp and will carry on tracing subsequent VFP use).

So the most probable explanation is that Xorg is corrupting its own foreground VFP state from inside a signal handler. This is consistent with the symptoms. The signal handler is within its rights here, since the ABI spec says that a function does not need to restore d0-d7 or d16-d31 when returning to the caller; GCC correctly does not generate save and restore sequences for those registers.
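The scenario described above can be poked at from userspace with something like the following. This is only a rough sketch, not the kernel hack attached to this bug: the function and variable names are invented, and whether the compiler really keeps the accumulator in a caller-save VFP register (and whether the handler really clobbers it) depends on the compiler and flags. On a kernel that saves the floating-point state around signal delivery it should report zero mismatches; on an affected kernel it may report some, but a zero result there proves nothing.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t handler_runs;
static volatile float handler_sink;

/* Signal handler that deliberately does floating-point arithmetic, so the
 * compiler is likely to use caller-save FP registers without saving them. */
static void fp_handler(int signo)
{
    volatile float a = 1.5f, b = 2.25f;
    (void)signo;
    handler_sink = a * b + (float)handler_runs;
    handler_runs++;
}

/* Run a floating-point loop while SIGALRM fires every millisecond.
 * Returns the number of wrong results seen, or -1 on setup failure. */
long count_fp_mismatches(long iterations)
{
    struct sigaction sa, old_sa;
    struct itimerval tv, off;
    volatile int n = 100;          /* volatile to defeat constant folding */
    long mismatches = 0;
    long i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fp_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = 1000;
    tv.it_value.tv_usec = 1000;
    memset(&off, 0, sizeof off);   /* all-zero value disarms the timer */
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    for (i = 0; i < iterations; i++) {
        double sum = 0.0;          /* accumulator likely lives in an FP register */
        int k, limit = n;
        for (k = 1; k <= limit; k++)
            sum += (double)k;
        if (sum != 5050.0)         /* 1 + 2 + ... + 100 */
            mismatches++;
    }

    /* Disarm the timer before restoring the old handler, so that no
     * SIGALRM arrives once our handler is gone. */
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    return mismatches;
}
```

Calling count_fp_mismatches() with a large iteration count from a small main() and printing the result would be one way to compare a patched and an unpatched kernel, bearing in mind the caveats above.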

The same issue could well be causing some application crashes since we now build everything for VFP. The issue is not specific to NEON.

Flakiness is unavoidable without fixing the kernel to save the VFP registers around signal handlers.

According to Catalin Marinas (ARM kernel maintainer), the patches on the thread referenced above should be safe to apply, except for programs which poke about in the signal frame for some reason, because the patches change the representation of the floating-point state in the signal frame. (gdb in Lucid doesn't appear to understand VFP and so shouldn't become any more broken due to this.)

The fix applies to linux-mvl-dove too, and we should try it ASAP to see whether it makes any difference to the robustness on the Dove boards.