VFP/NEON state is not preserved around signal handlers, causing state corruption between user processes

Bug #507503 reported by Dave Martin
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux-fsl-imx51 (Ubuntu)
Fix Released
High
Bryan Wu
Declined for Dapper by Paul Larson
Declined for Hardy by Paul Larson
Declined for Intrepid by Paul Larson
Declined for Jaunty by Paul Larson
Declined for Karmic by Paul Larson
Lucid
Fix Released
High
Bryan Wu
linux-mvl-dove (Ubuntu)
Fix Released
High
Eric Miao
Declined for Dapper by Paul Larson
Declined for Hardy by Paul Larson
Declined for Intrepid by Paul Larson
Declined for Jaunty by Paul Larson
Declined for Karmic by Paul Larson
Lucid
Fix Released
High
Eric Miao
linux-ti-omap (Ubuntu)
Fix Released
High
Amit Kucheria
Declined for Dapper by Paul Larson
Declined for Hardy by Paul Larson
Declined for Intrepid by Paul Larson
Declined for Jaunty by Paul Larson
Declined for Karmic by Paul Larson
Lucid
Fix Released
High
Amit Kucheria

Bug Description

NOTE: This issue will apply to all armel kernels except for dove (where the hardware doesn't have NEON support). However, for now imx51 is the only such kernel, so I've raised the bug here.

The issue was discussed on the ARM Linux mailing list; there's a link to the thread and a partial patch here:

http://thread.gmane.org/gmane.linux.ports.arm.kernel/58722/focus=58724

State corruption observed in pixman-0.16.x (resulting on various screen rendering artifacts) is believed to be due to this (this is the conclusion Siarhei Siamashka of Nokia — see the above link).

Note that the patch is disupted on the list and hasn't been merged: it may cause issues with userspace code which makes assumptions about the layout and contents of the signal frame... though it could be interesting to see whether it has any effect.

Note: we may revert to CONFIG_NEON=n by default in the iMX51 kernels due to a separate issue. This may mean that the pixman behaviour appears fixed even if it hasn't been fixed: caution should be applied before assuming that this bug has gone away.
(See https://bugs.launchpad.net/bugs/507416 — subscribers only, though)

Tags: patch armel
Alexander Sack (asac)
Changed in linux-fsl-imx51 (Ubuntu):
assignee: nobody → Bryan Wu (cooloney)
status: New → Confirmed
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

This bug is a potential security hole and should be prioritised as such.

Revision history for this message
Paul Larson (pwlars) wrote :

Marked critical due to the potential security impact. Bryan, is the fix that was posted on the mailing list viable? Seems like there was still some debate over it.

Changed in linux-fsl-imx51 (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Now I understand this a bit more, it looks like there is not a security impact; just a robustness problem.

The kernel does isolate the VFP/NEON state between processes, but a process can still corrupt its own state if VFP or NEON registers are modified inside a signal handler.

I wrote an experimental kernel hack (attached) to trace which processes are doing VFP/NEON in their signal handlers and where. Two main culprits emerged: some random stuff in Xorg which makes use of the caller-save VFP register s15 (which overlaps d7 and the NEON register q3), and libc's setjmp and longjmp implementation.

I suspect that longjmp is a non-issue because we probably don't expect to restore what was in the registers prior to the signal handler in this case. setjmp may not be occurring during a signal handler at all (my kernel hack has no way of knowing that a signal handler was exited using longjmp and will carry on tracing subsequent VFP use).

So the most probable explanation is that Xorg's signal handler is corrupting its own foreground state from inside a signal handler. This is consistent with the symptoms. The signal handler is within its rights here, since the ABI spec says that a function does not need to restore d0-d7 or d16-d31 when returning to the caller; GCC correctly does not automatically generate the relevant save and restore sequences.

The same issue could well be causing some application crashes since we now build everything for VFP. The issue is not specific to NEON.

Flakiness is unavoidable without fixing the kernel to save the VFP registers around signal handlers.

According to Catalin Marinas (ARM kernel maintainer) the patches on the thread referenced above should be safe to apply, except for programs which poke about in the signal frame for some reason, becuase the patches make a change to the representation of the floating-point state in the signal frame. (gdb in Lucid doesn't appear to understand VFP and so shouldn't become any more broken due to this).

The fix applies to linux-mvl-dove too, and we should try it ASAP to see whether it makes any difference to the robustness on the Dove boards.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Linking this bug to linux-mvl-dove since the problem is now known not to be imx51-specific.

Revision history for this message
Bryan Wu (cooloney) wrote :

@Dave,

I am testing these 2 patches, but how about the upstream response of that? I guess Catalin is pushing that?

Thanks
-Bryan

Revision history for this message
Dave Martin (dave-martin-arm) wrote : RE: [Bug 507503] Re: VFP/NEON state is not preserved around signalhandlers, causing state corruption between user processes

> I am testing these 2 patches, but how about the upstream
> response of that? I guess Catalin is pushing that?

I don't think the patches are actively pushed right now.

I'll mail the patch author and try to determine his current status; he may
have an update by now.

Revision history for this message
Bryan Wu (cooloney) wrote :

Dave,

Great, thanks for pushing this.

So is there any test case I can run on my hardware to reproduce this issue? My X never crashed before, -;)).

-Bryan

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

The test case from the ARM Linux kernel thread does not work for me; GCC allocates the VFP use in main to d8-d15 (since main is not a leaf function). The signal handler is allocated d0-d7 to corrupt, so main does not notice. It may depend on the GCC version.

To force the register allocation, I coded a similar testcase in assembler.

Like the original testcase, alarm(2) is used to schedule a signal.

main is coded to call no functions in between setting d0 and checking it (since the ABI allows d0-d7 to be corrupted in this case).

  * If the signal handler manages to corrupt d0, a message is printed and the test terminates.

  * If no corruption is observed, main will simply loop forever and the test will not terminate.

I get:

$ gcc -o vfp-sigtest vfp-sigtest.s
$ ./vfp-sigtest
Waiting for d0 to be corrupted...
<2 seconds later>
d0 corrupted.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Please see the following updated patch posting to alkml by Imre Deak:

[RFC PATCH v2 0/2] ARM: VFP: Save / restore VFP state on the signal handler path <http://lists.arm.linux.org.uk/lurker/message/20100204.213712.acad59f5.en.html>

[RFC PATCH v2 1/2] ARM: VFP: add support to sync the VFP state of the current thread <http://lists.arm.linux.org.uk/lurker/message/20100204.213802.e56207e4.en.html>

[RFC PATCH v2 2/2] ARM: VFP: preserve the HW context when calling signal handlers <http://lists.arm.linux.org.uk/lurker/message/20100204.213830.5bf86a58.en.html>

Revision history for this message
Bryan Wu (cooloney) wrote :

Dave,

Thanks so much. I will test this on my hardware and give some feedback later.

-Bryan

Eric Miao (eric.y.miao)
Changed in linux-mvl-dove (Ubuntu):
assignee: nobody → Eric Miao (eric.y.miao)
importance: Undecided → Critical
Eric Miao (eric.y.miao)
Changed in linux-mvl-dove (Ubuntu):
importance: Critical → Medium
Revision history for this message
Eric Miao (eric.y.miao) wrote :
Revision history for this message
Paul Larson (pwlars) wrote :

I've had the test script running on the new kernel for several minutes now, and I never get the d0 corrupted message. Looks good!

Revision history for this message
Bryan Wu (cooloney) wrote :

Since these 2 patches are still concerned by upstream, I will wait for it's finalizing.

Revision history for this message
Dave Martin (dave-martin-arm) wrote :

I suggest you ping the list and explain the Ubuntu timescale so that they know you're waiting for a solution.

Revision history for this message
Paul Larson (pwlars) wrote :

lowering severity some since it does not have known security implications.

Changed in linux-fsl-imx51 (Ubuntu Lucid):
importance: Critical → High
milestone: none → lucid-alpha-3
Changed in linux-mvl-dove (Ubuntu Lucid):
importance: Medium → High
milestone: none → lucid-alpha-3
status: New → Triaged
tags: added: patch
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

FYI, someone here tries the patches referenced above (on a non-Ubuntu 2.6.33-rc5 tree I believe), and they do seem to resolve the screen corruption problem.

Repeating the links here to avoid confusion:

[RFC PATCH v2 0/2] ARM: VFP: Save / restore VFP state on the signal handler path <http://lists.arm.linux.org.uk/lurker/message/20100204.213712.acad59f5.en.html>

[RFC PATCH v2 1/2] ARM: VFP: add support to sync the VFP state of the current thread <http://lists.arm.linux.org.uk/lurker/message/20100204.213802.e56207e4.en.html>

[RFC PATCH v2 2/2] ARM: VFP: preserve the HW context when calling signal handlers <http://lists.arm.linux.org.uk/lurker/message/20100204.213830.5bf86a58.en.html>

Revision history for this message
Bryan Wu (cooloney) wrote :

Dave,

Thanks a lot. I will prepare to apply these 2 patches into our fsl-imx51 kernel

Happy Chinese New Year.
-Bryan

Changed in linux-fsl-imx51 (Ubuntu Lucid):
status: Confirmed → In Progress
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Hi,

> Thanks a lot. I will prepare to apply these 2 patches into our fsl-
> imx51 kernel

Thanks. I suggest to keep an eye on upstream anyway, just in case, but
I don't know when they'll come to agreement.

>
> Happy Chinese New Year.
> -Bryan

To you also!

Cheers
---Dave

--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Revision history for this message
Bryan Wu (cooloney) wrote :

These 2 patches is in my git branch now:
http://kernel.ubuntu.com/git?p=roc/ubuntu-lucid.git;a=shortlog;h=refs/heads/vfp_sig

Paul and Oliver, could you please help me to test the kernel packages on your hardware:
http://people.canonical.com/~roc/kernel/vfp_sig/

Thanks,
-Bryan

Revision history for this message
Bryan Wu (cooloney) wrote :

For fsl-imx51 kernel, patches were applied:

--
linux-fsl-imx51 (2.6.31-605.8) lucid; urgency=low

  [ Upstream Kernel Changes ]

  * ARM: VFP: add support to sync the VFP state of the current thread
    - LP: #507503
  * ARM: VFP: preserve the HW context when calling signal handlers
    - LP: #507503

 -- Andy Whitcroft <email address hidden> Tue, 23 Feb 2010 10:51:17 +0000

---

So close this bug.

Changed in linux-fsl-imx51 (Ubuntu Lucid):
status: In Progress → Fix Committed
Revision history for this message
Bryan Wu (cooloney) wrote :

For mvl-dove kernel, patches were applied too

--
linux-mvl-dove (2.6.32-201.11) lucid; urgency=low

  [ Upstream Kernel Changes ]

  * ARM: VFP: add support to sync the VFP state of the current thread
    - LP: #507503
  * ARM: VFP: preserve the HW context when calling signal handlers
    - LP: #507503

 -- Andy Whitcroft <email address hidden> Tue, 23 Feb 2010 11:01:14 +0000
---

So close this bug.

Changed in linux-mvl-dove (Ubuntu Lucid):
status: Triaged → Fix Committed
Steve Langasek (vorlon)
Changed in linux-fsl-imx51 (Ubuntu Lucid):
milestone: lucid-alpha-3 → ubuntu-10.04-beta-1
Changed in linux-mvl-dove (Ubuntu Lucid):
milestone: lucid-alpha-3 → ubuntu-10.04-beta-1
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-mvl-dove - 2.6.32-201.11

---------------
linux-mvl-dove (2.6.32-201.11) lucid; urgency=low

  [ Upstream Kernel Changes ]

  * ARM: VFP: add support to sync the VFP state of the current thread
    - LP: #507503
  * ARM: VFP: preserve the HW context when calling signal handlers
    - LP: #507503
 -- Andy Whitcroft <email address hidden> Tue, 23 Feb 2010 11:01:14 +0000

Changed in linux-mvl-dove (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-fsl-imx51 - 2.6.31-605.8

---------------
linux-fsl-imx51 (2.6.31-605.8) lucid; urgency=low

  [ Upstream Kernel Changes ]

  * ARM: VFP: add support to sync the VFP state of the current thread
    - LP: #507503
  * ARM: VFP: preserve the HW context when calling signal handlers
    - LP: #507503
 -- Andy Whitcroft <email address hidden> Tue, 23 Feb 2010 10:51:17 +0000

Changed in linux-fsl-imx51 (Ubuntu Lucid):
status: Fix Committed → Fix Released
Amit Kucheria (amitk)
Changed in linux-ti-omap (Ubuntu Lucid):
assignee: nobody → Amit Kucheria (amitk)
importance: Undecided → High
milestone: none → ubuntu-10.04-beta-2
Paul Larson (pwlars)
Changed in linux-ti-omap (Ubuntu Lucid):
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (3.2 KiB)

This bug was fixed in the package linux-ti-omap - 2.6.33-500.4

---------------
linux-ti-omap (2.6.33-500.4) lucid; urgency=low

  [ Amit Kucheria ]

  * [Config] Compile-in display subsystem
  * Merge the DSS2 stack from 2.6.34-rc2 mainline tree
  * SAUCE: Upgrade aufs2 to latest version for 2.6.33
    - LP: #548924
  * [Config] Update configuration for new aufs2
  * [Config] Fix d-i modules
  * [Config] USB_STORAGE and ATA drivers can be a modules
  * [Upstream] DSS2 support for Beagleboard
  * [Config] Configure DSS2 based on beagle defconfig
  * [Config] Enable USB OTG mode
  * [Config] Enable various ethernet drivers

  [ Imre Deak ]

  * SAUCE: ARM: VFP: add support to sync the VFP state of the current
    thread
    - LP: #507503
  * SAUCE: ARM: VFP: preserve the HW context when calling signal handlers
    - LP: #507503

  [ Upstream Kernel Changes ]

  * OMAP: DSS2: Add Sharp LQ043T1DG01 panel driver
  * OMAP: DSS2: add Toppoly TDO35S panel
  * OMAP: DSS: add TPO TD043MTEA1 panel
  * OMAP: DSS2: Improve Kconfig help texts
  * OMAP: DSS2: enable VDDS_DSI when using DPI
  * OMAP: 3430SDP: remove vdvi regulator
  * OMAP: DSS: Taal: fix error returns in taal_probe()
  * OMAP: DSS2: OMAPFB: implement OMAPFB_GET_DISPLAY_INFO
  * OMAP: DSS2: fix irq-stats compilation
  * OMAP: DSS2: OMAPFB: Add omapfb_update_window prototype
  * OMAP: DSS2: improve DSS clk src selection
  * OMAP: DSS2: DSI: add dsi_bus_is_locked()
  * OMAP: DSS2: DSI: add helpers for DCS read/write
  * OMAP: DSS2: DSI: export dsi_vc_enable_hs()
  * OMAP: DSS2: DSI: configure all DSI VCs
  * OMAP: DSS2: DSI: remove dsi_vc_print_status()
  * OMAP: DSS2: Check ctx loss count only when starting the first clock
  * OMAP: DSS2: remove sub-panel system
  * OMAP: DSS2: fix driver probe error handling
  * OMAP: DSS2: OMAPFB: fix dssdev cleanup on error
  * OMAP: DSS2: OMAPFB: fix cleanup on dssdev enable error
  * OMAP: DSS2: fix get_dsi/dispc_clk_source() usage
  * OMAP: DSS2: DSI: change DSI bus_lock to semaphore
  * OMAP: DSS2: DSI: remove auto-update perf measurement
  * OMAP: DSS2: move run_test()
  * OMAP: DSS2: move memory_read()
  * OMAP: DSS2: move set/get_mirror()
  * OMAP: DSS2: move get/set_rotate()
  * OMAP: DSS2: move wait_vsync()
  * OMAP: DSS2: move enable/disable_channel to overlay manager
  * OMAP: DSS2: move get_resolution()
  * OMAP: DSS2: move get_recommended_bpp()
  * OMAP: DSS2: move enable/get_te()
  * OMAP: DSS2: move set/get_update_mode()
  * OMAP: DSS2: move update() and sync()
  * OMAP: DSS2: move enable/disable/suspend/resume
  * OMAP: DSS2: move set/get_wss()
  * OMAP: DSS2: move timing functions
  * OMAP: DSS2: DSI: remove external TE support
  * OMAP: DSS2: OMAPFB: Remove FB_OMAP2_FORCE_AUTO_UPDATE
  * OMAP: DSS2: DSI: add dsi_vc_dcs_read_2() helper
  * OMAP: DSS2: TPO-TD03MTEA1: fix function names
  * OMAP: DSS2: DSI: add error prints
  * OMAP: DSS2: OMAPFB: Constify some function parameters
  * OMAP: DSS2: Taal: Fix ESD check
  * OMAP: DSS2: Taal: Fix TE when resuming
  * OMAP: DSS2: VRAM: Fix early_param for vram
  * OMAP: DSS2: initialize dss clk sources properly
  * OMAP: DSS2: panel-generic: re-implement mode changing
 -- Andy Whitcroft <apw@cano...

Read more...

Changed in linux-ti-omap (Ubuntu Lucid):
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.