X freeze on i915_gem.c line 1478

Bug #843904 reported by Roland Hieber
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | High | Seth Forshee |
Lucid | Fix Released | Undecided | Seth Forshee |

Bug Description

SRU Justification

Impact: The i915 driver is not holding references to DRM objects during eviction. Thus an object could be freed while i915 is still referencing it, which results in a kernel BUG and an xserver freeze.

Fix: Backport of upstream fix to hold references to objects during eviction and a related fix to object cleanup in the error paths.

Test case: Verified on LP #843904.

----

In linux-image-2.6.32-34-generic from Lucid LTS, X freezes when viewing http://www.bahnbilder.ch/pictures/original/9030.jpg in Firefox, and dmesg shows the following:

[ 2319.730128] ------------[ cut here ]------------
[ 2319.730135] kernel BUG at /build/buildd/linux-2.6.32/drivers/gpu/drm/i915/i915_gem.c:1478!
[ 2319.730138] invalid opcode: 0000 [#1] SMP
[ 2319.730141] last sysfs file: /sys/devices/virtual/net/pan1/statistics/collisions
[ 2319.730143] CPU 0
[ 2319.730145] Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bridge stp sha256_generic cryptd aes_x86_64 aes_generic dm_crypt iptable_filter ip_tables x_tables binfmt_misc ppdev vboxnetadp vboxnetflt vboxdrv joydev snd_hda_codec_intelhdmi snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo iwlagn iwlcore mac80211 jmb38x_ms snd_pcm snd_timer videodev v4l1_compat v4l2_compat_ioctl32 r8169 memstick mii psmouse serio_raw sdhci_pci sdhci led_class cfg80211 snd soundcore snd_page_alloc lp parport fbcon tileblit font bitblit softcursor vga16fb vgastate i915 drm_kms_helper ahci drm i2c_algo_bit video output intel_agp
[ 2319.730199] Pid: 2010, comm: compiz Not tainted 2.6.32-34-generic #76-Ubuntu 28477MG
[ 2319.730201] RIP: 0010:[<ffffffffa0086ece>] [<ffffffffa0086ece>] i915_gem_object_put_pages+0x13e/0x150 [i915]
[ 2319.730220] RSP: 0018:ffff880120417a38 EFLAGS: 00010246
[ 2319.730222] RAX: 0000000000000000 RBX: ffff8800b396bc00 RCX: 01000000000000c1
[ 2319.730224] RDX: 0000000000000026 RSI: 0000000000000000 RDI: ffff880139671d80
[ 2319.730226] RBP: ffff880120417a58 R08: 0000000000000000 R09: dead000000100100
[ 2319.730228] R10: dead000000200200 R11: dead000000100100 R12: ffff8800b396bc00
[ 2319.730231] R13: ffff880139671d80 R14: 0000000002e31000 R15: ffff880139e3afb0
[ 2319.730234] FS: 00007fb8ea22c760(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
[ 2319.730236] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2319.730238] CR2: 00007f9dc57ce000 CR3: 0000000127dc3000 CR4: 00000000000406f0
[ 2319.730240] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2319.730242] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2319.730245] Process compiz (pid: 2010, threadinfo ffff880120416000, task ffff880128b996f0)
[ 2319.730247] Stack:
[ 2319.730248] 0000000000000000 ffff8800b396bc00 ffff880139671d80 ffff880139e0d800
[ 2319.730252] <0> ffff880120417a88 ffffffffa0088921 ffff8800b396bc00 ffff880120417aa8
[ 2319.730256] <0> ffff880120417a78 ffff880120417ab8 ffff880120417af8 ffffffffa008dc81
[ 2319.730260] Call Trace:
[ 2319.730271] [<ffffffffa0088921>] i915_gem_object_unbind+0x121/0x290 [i915]
[ 2319.730281] [<ffffffffa008dc81>] i915_gem_evict_something+0x2b1/0x390 [i915]
[ 2319.730291] [<ffffffffa0087584>] i915_gem_object_bind_to_gtt+0x184/0x360 [i915]
[ 2319.730301] [<ffffffffa00890d0>] i915_gem_object_pin+0x180/0x190 [i915]
[ 2319.730311] [<ffffffffa008a56f>] i915_gem_object_pin_and_relocate+0x5f/0x3a0 [i915]
[ 2319.730320] [<ffffffffa008bb6e>] i915_gem_do_execbuffer+0x53e/0xd70 [i915]
[ 2319.730330] [<ffffffffa0084da4>] ? i915_gem_clflush_object+0x44/0x80 [i915]
[ 2319.730335] [<ffffffff8134b71b>] ? agp_flush_chipset+0x1b/0x20
[ 2319.730353] [<ffffffffa0034819>] ? drm_agp_chipset_flush+0x19/0x20 [drm]
[ 2319.730363] [<ffffffffa008c70d>] i915_gem_execbuffer+0x18d/0x360 [i915]
[ 2319.730373] [<ffffffffa008d67c>] ? i915_gem_pwrite_ioctl+0x6c/0x1e0 [i915]
[ 2319.730382] [<ffffffffa002def8>] drm_ioctl+0x318/0x4c0 [drm]
[ 2319.730392] [<ffffffffa008a982>] ? i915_gem_fault+0xd2/0x200 [i915]
[ 2319.730402] [<ffffffffa008c580>] ? i915_gem_execbuffer+0x0/0x360 [i915]
[ 2319.730407] [<ffffffff81112ea4>] ? __do_fault+0x54/0x500
[ 2319.730411] [<ffffffff811544e2>] vfs_ioctl+0x22/0xa0
[ 2319.730414] [<ffffffff81154791>] do_vfs_ioctl+0x81/0x380
[ 2319.730418] [<ffffffff81545358>] ? do_page_fault+0x158/0x3b0
[ 2319.730421] [<ffffffff81154b11>] sys_ioctl+0x81/0xa0
[ 2319.730426] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[ 2319.730428] Code: 0a e1 eb de 66 0f 1f 84 00 00 00 00 00 e8 2b 7b 00 00 83 bb c4 00 00 00 01 0f 85 2a ff ff ff c7 43 54 00 00 00 00 e9 1e ff ff ff <0f> 0b eb fe 0f 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
[ 2319.730460] RIP [<ffffffffa0086ece>] i915_gem_object_put_pages+0x13e/0x150 [i915]
[ 2319.730469] RSP <ffff880120417a38>
[ 2319.730472] ---[ end trace 0b520cfaddb649d0 ]---

This is reproducible, and did not happen in 2.6.32-32-generic.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-34-generic 2.6.32-34.76
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-34.76-generic 2.6.32.44+drm33.19
Uname: Linux 2.6.32-34-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: rohieb 2028 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
Date: Wed Sep 7 15:42:24 2011
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=5e4663af-272d-47f4-b396-8a4813415806
MachineType: LENOVO 28477MG
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-34-generic root=UUID=129e2abc-1682-4b83-8be5-802a35e35f16 ro noquiet splash
RelatedPackageVersions: linux-firmware 1.34.10
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
dmi.bios.date: 06/24/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6JET81WW (1.39 )
dmi.board.name: 28477MG
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6JET81WW(1.39):bd06/24/2010:svnLENOVO:pn28477MG:pvrThinkPadSL510:rvnLENOVO:rn28477MG:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 28477MG
dmi.product.version: ThinkPad SL510
dmi.sys.vendor: LENOVO

Roland Hieber (rohieb) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Seth Forshee (sforshee) wrote :

It looks like we may need to backport this fix from upstream. I'll look into the backport when I get time.

commit af6261031317f646d22f994c0b467521e47aa49f
Author: Chris Wilson <email address hidden>
Date: Mon Sep 20 10:31:40 2010 +0100

    drm/i915: Hold a reference to the object whilst unbinding the eviction list

    During heavy aperture thrashing we may be forced to wait upon several active
    objects during eviction. The active list may be the last reference to
    these objects and so the action of waiting upon one of them may cause
    another to be freed (and itself unbound). To prevent the object
    disappearing underneath us, we need to acquire and hold a reference
    whilst unbinding.

    This should fix the reported page refcount OOPS:

    kernel BUG at drivers/gpu/drm/i915/i915_gem.c:1444!
    ...
    RIP: 0010:[<ffffffffa0093026>] [<ffffffffa0093026>] i915_gem_object_put_pages+0x25/0
    Call Trace:
     [<ffffffffa009481d>] i915_gem_object_unbind+0xc5/0x1a7 [i915]
     [<ffffffffa0098ab2>] i915_gem_evict_something+0x3bd/0x409 [i915]
     [<ffffffffa0027923>] ? drm_gem_object_lookup+0x27/0x57 [drm]
     [<ffffffffa0093bc3>] i915_gem_object_bind_to_gtt+0x1d3/0x279 [i915]
     [<ffffffffa0095b30>] i915_gem_object_pin+0xa3/0x146 [i915]
     [<ffffffffa0027948>] ? drm_gem_object_lookup+0x4c/0x57 [drm]
     [<ffffffffa00961bc>] i915_gem_do_execbuffer+0x50d/0xe32 [i915]

    Reported-by: Shawn Starr <email address hidden>
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18902
    Signed-off-by: Chris Wilson <email address hidden>
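
The race described in the commit message above can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the i915 code: `struct obj`, `active`, `wait_rendering`, `obj_unbind` and `evict` are hypothetical names, and a `freed` flag stands in for the real free so that a stale access can be counted instead of crashing. Waiting on rendering retires every active object and drops the active list's reference, so an eviction loop that holds no reference of its own ends up touching freed objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object; not the kernel's struct. */
struct obj {
    int refcount;
    int bound;
    int freed;   /* set instead of calling free() so stale accesses can be counted */
};

static struct obj *active[2];   /* the "active list" holds one reference per object */
static int stale_accesses;      /* touches of an already-freed object */

static void obj_unref(struct obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        o->bound = 0;           /* freeing also unbinds the object */
        o->freed = 1;
    }
}

/* Waiting retires all active objects, dropping the active list's reference. */
static void wait_rendering(void)
{
    for (int i = 0; i < 2; i++) {
        struct obj *o = active[i];
        if (o != NULL) {
            active[i] = NULL;
            obj_unref(o);
        }
    }
}

static void obj_unbind(struct obj *o)
{
    if (o->freed) { stale_accesses++; return; }
    if (!o->bound)
        return;
    wait_rendering();           /* may free o, or other objects on the eviction list */
    if (o->freed) { stale_accesses++; return; }
    o->bound = 0;
}

/* Reset both objects so that the active list holds their only reference. */
static void setup(struct obj *a, struct obj *b)
{
    *a = (struct obj){ .refcount = 1, .bound = 1, .freed = 0 };
    *b = (struct obj){ .refcount = 1, .bound = 1, .freed = 0 };
    active[0] = a;
    active[1] = b;
}

/* Unbind everything on the eviction list; returns the number of stale accesses. */
static int evict(struct obj **list, int n, int hold_refs)
{
    stale_accesses = 0;
    if (hold_refs)              /* the idea of the fix: pin every object first */
        for (int i = 0; i < n; i++)
            list[i]->refcount++;
    for (int i = 0; i < n; i++)
        obj_unbind(list[i]);
    if (hold_refs)              /* drop the temporary references afterwards */
        for (int i = 0; i < n; i++)
            obj_unref(list[i]);
    return stale_accesses;
}
```

In this model, calling `setup()` and then `evict()` with `hold_refs` set to 0 touches both objects after they have been freed, while calling it with `hold_refs` set to 1 keeps them alive until the loop has finished and frees them only when the temporary references are dropped.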

Seth Forshee (sforshee) wrote :

Okay, I backported this fix plus e39a015 (the parts that weren't already backported at least). The latter fixes leaks introduced by the former. A test build is available at:

http://people.canonical.com/~sforshee/lp843904/linux-2.6.32-34.76~lp843904v201109121911/

Please test and report back whether or not it fixes your issue. Thanks!

Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → High
status: Confirmed → Incomplete
Roland Hieber (rohieb) wrote :

Yeah, seems to be fixed in your test build. At least, the image linked above no longer freezes X, and no such lines appear in dmesg. I was also able to view several more huge pictures in Firefox that previously froze X. Thanks for the fix!

Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Seth Forshee (sforshee)
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Lucid):
status: New → Fix Committed
assignee: nobody → Seth Forshee (sforshee)
Tim Gardner (timg-tpi) wrote :

Reapplied these to master-next:

drm/i915: Fix refleak during eviction.
drm/i915: Hold a reference to the object whilst unbinding the eviction list
drm/i915: Remove BUG_ON from i915_gem_evict_something
drm/i915: Periodically flush the active lists and requests
drm/i915/evict: Ensure we completely cleanup on failure
drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2)
drm/i915: Implement fair lru eviction across both rings. (v2)
drm/i915: Move the eviction logic to its own file.
drm/i915: prepare for fair lru eviction

Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lucid' to 'verification-done-lucid'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid
Roland Hieber (rohieb) wrote :

Yep, still fixed in 2.6.32-35.78 from -proposed. Thanks.

tags: added: verification-done-lucid
removed: verification-needed-lucid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-35.78

---------------
linux (2.6.32-35.78) lucid-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #871899

  [ Andrew Dickinson ]

  * SAUCE: sched: Prevent divide by zero when cpu_power is 0
    - LP: #614853

  [ Stefan Bader ]

  * [Config] Force perf to use libiberty for demangling
    - LP: #783660

  [ Tim Gardner ]

  * [Config] Simplify binary-udebs dependencies
    - LP: #832352
  * [Config] kernel preparation cannot be parallelized
    - LP: #832352
  * [Config] Linearize module/abi checks
    - LP: #832352
  * [Config] Linearize and simplify tree preparation rules
    - LP: #832352
  * [Config] Build kernel image in parallel with modules
    - LP: #832352
  * [Config] Set concurrency for kmake invocations
    - LP: #832352
  * [Config] Improve install-arch-headers speed
    - LP: #832352
  * [Config] Fix binary-perarch dependencies
    - LP: #832352
  * [Config] Removed stamp-flavours target
    - LP: #832352
  * [Config] Serialize binary indep targets
    - LP: #832352
  * [Config] Use build stamp directly
    - LP: #832352
  * [Config] Restore prepare-% target
    - LP: #832352
  * [Config] Fix binary-% build target
  * [Config] Fix install-headers target
    - LP: #832352
  * SAUCE: igb: Protect stats update
    - LP: #829566
  * SAUCE: rtl8192se spams log
    - LP: #859702

  [ Upstream Kernel Changes ]

  * Add mount option to check uid of device being mounted = expect uid,
    CVE-2011-1833
    - LP: #732628
    - CVE-2011-1833
  * crypto: Move md5_transform to lib/md5.c
    - LP: #827462
  * net: Compute protocol sequence numbers and fragment IDs using MD5.
    - LP: #827462
  * ALSA: timer - Fix Oops at closing slave timer
    - LP: #827462
  * ALSA: snd-usb-caiaq: Fix keymap for RigKontrol3
    - LP: #827462
  * powerpc: Fix device tree claim code
    - LP: #827462
  * powerpc: pseries: Fix kexec on machines with more than 4TB of RAM
    - LP: #827462
  * Linux 2.6.32.45+drm33.19
    - LP: #827462
  * ipv6: make fragment identifications less predictable, CVE-2011-2699
    - LP: #827685
    - CVE-2011-2699
  * tunnels: fix netns vs proto registration ordering
    - LP: #823296
  * Fix broken backport for IPv6 tunnels in 2.6.32-longterm kernels.
  * USB: xhci: fix OS want to own HC
    - LP: #837669
  * USB: assign instead of equal in usbtmc.c
    - LP: #837669
  * USB: usb-storage: unusual_devs entry for ARM V2M motherboard.
    - LP: #837669
  * USB: Serial: Added device ID for Qualcomm Modem in Sagemcom's HiLo3G
    - LP: #837669
  * atm: br2864: sent packets truncated in VC routed mode
    - LP: #837669
  * hwmon: (ibmaem) add missing kfree
    - LP: #837669
  * ALSA: snd-usb-caiaq: Correct offset fields of outbound iso_frame_desc
    - LP: #837669
  * mm: fix wrong vmap address calculations with odd NR_CPUS values
    - LP: #837669
  * perf tools: do not look at ./config for configuration
    - LP: #837669
  * fs/partitions/efi.c: corrupted GUID partition tables can cause kernel
    oops
    - LP: #837669
  * befs: Validate length of long symbolic links.
    - LP: #837669
  * ALSA: snd_usb_caiaq: track submitted output urbs
    - LP: #8...


Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released