SRU: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

Bug #727594 reported by David Coggins
This bug affects 133 people
Affects                              Status         Importance   Assigned to        Milestone
Release Notes for Ubuntu             Invalid        Undecided    Unassigned
xf86-video-intel                     Fix Released   High
linux (Ubuntu)                       Invalid        Undecided    Unassigned
  Natty                              Invalid        Undecided    Unassigned
  Oneiric                            Invalid        Undecided    Unassigned
xserver-xorg-video-intel (Ubuntu)    Fix Released   High         Bryce Harrington
  Natty                              Fix Released   High         Bryce Harrington
  Oneiric                            Fix Released   High         Bryce Harrington

Bug Description

[Impact]
Severe GPU lockup affecting the i915/i945 family of Intel chips, resulting in unrecoverable freeze of graphics, black screen and/or corruption, requiring a hard reboot to reset. The issue is widespread amongst these cards as evidenced by the large number of dupes; the hardware is common. Most users report this is a regression in behavior from maverick.

[Background]
Subsequent to the code included in maverick, upstream introduced an optimization to relax fencing on Intel hardware. This change reduced the amount of memory allocated for video buffers. However, on older (pre-G33) hardware such as i915/i945, it increases the chance of GPU lockups.

[Fix for Development Version]
Upstream has opted to disable the relaxed fencing optimization for their driver release, and the change is still present in their active upstream git tree. Thus, we will be pulling this fix when we update X in oneiric.

[Fix for Stable Version]
For natty, the attached patch is a backport of the patch that went upstream. This patch turns relaxed fencing into a configurable xorg.conf option and disables it by default for pre-G33 chipsets.
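
As a rough illustration of such an option, the following Python sketch renders a Device section that sets relaxed fencing explicitly. It assumes the backport exposes the option under the upstream driver's name, "RelaxedFencing"; the identifier string and the helper name are hypothetical, and the exact option name should be checked against the intel(4) man page shipped with the patched package.

```python
def render_device_section(identifier="Intel Graphics", relaxed_fencing=False):
    """Return an xorg.conf Device section as text.

    Assumption: the option is called "RelaxedFencing", as in the upstream
    xf86-video-intel driver. The identifier is an arbitrary example label.
    """
    value = "true" if relaxed_fencing else "false"
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "intel"',
        f'    Option     "RelaxedFencing" "{value}"',
        'EndSection',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a stanza that forces relaxed fencing off.
    print(render_device_section(), end="")
```

On pre-G33 hardware the patched default is already disabled, so a stanza like this would mainly be useful for re-enabling the optimization during testing (relaxed_fencing=True) or for forcing it off on other chipsets.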

[Steps to Reproduce]
The freezes typically occur intermittently after some period of use. For some people it occurs right at boot, for others after minutes or hours of usage. Some users find that certain activities such as web browsing make the issue more likely to occur, but others do not find it correlated to any particular action.

However, in all cases once the system is frozen, the file /sys/kernel/debug/dri/0/i915_error_state will contain an error code for the IPEHR value. The exact value appears to vary greatly from hw to hw, but common values tend to be either in the 0x02xxxxxx or 0x7xxxxxxx range.

With this patch applied, GPU freezes should either go away entirely or become much less frequent. Freezes which still occur but have IPEHR values outside these two ranges may be unrelated bugs.
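
The range check on IPEHR can be sketched in Python. This is only an illustration: the helper names are hypothetical, the regular expression assumes the dump contains a line of the form "IPEHR: 0x........", and reading the debugfs file normally requires root.

```python
import re

# Path named in this report; reading it usually needs root privileges.
ERROR_STATE_PATH = "/sys/kernel/debug/dri/0/i915_error_state"

# Assumed line format, e.g. "IPEHR: 0x02000004".
IPEHR_RE = re.compile(r"IPEHR:\s*0x([0-9a-fA-F]+)")


def extract_ipehr(dump_text):
    """Return the first IPEHR value in the dump as an int, or None."""
    match = IPEHR_RE.search(dump_text)
    return int(match.group(1), 16) if match else None


def in_reported_ranges(ipehr):
    """True if the value looks like 0x02xxxxxx or 0x7xxxxxxx."""
    return (ipehr >> 24) == 0x02 or (ipehr >> 28) == 0x7


if __name__ == "__main__":
    with open(ERROR_STATE_PATH) as handle:
        value = extract_ipehr(handle.read())
    if value is None:
        print("no IPEHR value found")
    elif in_reported_ranges(value):
        print(f"IPEHR 0x{value:08x}: within the ranges common for this bug")
    else:
        print(f"IPEHR 0x{value:08x}: outside those ranges, possibly unrelated")
```

A value outside both ranges does not prove the freeze is unrelated; it only means the dump does not match the pattern most reporters saw.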

[Regression Potential]
The patch itself is relatively small and unlikely to introduce regression.

However, this switches optimization paths within the driver. It restores an older codepath, so presumably it will be at least as stable as maverick; however, this path has received only limited testing on natty. The code is upstream, is being tested by the wider community, and so far has not proven problematic.

Because it disables an optimization, some users of older hardware may see performance regress, but it should be no worse than what was available in maverick.

[Original Report]
As instructed by Bryce I installed the following kernel on my Asus eee pc 701 running natty:
" Please test with the following kernel:

  http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ "

by doing sudo dpkg -i linux-image....deb

Then I ran update manager.

After rebooting I browsed with google chrome. The system froze after 5 minutes.

I rebooted the same kernel and immediately got this problem popup. In addition there is font corruption - part of the letter t is missing in the browser.

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu11
Uname: Linux 2.6.38-999-generic i686
Architecture: i386
Chipset: i915gm
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 800x480
 edid-base64:
DRM.card0.VGA.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1680x1050 1280x1024 1280x1024 1280x960 1152x864 1024x768 1024x768 1024x768 832x624 800x600 800x600 800x600 800x600 640x480 640x480 640x480 640x480 720x400
 edid-base64: AP///////wBMLX4CMjJFTQ4RAQMOLx54KtUVpFVJmicUUFS/74CzAIGAgUBxTwEBAQEBAQEBfC6QoGAaHkAwIDYA2igRAAAaAAAA/QA4Sx5RDgAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIOU5QNDAwMTk3CiAgAM8=
Date: Wed Mar 2 15:23:13 2011
DistUpgraded: Yes, recently upgraded Log time: 2011-02-17 19:00:58.983764
DistroCodename: natty
DistroVariant: ubuntu
DumpSignature: c38b9ae8 (ESR: 0x00000001 IPEHR: 0x02000004)
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GraphicsCard:
 Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 04) (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
   Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110202)
InterpreterPath: /usr/bin/python2.7
MachineType: ASUSTeK Computer INC. 701
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-999-generic root=UUID=3893e3fd-c2b4-40ec-9810-5a9aba86cbd4 ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-999-generic root=UUID=3893e3fd-c2b4-40ec-9810-5a9aba86cbd4 ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6~3ubuntu8
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu11
Renderer: Unknown
SourcePackage: xserver-xorg-video-intel
Title: [i915gm] GPU lockup c38b9ae8 (ESR: 0x00000001 IPEHR: 0x02000004)
UpgradeStatus: Upgraded to natty on 2011-02-24 (5 days ago)
UserGroups:

dmi.bios.date: 05/04/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1001
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: 701
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: x.xx
dmi.chassis.asset.tag: 0x00000000
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTek Computer INC.
dmi.chassis.version: x.x
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1001:bd05/04/2008:svnASUSTeKComputerINC.:pn701:pvrx.x:rvnASUSTeKComputerINC.:rn701:rvrx.xx:cvnASUSTekComputerINC.:ct10:cvrx.x:
dmi.product.name: 701
dmi.product.version: x.x
dmi.sys.vendor: ASUSTeK Computer INC.
version.compiz: compiz 1:0.9.4-0ubuntu3
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10.1~git20110215.cc1636b6-0ubuntu2
version.xserver-xorg: xserver-xorg 1:7.6~3ubuntu8
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.0-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-1ubuntu11
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu5

[lspci]
Nux: lspci: 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (r

In , Bryce Harrington (bryce) wrote :

Forwarding this bug from Ubuntu reporter mkis62:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/714719

[Problem]
GPU lockup (of the "Hangcheck timer elapsed" variety) on 2.6.38-2 kernel and 2.14.0 intel driver with i915gm hardware. No compositor is running.

[Original Description]
X crashed while setting preferences in Decibel Audio Player
tty1-6 works ... rebooting...

From GPU dump:
ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x02000004
IPEIR: 0x00000000
INSTDONE: 0x03c7c081
    busy: IDCT
    busy: IQ
    busy: PR
    busy: VLD
    busy: Instruction parser
    busy: Windowizer
    busy: Intermediate Z
    busy: Perspective interpolation
    busy: Texture decompression
    busy: Sampler Cache
    busy: Filtering
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
    busy: Map L2

From dmesg:
[ 2026.252160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2026.254795] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 402290 at 402288, next 402291)

DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu6
ProcVersionSignature: Ubuntu 2.6.38-2.29-generic 2.6.38-rc3
Uname: Linux 2.6.38-2-generic i686
Architecture: i386
Chipset: i915gm
CompisitorRunning: None
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1024x768
 edid-base64:
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
Date: Mon Feb 7 18:50:19 2011
DistUpgraded: Yes, recently upgraded Log time: 2011-01-03 14:04:23.058239
DistroCodename: natty
DistroVariant: ubuntu
DumpSignature: 82856c05
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GconfCompiz:

GraphicsCard:
 Subsystem: Acer Incorporated [ALI] Device [1025:006a]
   Subsystem: Acer Incorporated [ALI] Device [1025:006a]
InterpreterPath: /usr/bin/python2.7
MachineType: Acer TravelMate 2410
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6~3ubuntu3
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu6
Renderer: Hardware acceleration
SourcePackage: xserver-xorg-video-intel
Title: [i915gm] GPU lockup 82856c05
UserGroups:

dmi.bios.date: 02/07/2006
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: V1.09
dmi.board.name: Morar
dmi.board.vendor: Acer
dmi.board.version: Rev
dmi.chassis.asset.tag: None
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrV1.09:bd02/07/2006:svnAcer:pnTravelMate2410:pvr0100:rvnAcer:rnMorar:rvrRev:cvnAcer:ct10:cvrN/A:
dmi.product.name: TravelMate 2410
dmi.product.version: 0100
d...


In , Bryce Harrington (bryce) wrote :

Created attachment 43065
i915_error_state.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 43066
BootDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 43067
CurrentDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 43068
XorgLog.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 43069
XorgLogOld.txt

In , Bryce Harrington (bryce) wrote :

In , Chris Wilson (ickle) wrote :

*** Bug 34015 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

This patch would confirm my hypothesis that this is an invalid unfenced alignment:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f136899..c970b81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1416,6 +1416,7 @@ i915_gem_get_unfenced_gtt_alignment(struct drm_i915_gem_ob
            obj->tiling_mode == I915_TILING_NONE)
                return 4096;

+       return i915_gem_get_gtt_size(obj);
        /*
         * Older chips need unfenced tiled buffers to be aligned to the left
         * edge of an even tile row (where tile rows are counted as if the bo is

In , Bryce Harrington (bryce) wrote :

We packaged this patch into a kernel for the bug reporter to test:

   http://people.canonical.com/~apw/lp714719-natty/

We have not yet heard back from him in a couple weeks.

However, we asked other bug reporters with vaguely similar lockups to test as well, and this past weekend one of them tested it and provided the following dmesg after reproducing a lockup.

   https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/718767/+attachment/1861287/+files/dmesg.txt

David Coggins (david-coggins-sydney) wrote :
Bryce Harrington (bryce)
summary: - [i915gm] GPU lockup c38b9ae8 (ESR: 0x00000001 IPEHR: 0x02000004)
+ [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

Bryce Harrington (bryce) wrote : Re: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

David Coggins - I've forwarded this bug upstream to http://bugs.freedesktop.org/show_bug.cgi?id=34948 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
status: New → Triaged
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
status: Unknown → Confirmed

In , Ranma+freedesktop (ranma+freedesktop) wrote :

Hmm, I think I'm seeing this too on my X41T:

Recently upgraded Debian and kernel and got gpu hangs again.
I upgraded to latest libdrm2 and xf86-video-intel, but still getting gpu hangs.
Especially chrome seems to have a knack for causing these (aggressive use of acceleration features I guess).

Linux navi 2.6.38-rc7 #64 PREEMPT Sun Mar 6 14:32:50 CET 2011 i686 GNU/Linux

ii libdrm2 2.4.24-1 Userspace interface to kernel DRM services -
ii xserver-xorg-v 2:2.14.901-1 X.Org X server -- Intel i8xx, i9xx display d

(Both built myself from newest upstream packages released last week).

intel_gpu_dump:
ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x02000004
IPEIR: 0x00000000
INSTDONE: 0x038ff8c1
    busy: IDCT
    busy: IQ
    busy: PR
    busy: VLD
    busy: Instruction parser
    busy: Setup engine
    busy: Windowizer
    busy: Intermediate Z
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
Ringbuffer: Reminder: head pointer is GPU read, tail pointer is CPU write
ringbuffer at 0x00000000:
(copy&paste from terminal, forgot to redirect into file before resetting the gpu with a suspend-resume cycle).

dmesg:
[29103.032023] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[29103.032023] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1775973 at 1775971, next 1775974)
[29103.032023] [drm:i915_reset] *ERROR* Failed to reset chip.

00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)

00:02.0 0300: 8086:2592 (rev 03)
00:02.1 0380: 8086:2792 (rev 03)

Vendor: 0x8086, Device: 0x2592, Revision: 0x03 (B1/C0)

In , Ranma+freedesktop (ranma+freedesktop) wrote :

BTW, while a suspend-resume should reset the gpu, I see this:

[31055.564022] [drm] Manually setting wedged to 0
[31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
Why does it fail?
The units are not busy anymore according to intel_gpu_top, so I'd expect "echo 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't

In , Ranma+freedesktop (ranma+freedesktop) wrote :

Created attachment 44183
i915 dump after s2mem (tried to recover from wedged gpu), but i915 claims it still can't reset the gpu

In , Chris Wilson (ickle) wrote :

(In reply to comment #11)
> BTW, while a suspend-resume should reset the gpu, I see this:
>
> [31055.564022] [drm] Manually setting wedged to 0
> [31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
> Why does it fail?

It fails because we have not found the means to successfully reset that chipset yet. It may well be the only way is to power cycle the PCI device. Meh.

> The units are not busy anymore according to intel_gpu_top, so I'd expect "echo
> 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't

The units are idle because the chip hit a fatal error and disabled those units.

In , Ranma+freedesktop (ranma+freedesktop) wrote :

(In reply to comment #13)
> (In reply to comment #11)
> > BTW, while a suspend-resume should reset the gpu, I see this:
> >
> > [31055.564022] [drm] Manually setting wedged to 0
> > [31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
> > Why does it fail?
>
> It fails because we have not found the means to successfully reset that chipset
> yet. It may well be the only way is to power cycle the PCI device. Meh.
>
> > The units are not busy anymore according to intel_gpu_top, so I'd expect "echo
> > 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't
>
> The units are idle because the chip hit a fatal error and disabled those units.

I don't think so. They are only idle after coming back out of suspend to ram, so I think it's probably because the GPU was power-cycled.
Both resume from disk and resume from ram have the same effect here.
I think it would be very helpful if KMS/DRM could recover from the GPU hang after suspend to ram or suspend to disk, when the GPU was power-cycled. It used to be the case that 'echo 1 > i915_wedged' would restart the driver after resume, but it seems some internals have changed so that this no longer works. If it would be able to recover in this case it would avoid the need to completely reboot the system to recover.

In , Chris Wilson (ickle) wrote :

*** Bug 34948 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

Created attachment 44468
i915_error_state from #34948

Attaching another i915_error_state variant.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
status: Confirmed → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → High
status: Unknown → Confirmed

In , Chris Wilson (ickle) wrote :

Can you give drm-intel-staging, and in particular,

commit 0faba0d4e49361886b16c703995a3477951b14e5
Author: Chris Wilson <email address hidden>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <email address hidden>
    Cc: Daniel Vetter <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <email address hidden>

a whirl?

Sergey "Shnatsel" Davidoff (shnatsel) wrote :

Confirming this on the latest Natty daily build. I logged on and got a message about kernel problem, clicked "report problem", entered my password and got this crash.

Sergey "Shnatsel" Davidoff (shnatsel) wrote :

Also, the system froze on the previous boot. The kernel was still responsive, I managed to get a proper shutdown, but X and even the keyboard NumPad toggling didn't work.

David Coggins (david-coggins-sydney) wrote :

The system froze for me testing the latest natty 2.6.38-7.36 which should incorporate the fix for bug 717114

drm/i915: Fix tiling corruption from pipelined fencing

Mar 21 11:29:13 eee kernel: [ 0.000000] Linux version 2.6.38-7-generic (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-6ubuntu4) ) #36-Ubuntu SMP Fri Mar 18 22:05:25 UTC 2011 (Ubuntu 2.6.38-7.36-generic 2.6.38)

Mar 21 11:47:30 eee kernel: [ 1115.992048] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 21 11:47:30 eee kernel: [ 1115.998408] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 110179 at 110177, next 110180)

Apport is not generating a problem popup when I next reboot at the moment.
A small amount of testing with the terminal does not show any corruption which I was seeing 2 weeks ago bug 717114

In , Chris Wilson (ickle) wrote :

Working on the theory that it is one and the same bug:

commit b5b5ac2dec49ea5ae033434efa90863aa5cdfb2c
Author: Chris Wilson <email address hidden>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <email address hidden>
    Cc: Daniel Vetter <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <email address hidden>
    Tested-by: Dave Airlie <email address hidden>

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

In , Bryce Harrington (bryce) wrote :

The original reporter tested a kernel with commit b5b5ac2d patched in and says he still sees the hang:

David Coggins wrote on 2011-03-20:
The system froze for me testing the latest natty 2.6.38-7.36 which should incorporate the fix for bug 717114

drm/i915: Fix tiling corruption from pipelined fencing

Mar 21 11:29:13 eee kernel: [ 0.000000] Linux version 2.6.38-7-generic (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-6ubuntu4) ) #36-Ubuntu SMP Fri Mar 18 22:05:25 UTC 2011 (Ubuntu 2.6.38-7.36-generic 2.6.38)

Mar 21 11:47:30 eee kernel: [ 1115.992048] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 21 11:47:30 eee kernel: [ 1115.998408] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 110179 at 110177, next 110180)

Apport is not generating a problem popup when I next reboot at the moment.
A small amount of testing with the terminal does not show any corruption which I was seeing 2 weeks ago bug 717114

Bryce Harrington (bryce) wrote :

Hi David, thanks for the feedback, sorry to hear it didn't solve it.

I've reopened the upstream bug report and included your comment. It would be very helpful if you could add yourself to the CC of the upstream bug report, so you can spot and respond to upstream's suggestions for things to test.

In , Chris Wilson (ickle) wrote :

*** Bug 35608 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

Created attachment 44880
i915_error_state from #35608

In , Chris Wilson (ickle) wrote :

*** Bug 35647 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

Created attachment 44881
i915_error_state from #35647

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed

mkis62 (mihaikx62) wrote :

Still crashing. It happens on rapid mouse movement or scroll in Firefox.
2.6.38-7 worked fine for few days...

Reported on 2011-04-01 as Bug #747676, got the reply 'This bug has been marked a duplicate of bug 727594', so here we are

In , Chris Wilson (ickle) wrote :

*** Bug 36000 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

Created attachment 45335
i915_error_state from #36000

Teun (teunkloosterman) wrote :

Again, this is not a critical bug for me.

I filed three reports in this bug which all happened when the crash reporter was reporting another bug. The crash reporter crashes and confronts me with this bug.

My computer doesn't stall or start behaving badly in any other way; the bug reporter only crashes when a diagnostic report on another application is being created.

David Coggins (david-coggins-sydney) wrote :

System froze testing natty 2.6.38-8.41. I would be interested to know whether this kernel contains the Mar 25 patch

 drm/i915: Round-up GTT allocations for unfenced surfaces to the next tile row ?

Alternatively does the daily mainline kernel 2.6.39-999 contain the patch?

Jim Bronson (jim-bronson) wrote : Re: [Bug 727594] Re: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

I gave up and went back to 10.04 LTS. I don't like Unity anyway.

Unsubscribe

On Fri, Apr 8, 2011 at 4:45 AM, David Coggins <email address hidden> wrote:
> System froze testing natty 2.6.38-8.41. I would be interested to know
> whether this kernel contains the Mar 25 patch
>
>  drm/i915: Round-up GTT allocations for unfenced surfaces to the next
> tile row ?
>
> Alternatively does the daily mainline kernel 2.6.39-999 contain the
> patch?
>
> --
> You received this bug notification because you are a direct subscriber
> of a duplicate bug (747676).
> https://bugs.launchpad.net/bugs/727594
>
> Title:
>  [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
>
> Status in X.org xf86-video-intel:
>  Confirmed
> Status in “linux” package in Ubuntu:
>  New
> Status in “xserver-xorg-video-intel” package in Ubuntu:
>  Triaged
>
> Bug description:
>  Binary package hint: xserver-xorg-video-intel
>
>  As instructed by Bryce I installed the following kernel on my Asus eee pc 701 running natty:
>  " Please test with the following kernel:
>
>    http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ "
>
>  by doing sudo dpkg -i linux-image....deb
>
>  Then I ran update manager.
>
>  After rebooting I browsed with google chrome. The system froze after 5
>  minutes.
>
>  I rebooted the same kernel and immediately got this problem popup. In
>  addition there is font corruption - part of the letter t is missing in
>  the browser.
>
>  ProblemType: Crash
>  DistroRelease: Ubuntu 11.04
>  Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu11
>  Uname: Linux 2.6.38-999-generic i686
>  Architecture: i386
>  Chipset: i915gm
>  CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
>  CompositorRunning: None
>  DRM.card0.LVDS.1:
>   status: connected
>   enabled: enabled
>   dpms: On
>   modes: 800x480
>   edid-base64:
>  DRM.card0.VGA.1:
>   status: connected
>   enabled: enabled
>   dpms: On
>   modes: 1680x1050 1280x1024 1280x1024 1280x960 1152x864 1024x768 1024x768 1024x768 832x624 800x600 800x600 800x600 800x600 640x480 640x480 640x480 640x480 720x400
>   edid-base64: AP///////wBMLX4CMjJFTQ4RAQMOLx54KtUVpFVJmicUUFS/74CzAIGAgUBxTwEBAQEBAQEBfC6QoGAaHkAwIDYA2igRAAAaAAAA/QA4Sx5RDgAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIOU5QNDAwMTk3CiAgAM8=
>  Date: Wed Mar  2 15:23:13 2011
>  DistUpgraded: Yes, recently upgraded Log time: 2011-02-17 19:00:58.983764
>  DistroCodename: natty
>  DistroVariant: ubuntu
>  DumpSignature: c38b9ae8 (ESR: 0x00000001 IPEHR: 0x02000004)
>  ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
>  GraphicsCard:
>   Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 04) (prog-if 00 [VGA controller])
>     Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
>     Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
>  InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110202)
>  InterpreterPath: /usr/bin/python2.7
>  MachineType: ASUSTeK Computer INC. 701
>  ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
>  ProcEnviron:
>
>  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-999-generic root=UUID=3893e3fd-c2b4-40ec-9810-...


In , Knut-petersen (knut-petersen) wrote :

I suspect that this bug is related to Bug 36147

Test if reverting commit cc930a37612341a1f2457adb339523c215879d82
helps

In , Chris Wilson (ickle) wrote :

Bryce, I'm confident that Knut identified the same issue and so disabling relaxed-fencing for the release should fix these as well. (I have lingering doubts since we tried the obvious kernel workarounds, but then again I think we may have a fundamental bug in our allocation ala gen2.) Obviously, if I am wrong, let's open the bug again.

commit 686018f283f1d131073ef5917213e6a8ac013f26
Author: Chris Wilson <email address hidden>
Date: Tue Apr 12 08:23:04 2011 +0100

    Turn relaxed-fencing off by default for older (pre-G33) chipsets

    There are still too many unresolved bugs, typically GPU hangs, that are
    related to using relaxed fencing (i.e. only allocating the minimal
    amount of memory required for a buffer) on older hardware, so turn off
    the feature by default for the release.

    Reported-and-tested-by: Knut Petersen <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36147
    Signed-off-by: Chris Wilson <email address hidden>
    Acked-by: Daniel Vetter <email address hidden>

In , James Le Cuirot (chewi) wrote :

I can't look too deeply into it right now but it looks like this hasn't fixed it for me. The xf86-video-intel I built definitely included that commit and I was running 2.6.38.2.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

Robin.He (hechu) wrote : Re: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

Is there anyone who can help release a package with the fix for Natty? My laptop crashes almost every day and on every boot. I have to log in remotely (ssh) to my laptop, kill X, and restart my GUI every time.
Why am I using a Natty beta version? Because it seems only Natty's graphics driver supports my Sandy Bridge GPU (VESA is too slow for me, even though I don't need compiz or any 3D effects).
Thank you.

Bryce Harrington (bryce) wrote :

I backported the patch from upstream that they believe will resolve this issue. It might be too late to include this in natty, however if people test it quickly and if it is found to eliminate the freezes on i915 maybe there's a chance.

My backport patch is available in my PPA here: https://launchpad.net/~bryce/+archive/fig

If you test it and find it resolves the issue sufficiently let me know.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Incomplete

mkis62 (mihaikx62) wrote :

Installed the patch from PPA
Still crashing -- scrolling on a page in Firefox with flash-video content freezes all.
The problem cannot be reported (not genuine Ubuntu package)

Colin (colinnc) wrote :

The PPA patch seems to have fixed the problem for me. Using Google Chrome would cause a lockup within five minutes before, so far no lockups after one hour of use. I'll update if anything changes.

Thanks!

Robin.He (hechu) wrote :

Hi, unfortunately, the PPA patch did not fix the problem for me.
I was running Firefox, LibreOffice, gnome-terminal, gedit and a third-party IM program. While I was typing in gnome-terminal, X locked up. I had to force my laptop off (by pushing the power button) because I didn't have another computer to log in remotely this time.
Now I set "fbdev" as my GPU driver. It runs OK except the speed.
For my case, please refer to:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/754777

Revision history for this message
Bryce Harrington (bryce) wrote :

@Robin, hmm, looks like your bug got mistakenly duped to this one. You have a sandybridge system, whereas near as we can tell this bug is specific to the i915/i945 architecture. I'll undupe your bug report.

That leaves us with one yes, one no... can anyone else provide feedback? I'd like to see a few yes's (and more yes's than no's) before proceeding with this fix.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Eric Appleman (erappleman) wrote :

I'm not seeing this on i945gm with the final freeze kernel.

Revision history for this message
mkis62 (mihaikx62) wrote :

Bryce, this time I booted on 2.6.38-8 with your patch from PPA. Seems to be OK, even after some 'stressing' in Firefox (multiple tabs, video, scrolling...).
Thanks!

Revision history for this message
David Coggins (david-coggins-sydney) wrote :

My eeepc is running ok after installing the backport from the fig ppa. I used the system for several hours last night in google chrome and again this morning. Previously the bug could take all day to show up. Scrolling is a little laggy. However this evening I am seeing slight font corruption in the letter p in the console and gedit.

Revision history for this message
Eakan Gopalakrishnan (eakangk) wrote : Re: [Bug 727594] Re: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

I always keep getting fonts displayed weirdly. Sometimes the upper part
of 't' is invisible, other times the lower part of letters like 'p' is
invisible. Sometimes strange lines appear.
I wonder what is wrong.

On Thu, Apr 21, 2011 at 12:10 PM, David Coggins
<email address hidden> wrote:

> My eeepc is running ok after installing the backport from the fig ppa. I
> used the system for several hours last night in google chrome and again
> this morning. Previously the bug could take all day to show up.
> Scrolling is a little laggy. However this evening I am seeing slight
> font corruption in the letter p in the console and gedit.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/727594
>
> Title:
> [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
>
> Status in X.org xf86-video-intel:
> Fix Released
> Status in “linux” package in Ubuntu:
> New
> Status in “xserver-xorg-video-intel” package in Ubuntu:
> Incomplete
>
> Bug description:
> Binary package hint: xserver-xorg-video-intel
>
> As instructed by Bryce I installed the following kernel on my Asus eee pc
> 701 running natty:
> " Please test with the following kernel:
>
> http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ "
>
> by doing sudo dpkg -i linux-image....deb
>
> Then I ran update manager.
>
> After rebooting I browsed with google chrome. The system froze after 5
> minutes.
>
> I rebooted the same kernel and immediately got this problem popup. In
> addition there is font corruption - part of the letter t is missing in
> the browser.
>
> ProblemType: Crash
> DistroRelease: Ubuntu 11.04
> Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu11
> Uname: Linux 2.6.38-999-generic i686
> Architecture: i386
> Chipset: i915gm
> CompizPlugins: No value set for
> `/apps/compiz-1/general/screen0/options/active_plugins'
> CompositorRunning: None
> DRM.card0.LVDS.1:
> status: connected
> enabled: enabled
> dpms: On
> modes: 800x480
> edid-base64:
> DRM.card0.VGA.1:
> status: connected
> enabled: enabled
> dpms: On
> modes: 1680x1050 1280x1024 1280x1024 1280x960 1152x864 1024x768 1024x768
> 1024x768 832x624 800x600 800x600 800x600 800x600 640x480 640x480 640x480
> 640x480 720x400
> edid-base64:
> AP///////wBMLX4CMjJFTQ4RAQMOLx54KtUVpFVJmicUUFS/74CzAIGAgUBxTwEBAQEBAQEBfC6QoGAaHkAwIDYA2igRAAAaAAAA/QA4Sx5RDgAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIOU5QNDAwMTk3CiAgAM8=
> Date: Wed Mar 2 15:23:13 2011
> DistUpgraded: Yes, recently upgraded Log time: 2011-02-17 19:00:58.983764
> DistroCodename: natty
> DistroVariant: ubuntu
> DumpSignature: c38b9ae8 (ESR: 0x00000001 IPEHR: 0x02000004)
> ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
> GraphicsCard:
> Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller
> [8086:2592] (rev 04) (prog-if 00 [VGA controller])
> Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
> Subsystem: ASUSTeK Computer Inc. Device [1043:82d9]
> InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110202)
> InterpreterPath: /usr/bin/python2.7
> MachineType: ASUSTeK Computer INC. 7...


Revision history for this message
Bryce Harrington (bryce) wrote : Re: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

Let's leave the font corruption to a different bug, unless it can be definitively shown to be caused by the fig ppa.

So, one more yes vote from mkis62, and perhaps too early to say for David.

Revision history for this message
John T. Folden (john-t-folden) wrote :

I think the fig ppa has fixed the issue for me...

Revision history for this message
Bryce Harrington (bryce) wrote :

Alright I've gotten a few more confirmations of the fix on bugs #755693 and #763259. I think we may be too late for the release unfortunately, but I can try to get it accepted as an SRU. The criteria for SRUs are stricter than development changes though. The more +1 confirmations we can gather that this change resolves freeze issues the better.

Since this bug report is the oldest of this family of bugs I'm going to make it the master for all the dupes and the primary for filing the SRU. Apologies ahead of time if this generates a lot of email traffic for everyone (you can unsub via the bug report in launchpad if you don't want the emails, but hopefully it should be over and done in a couple weeks).

@David, since you're the original reporter on this bug report, would you mind following up with your latest findings vis a vis the ppa fix?

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → In Progress
Bryce Harrington (bryce)
description: updated
Revision history for this message
Bryce Harrington (bryce) wrote :
Changed in xserver-xorg-video-intel (Ubuntu Natty):
milestone: none → natty-updates
Changed in linux (Ubuntu Natty):
status: New → Invalid
Changed in xserver-xorg-video-intel (Ubuntu Natty):
assignee: nobody → Bryce Harrington (bryce)
summary: - [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
+ SRU: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu Natty):
status: In Progress → Fix Committed
Revision history for this message
AJenbo (ajenbo) wrote :

I really hope this makes it in the release, I was not even able to login prior to installing the fix via the terminal in recovery mode :(

Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
AJenbo (ajenbo) wrote :

Bryce, the patch is empty...

Revision history for this message
David Coggins (david-coggins-sydney) wrote :

The freezes have not occurred for several days now so I would consider them fixed. The patch has not made other problems worse.

Revision history for this message
Christian Göbel (christiangoebel) wrote :

+1
I tested the ppa. Everything looks good so far - no crash since I installed the patched driver.

Revision history for this message
Bryce Harrington (bryce) wrote :

> Bryce, the patch is empty...

Ha, true enough. Let's try that again.

Btw, I'm pretty sure the release is already more or less in the bag, so the fix is not likely to make it to the cd. (I could be wrong; it's a severe enough bug the archive team might pull it in if there are other last minute bugs making it worthwhile to regenerate the CD iso.)

It's more likely this will go through the SRU process. That means it'll be reviewed and approved to go into natty-proposed for people to test, where it'll live for a period from a week to several weeks until enough testing has been done to show it does not cause regression. At that point it will move to natty-updates and be generally available to all users.

If a lot of people test -proposed and give it a +1, that will help accelerate getting the fix into the release. If people test it and find any regressions, that will significantly delay it going in until those issues can be investigated and resolved. So, hopefully lots of people give +1's and no one gives -1's, and we'll see this live for natty for people to update to post-release.

Revision history for this message
AJenbo (ajenbo) wrote :

With some systems not even able to boot normally, that wouldn't be very user friendly.

It's looking more and more like Lucid and i8xx all over again :(

Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

Reopening, though I'm not sure if Cuirot is the reporter.

Chris, if it does fix, I'd suggest marking dup as resolution.

Revision history for this message
In , James Le Cuirot (chewi) wrote :

If we're going to use surnames, it's Le Cuirot please!

I'm not the reporter and I'm not 100% sure that my issue is the same but it is very telling that all these similar bug reports sprung up around the same time.

I would do a bisect but it's my wife's laptop and I haven't found a quick way to reproduce the issue. It usually occurs around 15 minutes into using Chromium. If someone could suggest a reliable way to reproduce it (like a GPU stress tester?) then I'll give it a try.

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Accepted xserver-xorg-video-intel into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
bugbot (bugbot)
description: updated
Revision history for this message
David Coggins (david-coggins-sydney) wrote :

I installed the package from natty proposed replacing the fig package yesterday and since then there have been no freezes.

Revision history for this message
AJenbo (ajenbo) wrote :

This issue should probably be mentioned in the release notes.

Revision history for this message
Bryce Harrington (bryce) wrote : Re: [Bug 727594] Re: SRU: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

On Thu, Apr 28, 2011 at 03:32:06AM -0000, AJenbo wrote:
> This issue should probably be mentioned in the release notes.

It is; I added it there earlier today.

tags: added: verification-done
removed: verification-needed
Revision history for this message
mic (mic-launchpad) wrote :

I am using the package xserver-xorg-video-intel from natty-proposed for several days, freezes are less frequent. It is better than before, unity is able (sometimes) to run for several hours. Still I have to reboot machine several times a day.

Revision history for this message
Jamison Lofthouse (jdloft) wrote :

What package would that be?

On Wed, Apr 27, 2011 at 9:14 PM, David Coggins <email address hidden> wrote:

> I installed the package from natty proposed replacing the fig package
> yesterday and since then there have been no freezes.
>
> --
> You received this bug notification because you are a direct subscriber
> of a duplicate bug (769862).
> https://bugs.launchpad.net/bugs/727594
>
> Title:
> SRU: [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
>
> Status in X.org xf86-video-intel:
> Confirmed
> Status in “linux” package in Ubuntu:
> Invalid
> Status in “xserver-xorg-video-intel” package in Ubuntu:
> Fix Committed
> Status in “linux” source package in Natty:
> Invalid
> Status in “xserver-xorg-video-intel” source package in Natty:
> Fix Committed
>
> Bug description:
> [Impact]
> Severe GPU lockup affecting the i915/i945 family of Intel chips, resulting
> in unrecoverable freeze of graphics, black screen and/or corruption,
> requiring a hard reboot to reset. The issue is widespread amongst these
> cards as evidenced by the large number of dupes; the hardware is common.
> Most users report this is a regression in behavior from maverick.
>
> [Background]
> Subsequent to the code included in maverick, upstream introduced an
> optimization to relax fencing on Intel hardware. This change reduced the
> amount of memory allocated for video buffers. However on older (pre-G33)
> hardware such as i915/i945 this results in increased chances of GPU lockups.
>
> [Fix for Development Version]
> Upstream has opted to disable the relaxed fencing optimization for their
> driver release, and the change is still present in their active upstream git
> tree. Thus, we will be pulling this fix when we update X in oneiric.
>
> [Fix for Stable Version]
> For natty, the attached patch is a backport of the patch that went
> upstream. This patch makes relaxed fencing into an xorg.conf option that
> can be set, and makes it disabled by default for gen < 33 chipsets.
>
> [Steps to Reproduce]
> The freezes typically occur intermittently after some period of use. For
> some people it occurs right at boot, others after minutes or hours of usage.
> Some users find that certain activities such as web browsing makes the
> issue more likely to occur, but others do not find it correlated to any
> particular action.
>
> However, in all cases once the system is frozen, the file
> /sys/kernel/debug/dri/0/i915_error_state will contain an error code
> for the IPEHR value. The exact value appears to vary greatly from hw
> to hw, but common values tend to be either in the 0x02xxxxxx or
> 0x7xxxxxxx range.
>
> With this patch applied, gpu freezes should either go away entirely,
> or become much less frequent. Freezes which still occur but have
> IPEHR values outside these two ranges may be unrelated bugs.
>
> [Regression Potential]
> The patch itself is relatively small and unlikely to introduce regression.
>
> However, this switches optimization paths within the driver. It
> restores us to an older codepath so presumably this will at least be
> as stable as maverick, however the amount of testing this path has
> received on natty i...


Revision history for this message
Patrick M (prmillius) wrote :

My machine boots into a blank screen, except the times it gives me the option to boot into recovery mode at boot up. Then if I run failsafex graphic from the menu I get a decent GUI. Maybe the answer is to just have a way to always boot in failsafe mode.

Revision history for this message
Bryce Harrington (bryce) wrote :

On Sun, May 01, 2011 at 03:15:29AM -0000, mic wrote:
> I am using the package xserver-xorg-video-intel from natty-proposed for
> several days, freezes are less frequent. It is better than before, unity
> is able (sometimes) to run for several hours. Still I have to reboot
> machine several times a day.

You could have another bug at work (we've seen this before), or an
underlying condition that exacerbates the problem.

What this fix does is increase the size of the memory pool the gpu uses.
It's sort of like instead of requiring it to hit the bullseye on a
target we're just asking that it hit anywhere on the target or the hay
bales behind it. So whatever causes your GPU to be such a bad shot
still exists, but it's not getting kicked out of the competition so
much.

In general, the two things needed for GPU bug reports are the 'dmesg'
and /sys/kernel/debug/dri/0/i915_error_state. Look at the IPEHR value
especially, as that seems to be a roughly good indicator of dupes. If
it's still the same values as this bug (0x02xxxxxx or 0x7xxxxxxx) then
your system may just be sensitive. If it's a very different value, then it
might be worth treating it as a second, unrelated bug and handling it
separately.
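
The IPEHR check described above can be sketched as a small script. This is only an illustration, not an official triage tool: it assumes the error state text contains the value in the same "IPEHR: 0x........" form shown in this bug's title, and the function names are made up for the example.

```python
import re

def extract_ipehr(error_state_text):
    """Return the first IPEHR value found in the text as an int, or None.

    Assumes the value appears as 'IPEHR: 0x' followed by 8 hex digits,
    as in this bug's title.
    """
    match = re.search(r"IPEHR:\s*0x([0-9a-fA-F]{8})", error_state_text)
    return int(match.group(1), 16) if match else None

def matches_this_bug(ipehr):
    """True if the value falls in the 0x02xxxxxx or 0x7xxxxxxx range."""
    top_byte = ipehr >> 24      # 0x02xxxxxx -> 0x02
    top_nibble = ipehr >> 28    # 0x7xxxxxxx -> 0x7
    return top_byte == 0x02 or top_nibble == 0x7

if __name__ == "__main__":
    # Reading this path usually needs root and a mounted debugfs.
    with open("/sys/kernel/debug/dri/0/i915_error_state") as f:
        value = extract_ipehr(f.read())
    if value is None:
        print("no IPEHR value found")
    else:
        print(hex(value), "matches" if matches_this_bug(value) else "different bug?")
```

A value outside both ranges is only a hint that a separate report may be warranted; attach dmesg and the full error state either way.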

Revision history for this message
Bryce Harrington (bryce) wrote :

On Sun, May 01, 2011 at 01:00:49PM -0000, Jamison Lofthouse wrote:
> What package would that be?

xserver-xorg-video-intel from natty-proposed.

(It's essentially the same as the fig ppa, just different version number
and changelog entry.)

Revision history for this message
Bryce Harrington (bryce) wrote :

On Sun, May 01, 2011 at 06:04:06PM -0000, Patrick M wrote:
> My machine boots into a blank screen, except the times it gives me the
> option to boot into recovery mode at boot up.

Heh, you have to explain a lot more than that. Did the issue only occur
after installing the fix? Do you have GPU lockups matching this bug
prior to the black screen? Or do you see GPU error codes matching this
bug when it is black screened? What is your graphics card?

In general, I would suggest to all, unless you KNOW you have this exact
bug, or if you have found a regression which is traceable specifically
to this fix, let's handle your issue on separate bug reports, not here.
That should help spare people from lots of extraneous email...

> Then if I run failsafex
> graphic from the menu I get a decent GUI. Maybe the answer is to just
> have a way to always boot in failsafe mode.

It's not a bad idea. I've filed this as bug #775093, feel free to
subscribe if you'd like to follow it. I plan on spending some time
reworking X diagnostic tools during oneiric, and this would fit with
those plans.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-intel - 2:2.14.0-4ubuntu7.1

---------------
xserver-xorg-video-intel (2:2.14.0-4ubuntu7.1) natty-proposed; urgency=low

  * Add 119_disable_relaxed_fencing.patch: The relaxed fencing
    optimization is suspected as the cause for various i915/945 gpu lockup
    issues. This disables the optimization by default but adds an
    xorg.conf parameter to let people experiment with it turned on.
    (LP: #727594, #761143, #761632, #755693)
 -- Bryce Harrington <email address hidden> Fri, 22 Apr 2011 19:12:55 -0700
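
For anyone who wants to experiment with the optimization turned back on, the xorg.conf stanza could be generated along these lines. Treat this as a sketch under assumptions: the option is believed to be spelled "RelaxedFencing" in the intel driver, the Identifier string is arbitrary, and the helper name is invented for the example. Check the intel(4) man page shipped with the package before relying on it.

```python
def device_section(relaxed_fencing, identifier="Intel Graphics"):
    """Render an xorg.conf Device section toggling relaxed fencing.

    "RelaxedFencing" is the assumed option name; the identifier is a
    placeholder and can be any label.
    """
    value = "true" if relaxed_fencing else "false"
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "intel"',
        f'    Option     "RelaxedFencing" "{value}"',
        'EndSection',
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    # Print a stanza that re-enables the optimization for testing.
    print(device_section(True))
```

With the patched package the optimization is off by default on the affected chipsets, so a stanza like this is only needed to turn it on deliberately.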

Changed in xserver-xorg-video-intel (Ubuntu Natty):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Copied to oneiric as well.

Changed in xserver-xorg-video-intel (Ubuntu Oneiric):
milestone: natty-updates → none
status: Fix Committed → Fix Released
Revision history for this message
Jamison Lofthouse (jdloft) wrote :

Has this happened in the released version (released recently) not in the
beta2 version?

On Mon, May 2, 2011 at 8:53 AM, Martin Pitt <email address hidden> wrote:

> Copied to oneiric as well.
>
> ** Changed in: xserver-xorg-video-intel (Ubuntu Oneiric)
> Status: Fix Committed => Fix Released
>
> ** Changed in: xserver-xorg-video-intel (Ubuntu Oneiric)
> Milestone: natty-updates => None

Revision history for this message
jtl999 (jtl999) wrote :

I think this is related to my Minecraft bug on dh55hc mobo

Revision history for this message
Richard Kleeman (kleeman) wrote :

I have the patched version and am still getting freezes in both unity-2d and gnome classic

Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Jonas Eberle (jonas-eberle) wrote :

Confirming that this is fixed on my G915. I had very frequent lockups with the natty betas; for the past week (since this new driver and a new kernel arrived), not even one.

Thanks to all working on this.

Revision history for this message
In , James Le Cuirot (chewi) wrote :

Still happening on 2.6.39. :(

Revision history for this message
Shriramana Sharma (jamadagni) wrote :

Hello, please see bug 768986. It is perhaps what Richard (comment #47) has. As he notes, I also have the patched version *7.1 and I still have freezes, but upon Kubuntu login. (I don't use GNOME so I can't say about that.)

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 48884
Use full-fence size for alignment on pre-G33

The complication was a second bug that kept the original patch from preventing misaligned buffers.

Revision history for this message
In , Chris Wilson (ickle) wrote :

Patch posted for inclusion.

Revision history for this message
In , Chris Wilson (ickle) wrote :

commit e28f87116503f796aba4fb27d81e2c3d81966174
Author: Chris Wilson <email address hidden>
Date: Mon Jul 18 13:11:49 2011 -0700

    drm/i915: Fix unfenced alignment on pre-G33 hardware

    Align unfenced buffers on older hardware to the power-of-two object
    size. The docs suggest that it should be possible to align only to a
    power-of-two tile height, but using the already computed fence size is
    easier and always correct. We also have to make sure that we unbind
    misaligned buffers upon tiling changes.

    In order to prevent a repetition of this bug, we change the interface
    to the alignment computation routines to force the caller to provide
    the requested alignment and size of the GTT binding rather than assume
    the current values on the object.

    Reported-and-tested-by: Sitosfe Wheeler <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
    Signed-off-by: Chris Wilson <email address hidden>
    Cc: <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Signed-off-by: Keith Packard <email address hidden>
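
The alignment rule in the commit message above can be illustrated with a short sketch. It is not the kernel code: it assumes the fence size is the object size rounded up to a power of two, starting from a 1 MiB minimum (the figure assumed here for i915-class hardware), and the function names are invented for the example.

```python
MIB = 1024 * 1024

def fence_size(object_size, min_fence=MIB):
    """Round object_size up to a power of two, no smaller than min_fence.

    min_fence=1 MiB is an assumption for this sketch.
    """
    size = min_fence
    while size < object_size:
        size <<= 1
    return size

def is_aligned(offset, object_size):
    """True if a buffer at `offset` is aligned to its full fence size."""
    return offset % fence_size(object_size) == 0

if __name__ == "__main__":
    # A 3 MiB buffer needs a 4 MiB fence region, so an offset of 1 MiB
    # would be misaligned while 4 MiB would be fine.
    print(fence_size(3 * MIB) // MIB, is_aligned(1 * MIB, 3 * MIB), is_aligned(4 * MIB, 3 * MIB))
```

Aligning to the full fence size wastes some address space compared with aligning to a tile height, which is the trade-off the commit message accepts in exchange for being "easier and always correct".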

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Revision history for this message
david6 (andrew-dowden) wrote :

I am dealing with an issue (Bug #882893 ) that may be related.

In general the PC (with Intel 82865G) is running OK, with Ubuntu 11.10 + Unity-2D, and latest updates.

Only problem: GPU (or CPU) crashes at 'screen OFF', on screensaver timeout.

( I will try: RevertingIntelDriverTo2.4 )

Revision history for this message
TheShadow (theshadow-shadowpedia) wrote :

My coworker and I, who are running the same model of machine, are experiencing random lockups. The only hint is a message in the syslog and kern.log about the GPU hanging. I've included some data in the attached file.

It started recently for me within the last two weeks, definitely after some round of updates. Which ones I'm not certain of.

Revision history for this message
piccobello (piccobello) wrote :

I have an old machine with this intel graphics card:
$ lspci -v|grep -i vga
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
I have been using Kubuntu natty for a while now and have never had a problem whatsoever.
I only learned about this bug when I got the scary warning while trying to upgrade to oneiric.
Should I be worried? Is there something useful I can do?
I arrived at natty via an upgrade and have always kept all desktop effects deactivated; I guess that may be why I never had any issues.

Pete Graner (pgraner)
Changed in ubuntu-release-notes:
status: New → Invalid