My latest crash produced some additional output that I had not seen before. Note the last three lines of the log below.
[ 4754.254728] [drm:radeon_fence_wait] *ERROR* fence(ffff88006c4324c0:0x000102A4) 510ms timeout going to reset GPU
[ 4754.254746] radeon 0000:01:00.0: GPU softreset
[ 4754.254752] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
[ 4754.254757] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00300002
[ 4754.254763] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200030C0
[ 4754.412213] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 4754.412220] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 4754.412280] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 4754.412342] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000C02
[ 4754.436581] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[ 4754.436586] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
[ 4754.436591] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF
[ 4754.451314] [drm:radeon_fence_wait] *ERROR* fence(ffff88006c4324c0:0x000102A4) 720ms timeout
[ 4754.451320] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x000102A4)
[ 4754.460002] Uhhuh. NMI received for unknown reason a1 on CPU 0.
[ 4754.460002] You have some hardware problem, likely on the PCI bus.
[ 4754.460002] Dazed and confused, but trying to continue
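
One thing that stands out in the log is that the status registers all read back 0xFFFFFFFF after the soft reset. An all-ones read from a PCI device commonly means the device did not answer the read at all, which would fit the NMI message, though that is my interpretation and not something the log itself states. For anyone wanting to compare register values across several crash logs, here is a minimal Python sketch that pulls the register dumps out of dmesg text and lists the ones that read all ones. The function names are my own and are not part of any radeon tooling, and the regular expression assumes the "R_xxxxxx_NAME=0xVALUE" format shown above.

```python
import re

# Matches register dump entries such as "R_008010_GRBM_STATUS=0xE00014A4".
# Assumes a six-hex-digit register offset and an eight-hex-digit value,
# as seen in the log above.
REGISTER_RE = re.compile(r"R_[0-9A-Fa-f]{6}_(\w+)=0x([0-9A-Fa-f]{8})")


def parse_register_dumps(log_text):
    """Return (register_name, value) pairs in the order they appear."""
    return [(name, int(value, 16)) for name, value in REGISTER_RE.findall(log_text)]


def all_ones_registers(log_text):
    """Return the names of registers that read back 0xFFFFFFFF."""
    return [name for name, value in parse_register_dumps(log_text)
            if value == 0xFFFFFFFF]


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[ 4754.254752] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
[ 4754.412220] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 4754.436581] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[ 4754.436591] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF
"""

if __name__ == "__main__":
    for name, value in parse_register_dumps(SAMPLE):
        print(f"{name:16s} 0x{value:08X}")
    print("all-ones reads:", all_ones_registers(SAMPLE))
```

Running it against the output of dmesg saved to a file (read the file and pass its contents to all_ones_registers) should make it easy to see whether every crash ends with the same registers reading all ones.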