[elrepo] drm/i915: Resetting chip after gpu hang (Intel Braswell/Cherry View: 0x22b1)
Björn Gerhart
gerhart at posteo.de
Tue Feb 9 01:57:33 EST 2016
Dear list,
With my Braswell chipset I experience a lag of a few seconds from time to time before X resumes refreshing the display. As far as I understand, the cause may be a buggy watermark computation in the i915 driver for Valleyview/Cherryview. This forces the GPU to be reset, which accounts for the delay.
However, with kernel 4.4 the GPU works as I would expect. So my guess is that backporting the i915 watermark computation to the el7 kernel should solve the issue.
I don’t know whether it will be sufficient to modify only the i915 part, or whether the drm subsystem will need to be patched as well. The latter would be a much more dramatic change, as other GPU drivers would also be affected.
What are your thoughts on this?
Is anyone else experiencing this issue and interested in solving it within the el7 kernel?
Below, you’ll find some log file excerpts.
Best - Björn
Each time the system boots, a trace similar to the following is written:
[ 6.571568] Call Trace:
[ 6.571578] [<ffffffff8163515c>] dump_stack+0x19/0x1b
[ 6.571584] [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[ 6.571587] [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[ 6.571631] [<ffffffffa0213279>] vlv_wait_port_ready+0x139/0x180 [i915]
[ 6.571677] [<ffffffffa023f36c>] intel_enable_dp+0x20c/0x2a0 [i915]
[ 6.571737] [<ffffffffa023f6f4>] chv_pre_enable_dp+0x1b4/0x200 [i915]
[ 6.571781] [<ffffffffa021b2f0>] valleyview_crtc_enable+0x260/0x350 [i915]
[ 6.571825] [<ffffffffa0219562>] __intel_set_mode+0xb12/0xd10 [i915]
[ 6.571870] [<ffffffffa022060b>] intel_crtc_set_config+0xaab/0xff0 [i915]
[ 6.571894] [<ffffffffa010ca7b>] ? kfree_state+0x4b/0x50 [drm]
[ 6.571913] [<ffffffffa00fcdb7>] drm_mode_set_config_internal+0x67/0x100 [drm]
[ 6.571921] [<ffffffffa0196508>] restore_fbdev_mode+0xc8/0xf0 [drm_kms_helper]
[ 6.571930] [<ffffffffa01983f5>] drm_fb_helper_restore_fbdev_mode_unlocked+0x25/0x70 [drm_kms_helper]
[ 6.571937] [<ffffffffa0198462>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 6.571945] [<ffffffffa019837f>] drm_fb_helper_hotplug_event+0x8f/0xe0 [drm_kms_helper]
[ 6.571989] [<ffffffffa022f56e>] intel_fbdev_output_poll_changed+0x1e/0x30 [i915]
[ 6.571997] [<ffffffffa018c7b7>] drm_kms_helper_hotplug_event+0x27/0x30 [drm_kms_helper]
[ 6.572004] [<ffffffffa018c88d>] output_poll_execute+0x6d/0x190 [drm_kms_helper]
[ 6.572008] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[ 6.572011] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[ 6.572015] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[ 6.572018] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[ 6.572022] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 6.572027] [<ffffffff81645818>] ret_from_fork+0x58/0x90
[ 6.572031] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 6.572033] ---[ end trace 439a960569a8132a ]---
[ 6.669131] Console: switching to colour frame buffer device 128x48
[ 6.756489] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[ 6.756491] i915 0000:00:02.0: registered panic notifier
Then, each time the lag occurs later on, the following is written to dmesg. Might this be caused by a buggy watermark computation?
[ 156.714911] [drm] stuck on render ring
[ 156.743834] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1432], reason: Ring hung, action: reset
[ 156.743852] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 156.743861] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 156.743870] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 156.743878] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 156.743888] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 156.748853] drm/i915: Resetting chip after gpu hang
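For anyone who wants to count or compare these hangs across boots, here is a minimal Python sketch that pulls the fields out of a "GPU HANG" line. The regular expression is only an assumption based on the line format shown in the excerpt above, and the function name parse_gpu_hang is hypothetical; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the "GPU HANG" line in the dmesg excerpt above.
# It is an assumption about the format, not a documented interface.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


sample = (
    "[ 156.743834] [drm] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1432], "
    "reason: Ring hung, action: reset"
)
print(parse_gpu_hang(sample))
```

Feeding it the output of dmesg line by line and keeping the non-None results would give a quick list of hangs with their ecode, process and reason.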
Additionally, a drm error entry gets written to /sys/class/drm/card0/error. However, it’s a huge file, so these are just the first few lines:
GPU HANG: ecode 8:0:0x85dffffb, in Xorg [1432], reason: Ring hung, action: reset
Time: 1453771555 s 843906 us
Kernel: 3.10.0-327.4.4.el7.x86_64
Active process (on ring render): Xorg [1432]
Reset count: 0
Suspend count: 0
PCI ID: 0x22b1
EIR: 0x00000000
IER: 0x00000000
GTIER gt 0: 0x01010121
GTIER gt 1: 0x01010101
GTIER gt 2: 0x00000070
GTIER gt 3: 0x00000101
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0x00000000
CCID: 0x00000000
Missed interrupts: 0x00000000
fence[0] = 9bf01f006c0001
fence[1] = 307d0010307a003
fence[2] = 308700003087003
fence[3] = 30a701f03088003
fence[4] = 30f701f030b8003
fence[5] = 30fd000030fd003
fence[6] = 31250070310e003
fence[7] = 303600003036003
fence[8] = 303a00103037003
fence[9] = 312e00803126003
fence[10] = 313300103132003
fence[11] = 31a200503137003
fence[12] = 1e5c01f01b5d001
fence[13] = 329a007031db003
fence[14] = 31af001031ae003
fence[15] = 00000000
INSTDONE_0: 0xffdfffff
INSTDONE_1: 0xffffffff
INSTDONE_2: 0xffffffff
INSTDONE_3: 0xff73fffd
ERROR: 0x00000001
FAULT_TLB_DATA: 0x00000000 0x000009f4
DONE_REG: 0x07ffffff
render command stream:
HEAD: 0x00005a78
TAIL: 0x00005a90
CTL: 0x0001f001
HWS: 0x00000000
ACTHD: 0x00000000 00005a78
IPEIR: 0x00000000
IPEHR: 0x7a000004
INSTDONE: 0xffdfffff
BBADDR: 0x00000000 00a0b488
BB_STATE: 0x00000000
INSTPS: 0x8000010b
INSTPM: 0x00006080
FADDR: 0x00000000 01efca90
RC PSMI: 0x00001010
FAULT_REG: 0x000000c1
SYNC_0: 0x00000000 [last synced 0x00000000]
SYNC_1: 0x00000000 [last synced 0x00000000]
SYNC_2: 0x00000000 [last synced 0x00000000]
GFX_MODE: 0x0000a000
PDP0: 0x000000006f849000
PDP1: 0x000000006f848000
PDP2: 0x000000006f847000
PDP3: 0x000000006f846000
seqno: 0xfffff203
waiting: yes
ring->head: 0x00000000
ring->tail: 0x00005a98
hangcheck: hung [40]
bsd command stream:
HEAD: 0x00000000
TAIL: 0x00000000
CTL: 0x00000000
HWS: 0x00037000
ACTHD: 0x00000000 00000000
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000 00000000
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 00000000
RC PSMI: 0x00000010
FAULT_REG: 0x00000000
SYNC_0: 0x00000000 [last synced 0x00000000]
SYNC_1: 0x00000000 [last synced 0x00000000]
SYNC_2: 0x00000000 [last synced 0x00000000]
GFX_MODE: 0x00008000
PDP0: 0x0000000000000000
PDP1: 0x0000000000000000
PDP2: 0x0000000000000000
PDP3: 0x0000000000000000
seqno: 0xffffeffe
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck: idle [0]
blt command stream:
HEAD: 0x00000368
TAIL: 0x00000368
CTL: 0x0001f001
HWS: 0x00059000
ACTHD: 0x00000000 00000368
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000 00600038
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 006a0368
RC PSMI: 0x00000018
FAULT_REG: 0x00000000
SYNC_0: 0x00000000 [last synced 0x00000000]
SYNC_1: 0x00000000 [last synced 0x00000000]
SYNC_2: 0x00000000 [last synced 0x00000000]
GFX_MODE: 0x00008000
PDP0: 0x000000006f849000
PDP1: 0x000000006f848000
PDP2: 0x000000006f847000
PDP3: 0x000000006f846000
seqno: 0xfffff202
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck: idle [0]
vebox command stream:
HEAD: 0x00000000
TAIL: 0x00000000
CTL: 0x00000000
HWS: 0x0007b000
ACTHD: 0x00000000 00000000
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000 00000000
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 00000000
RC PSMI: 0x00000010
FAULT_REG: 0x00000000
SYNC_0: 0x00000000 [last synced 0x00000000]
SYNC_1: 0x00000000 [last synced 0x00000000]
SYNC_2: 0x00000000 [last synced 0x00000000]
GFX_MODE: 0x00008000
PDP0: 0x0000000000000000
PDP1: 0x0000000000000000
PDP2: 0x0000000000000000
PDP3: 0x0000000000000000
seqno: 0xffffeffe
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck: idle [0]
vm[0]
Active [0]:
Pinned [14]:
00000000 81920 01 01 0 0 P dirty uncached
00014000 131072 40 40 0 0 P dirty uncached
00035000 4096 01 01 0 0 P snooped
00037000 8192 01 01 0 0 P dirty uncached
00039000 131072 40 40 0 0 P dirty uncached
00059000 8192 01 01 0 0 P dirty uncached
0005b000 131072 40 40 0 0 P dirty uncached
0007b000 8192 01 01 0 0 P dirty uncached
0007d000 131072 40 40 0 0 P dirty uncached
0009e000 3145728 40 00 0 0 P dirty uncached
01ee3000 81920 01 01 0 0 P dirty uncached
01ef7000 131072 40 40 0 0 P dirty uncached
0202d000 16384 40 00 0 0 P dirty uncached
01b5d000 3145728 36 00 0 0 P X dirty uncached (name: 4) (fence: 12)
vm[1]
Active [0]:
Pinned [0]:
vm[2]
Active [5]:
00601000 36864 37 00 fffff204 0 render uncached
0199c000 2916352 02 00 fffff204 fffff204 X dirty render uncached
01c64000 159744 37 00 fffff204 0 userptr render snooped
0165c000 262144 37 00 fffff204 0 render uncached
00a0b000 16384 7e 00 fffff204 0 dirty render uncached
Pinned [0]:
vm[3]
Active [0]:
Pinned [0]:
vm[4]
Active [0]:
Pinned [0]:
vm[5]
Active [0]:
Pinned [0]:
render ring (submitted by Xorg [1432]) --- gtt_offset = 0x00a0b000
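Since the full error file is large, a small script can help to see at a glance which ring the hangcheck flagged. The following is a rough Python sketch, assuming the dump keeps the layout shown above ("<name> command stream:" headers followed by "KEY: value" lines, with the "vm[...]" sections after the rings). The function names ring_summary and hung_rings are hypothetical.

```python
def ring_summary(text):
    """Collect the 'KEY: value' lines under each '<name> command stream:' header."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("command stream:"):
            current = line.split()[0]      # e.g. "render", "bsd", "blt", "vebox"
            rings[current] = {}
        elif line.startswith("vm["):
            current = None                 # ring sections are over
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            rings[current][key.strip()] = value.strip()
    return rings


def hung_rings(text):
    """Names of the rings whose hangcheck state starts with 'hung'."""
    return [
        name
        for name, regs in ring_summary(text).items()
        if regs.get("hangcheck", "").startswith("hung")
    ]


# Shortened sample built from lines of the dump above.
sample = """render command stream:
HEAD: 0x00005a78
TAIL: 0x00005a90
hangcheck: hung [40]
bsd command stream:
HEAD: 0x00000000
TAIL: 0x00000000
hangcheck: idle [0]
vm[0]
Active [0]:
"""
print(hung_rings(sample))
```

On a live system the text would come from reading /sys/class/drm/card0/error (this may require root) instead of the shortened sample.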