RX 5500 XT on Ubuntu 20.10: instability and crashes ([drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!)

I recently built an all-AMD system with a Ryzen 7 3700X CPU and an RX 5500 XT Phantom D Gaming GPU. I have an Aorus Pro Wifi motherboard and 32 GB of Trident Z Neo RAM with XMP enabled.

I'm running Ubuntu 20.10 with the generic 5.6.13-050613 kernel.

I have repeatedly had problems with the amdgpu driver freezing GNOME and every window on screen, but not the mouse pointer. Recovering requires a power cycle, although connecting to the machine over SSH works fine (so the kernel itself is not hung).

Here is an excerpt from the kernel log from that crash:

635:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:40 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
636:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:   in page starting at address 0x0000000000888000 from client 27
637:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00041C50
638:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:          MORE_FAULTS: 0x0
639:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:          WALKER_ERROR: 0x0
640:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:          PERMISSION_FAULTS: 0x5
641:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:          MAPPING_ERROR: 0x0
642:May 17 16:29:09 arctic kernel: amdgpu 0000:0b:00.0:          RW: 0x1
645:May 17 16:29:19 arctic kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
646:May 17 16:29:19 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=10870, emitted seq=10872
647:May 17 16:29:19 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
648:May 17 16:29:19 arctic kernel: amdgpu 0000:0b:00.0: GPU reset begin!
649:May 17 16:29:21 arctic kernel: amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
654:May 17 16:29:21 arctic kernel: amdgpu: [powerplay] SMU is resuming...
655:May 17 16:29:21 arctic kernel: amdgpu: [powerplay] SMU is resumed successfully!
659:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
660:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
661:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
662:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
663:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
664:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
665:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
666:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
667:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
668:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
669:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring sdma0 uses VM inv eng 12 on hub 0
670:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring sdma1 uses VM inv eng 13 on hub 0
671:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_dec uses VM inv eng 0 on hub 1
672:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_enc0 uses VM inv eng 1 on hub 1
673:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_enc1 uses VM inv eng 4 on hub 1
674:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: ring jpeg_dec uses VM inv eng 5 on hub 1
680:May 17 16:29:22 arctic kernel: amdgpu 0000:0b:00.0: GPU reset(1) succeeded!
688:May 17 16:29:22 arctic /usr/lib/gdm3/gdm-x-session[2329]: amdgpu: amdgpu_cs_query_fence_status failed.
689:May 17 16:29:22 arctic gnome-shell[2678]: amdgpu: amdgpu_cs_query_fence_status failed.
709:May 17 16:33:23 arctic kernel: [drm:mod_hdcp_add_display_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.
728:May 17 16:39:00 arctic kernel: [drm:mod_hdcp_add_display_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.
852:May 17 16:49:44 arctic kernel: [drm:mod_hdcp_add_display_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.
917:May 17 20:12:32 arctic kernel: [drm:mod_hdcp_add_display_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.

Here is a similar crash on 5.6.13:

May 18 03:41:05 arctic kernel: [drm:mod_hdcp_add_display_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:40 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:   in page starting at address 0x00000000008fc000 from client 27
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00041A50
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:          MORE_FAULTS: 0x0
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:          WALKER_ERROR: 0x0
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:          PERMISSION_FAULTS: 0x5
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:          MAPPING_ERROR: 0x0
May 18 03:41:05 arctic kernel: amdgpu 0000:0b:00.0:          RW: 0x1
May 18 03:41:16 arctic kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
May 18 03:41:16 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=6205, emitted seq=6208
May 18 03:41:16 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 18 03:41:16 arctic kernel: amdgpu 0000:0b:00.0: GPU reset begin!
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
May 18 03:41:18 arctic kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
May 18 03:41:18 arctic kernel: [drm] VRAM is lost due to GPU reset!
May 18 03:41:18 arctic kernel: [drm] PSP is resuming...
May 18 03:41:18 arctic kernel: [drm] reserve 0xa00000 from 0x81fe400000 for PSP TMR
May 18 03:41:18 arctic kernel: amdgpu: [powerplay] SMU is resuming...
May 18 03:41:18 arctic kernel: amdgpu: [powerplay] SMU is resumed successfully!
May 18 03:41:18 arctic kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 18 03:41:18 arctic kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
May 18 03:41:18 arctic kernel: [drm] JPEG decode initialized successfully.
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring sdma0 uses VM inv eng 12 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring sdma1 uses VM inv eng 13 on hub 0
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_dec uses VM inv eng 0 on hub 1
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_enc0 uses VM inv eng 1 on hub 1
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring vcn_enc1 uses VM inv eng 4 on hub 1
May 18 03:41:18 arctic kernel: amdgpu 0000:0b:00.0: ring jpeg_dec uses VM inv eng 5 on hub 1
May 18 03:41:18 arctic kernel: [drm] recover vram bo from shadow start
May 18 03:41:18 arctic kernel: [drm] recover vram bo from shadow done
May 18 03:41:18 arctic kernel: [drm] Skip scheduling IBs!

Here are a few more logs (from different kernel versions; sorry, I'm not entirely sure which log came from which kernel):

https://pastebin.com/ADL0JvHB

https://pastebin.com/vSprRyRx

I upgraded the kernel from 5.4 to 5.5.19 and then to 5.6.13, and the problems are still present.

Here is the log from a crash where the display just randomly turned off (kernel 5.6.13):

May 18 02:30:57 arctic kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
May 18 02:30:57 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=167698, emitted seq=167700
May 18 02:30:57 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2090 thread Xorg:cs0 pid 2091
May 18 02:30:57 arctic kernel: amdgpu 0000:0b:00.0: GPU reset begin!
May 18 02:30:59 arctic kernel: amdgpu: [powerplay] failed send message: DisallowGfxOff (42) param: 0x00000000 response 0xffffffc2
May 18 02:31:02 arctic /usr/lib/gdm3/gdm-x-session[2090]: (II) event12 - Logitech MX Master 3000: SYN_DROPPED event - some input events have been lost.
May 18 02:31:02 arctic kernel: amdgpu: [powerplay] Msg issuing pre-check failed and SMU may be not in the right state!
May 18 02:31:02 arctic /usr/lib/gdm3/gdm-x-session[2090]: (EE) client bug: timer event12 debounce: scheduled expiry is in the past (-194ms), your system is too slow
May 18 02:31:02 arctic kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
May 18 02:31:02 arctic kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
May 18 02:31:04 arctic kernel: amdgpu: [powerplay] Msg issuing pre-check failed and SMU may be not in the right state!
May 18 02:31:04 arctic kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
May 18 02:31:07 arctic kernel: amdgpu: [powerplay] Msg issuing pre-check failed and SMU may be not in the right state!
May 18 02:31:07 arctic kernel: [drm:amdgpu_device_gpu_recover.cold [amdgpu]] *ERROR* ASIC reset failed with error, -62 for drm dev, 0000:0b:00.0
May 18 02:31:07 arctic kernel: amdgpu 0000:0b:00.0: GPU reset(1) failed
May 18 02:31:07 arctic kernel: amdgpu 0000:0b:00.0: GPU reset end with ret = -62
May 18 02:31:12 arctic /usr/lib/gdm3/gdm-x-session[2090]: (EE) client bug: timer event12 debounce short: scheduled expiry is in the past (-5ms), your system is too slow
May 18 02:31:17 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=167700, emitted seq=167700
May 18 02:31:17 arctic kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2090 thread Xorg:cs0 pid 2091
May 18 02:31:17 arctic kernel: amdgpu 0000:0b:00.0: GPU reset begin!
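
The excerpts above show timeouts on several different rings (sdma1, sdma0, gfx_0.0.0). As a rough sketch for tallying them in a saved log, assuming the "ring <name> timeout" message format shown above (the function name `ring_timeouts` is hypothetical, not part of any tool):

```shell
# Hypothetical helper: count amdgpu ring timeouts per ring in a kernel log.
# Assumes the "ring <name> timeout" message format seen in the excerpts above.
ring_timeouts() {
    # Read the log from the file given as $1, or from stdin if no file is given.
    grep -o 'ring [^ ]* timeout' "${1:--}" \
        | awk '{ count[$2]++ } END { for (r in count) print r, count[r] }' \
        | sort
}

# Example usage (needs permission to read the journal):
# journalctl -k --no-pager | ring_timeouts
```

Each output line is a ring name followed by the number of timeout messages for that ring, which makes it easier to see whether the hangs cluster on one engine.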

I set AMD_DEBUG=nodma,nongg, but it doesn't help. I could update the BIOS on my motherboard, although I'm only one version behind the latest and that release only offers "memory improvements". I could also try the proprietary amdgpu-pro drivers instead of the open-source amdgpu drivers. But I can't think of anything else. I've already tried 3 separate kernels... Does anyone have any ideas?
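
For context, AMD_DEBUG is an environment variable read by Mesa's radeonsi OpenGL driver, so it only affects programs started with it in their environment. A minimal sketch of setting it for one shell session and checking that child processes inherit it (how it should be made persistent for the whole desktop session is not covered here and depends on the setup):

```shell
# Set the radeonsi debug flags for this shell and everything started from it.
export AMD_DEBUG=nodma,nongg

# Confirm that a child process actually sees the variable.
sh -c 'echo "AMD_DEBUG=$AMD_DEBUG"'
```

Programs that were already running, such as the display server or compositor, will not pick up the change until they are restarted with the variable set.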

$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.0.6
asked 18 May 2020 at 11:44

1 answer

I spent most of a day trying to solve this. I won't even list everything I tried.

I could run two monitors, but the system would lock up (crash, corrupted graphics memory, half-crash, kernel panic, you name it) as soon as I plugged in a third monitor.

The key log lines in /var/log/syslog were:

amdgpu: failed to write reg 28b4 wait reg 28c6
amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu: failed send message: NumOfDisplays (64)   param: 0x00000003 response 0xffffffc2
amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3474, emitted seq=3476
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
...

I finally decided to upgrade to the latest 5.7 kernel I could find, since some people have had better luck with newer kernels.

In my case, I downloaded the .deb packages and installed them:

wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.19/amd64/linux-headers-5.7.19-050719-generic_5.7.19-050719.202008270830_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.19/amd64/linux-image-unsigned-5.7.19-050719-generic_5.7.19-050719.202008270830_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.19/amd64/linux-modules-5.7.19-050719-generic_5.7.19-050719.202008270830_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.19/amd64/linux-headers-5.7.19-050719_5.7.19-050719.202008270830_all.deb
sudo dpkg -i linux-*.deb

It worked right away. All four monitors are working.
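
To confirm that the newly installed kernel is the one actually running after a reboot, something like the following should work. This is a sketch: the helper name `kernel_matches` is hypothetical, and the version prefix is taken from the package names above.

```shell
# Hypothetical helper: succeed if the kernel release string in $1 starts with
# the expected version prefix in $2.
kernel_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Compare the running kernel against the 5.7.19 mainline build installed above.
if kernel_matches "$(uname -r)" "5.7.19-050719"; then
    echo "running the 5.7.19 mainline kernel"
else
    echo "still running $(uname -r); reboot or select the new kernel in GRUB"
fi
```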

Hopefully it doesn't all blow up as soon as I post this!

answered 14 September 2020 at 10:56