GPU disappeared from the list of hardware devices

Much like what happened to this guy, my GPU has disappeared from the device list. Neither lspci nor lshw shows the graphics card, and I can't work out what is causing the problem.

System information:

  • Thinkpad P1 (2nd Gen) running Ubuntu 20.04, upgraded from 19.10.
  • Kernel version 5.4.
  • GPU: NVIDIA Quadro T1000.

The problem appeared after a firmware update (BIOS and motherboard), so I tried to downgrade, but fwupdmgr downgrade did not offer any downgrade options. I then booted an Ubuntu live session, and there the GPU was detected and seemed to work just fine. So the problem can't be down to the firmware itself.

I looked for useful error hints in journalctl, dmesg and /var/log/syslog, without much success. If you think they might help, I can attach the logs.

I don't know whether this is a kernel module problem or something else. I feel like I have tried everything I could, to no avail. I even purged and removed all nvidia drivers and nvidia-related libraries, but that can't be the reason why even lshw doesn't show the device...

P.S. Using nvidia-installer to try to force the driver installation didn't work either.



EDIT: New insight found.

Digging through the journal, I found this line:

kernel: nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug

(full output below [1])

So I disabled and blacklisted the nouveau driver to see whether it was breaking something, but without much success. The new output contains no message about 0000:01:00.0 (that is the graphics card) being disabled; it only says

kernel: pci 0000:01:00.0: Removing from iommu group 1

(full output below [2])

But I have no idea what that means, and I haven't found anything useful online.
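To make the add/remove sequence easier to compare across boots, the IOMMU group events can be pulled out of the kernel log on their own. The following is only a sketch: `iommu_events` is a hypothetical helper name, and it assumes the log lines have the same shape as the ones quoted in this post.

```shell
# iommu_events: hypothetical helper, not an existing tool.
# Reads kernel log text on stdin and prints one line per IOMMU group event:
#   <pci address> <Adding to|Removing from> iommu group <n>
iommu_events() {
    sed -nE 's/.*pci ([0-9a-f:.]+): (Adding to|Removing from) iommu group ([0-9]+).*/\1 \2 iommu group \3/p'
}
```

It could be fed from the current boot with `journalctl -k -b | iommu_events`. This only shows when the device is dropped, not why.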


Requested in a comment:

:~$ lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
Subsystem: Lenovo UHD Graphics 630 (Mobile)
Kernel driver in use: i915
Kernel modules: i915
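The grep above filters by device class, so it would also miss an NVIDIA function listed under a different class. A complementary check, sketched here under assumptions, is to look for the NVIDIA PCI vendor ID (10de, as reported in the kernel log further down) in numeric `lspci -nn` output. `has_nvidia_pci` is a hypothetical helper name.

```shell
# has_nvidia_pci: hypothetical helper, not an existing tool.
# Reads `lspci -nn` style text on stdin and succeeds (exit 0) if any line
# carries the NVIDIA vendor ID 10de in [vendor:device] form.
has_nvidia_pci() {
    grep -Eiq '\[10de:[0-9a-f]{4}\]'
}
```

For example: `lspci -nn | has_nvidia_pci && echo "NVIDIA function enumerated" || echo "no NVIDIA function on the bus"`.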

Second request:

:~$ ls -al /dev | grep nvidia
crw-rw-rw-   1 root     root    195,     0 lug  2 19:40 nvidia0
crw-rw-rw-   1 root     root    195,   255 lug  2 19:40 nvidiactl
crw-rw-rw-   1 root     root    195,   254 lug  2 19:40 nvidia-modeset
crw-rw-rw-   1 root     root    235,     0 lug  2 19:40 nvidia-uvm
crw-rw-rw-   1 root     root    235,     1 lug  2 19:40 nvidia-uvm-tools

:~$ grep 125 /etc/group
colord:x:125:

Third request: output of /proc/cmdline

BOOT_IMAGE=/vmlinuz-5.4.0-40-generic root=/dev/mapper/vgubuntu-root ro quiet nosplash noacpi noapic pcie_port_pm=off

I should point out that the presence or absence of any of the last three options made no difference, apart from increased fan usage.

Fourth request:


----:~$ grep -r nvidia /lib/udev/rules.d/ /etc/udev/rules.d
/lib/udev/rules.d/61-gdm.rules:# disable Wayland when using the proprietary nvidia driver
/lib/udev/rules.d/61-gdm.rules:DRIVER=="nvidia", RUN+="/usr/lib/gdm3/gdm-disable-wayland"
/lib/udev/rules.d/71-nvidia.rules:SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"
/lib/udev/rules.d/71-nvidia.rules:# Start and stop nvidia-persistenced on power on and power off
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", TAG+="systemd", ENV{SYSTEMD_WANTS}="nvidia-persistenced.service"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-modeset module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-modeset"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-modeset"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-drm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-drm"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-drm"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-uvm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-uvm"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-uvm"
/lib/udev/rules.d/71-nvidia.rules:# This will create the device nvidia device nodes
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/ub-device-create"
/lib/udev/rules.d/71-nvidia.rules:# Create the device node for the nvidia-uvm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/module/nvidia_uvm", SUBSYSTEM=="module", RUN+="/sbin/ub-device-create"
/lib/udev/rules.d/71-u-d-c-gpu-detection.rules:ACTION=="add", SUBSYSTEMS=="pci", DRIVERS=="nvidia", RUN+="/bin/touch /run/u-d-c-nvidia-was-loaded"


[1] output with the nouveau driver

----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
giu 22 19:21:28 ----- kernel: nouveau: detected PR support, will not use DSM
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: enabling device (0002 -> 0003)
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: NVIDIA TU117 (167000a1)
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: bios: version 90.17.31.00.23
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 01033f46 04600020
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 03: 00002348
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: unknown connector type 48
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: unknown connector type 48
giu 22 19:21:28 ----- kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: Disabling MSI
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
giu 22 19:21:30 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22
giu 22 19:21:31 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
giu 22 19:21:31 ----- systemd-udevd[761]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
giu 22 19:21:31 ----- audit[1241]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- audit[1241]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- kernel: audit: type=1400 audit(1592846491.687:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- kernel: audit: type=1400 audit(1592846491.687:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1241 comm="apparmor_parser"

[2] output without the nouveau driver


-----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
giu 23 08:15:49 ----- kernel: nvidia: module license 'NVIDIA' taints kernel.
giu 23 08:15:49 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
giu 23 08:15:49 ----- kernel: NVRM: No NVIDIA graphics adapter found!
giu 23 08:15:49 ----- kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
giu 23 08:15:50 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
giu 23 08:15:50 ----- kernel: NVRM: No NVIDIA graphics adapter found!
giu 23 08:15:50 ----- kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
giu 23 08:15:52 ----- audit[1086]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- audit[1086]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- kernel: audit: type=1400 audit(1592892952.070:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- kernel: audit: type=1400 audit(1592892952.070:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1086 comm="apparmor_parser"

One more journalctl output: taken after manually installing all the nvidia software (and the latest driver), plus using pcie_port_pm=off as a boot option (as suggested by @nobody)

-----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
lug 02 09:47:59 ----- kernel: nvidia: loading out-of-tree module taints kernel.
lug 02 09:47:59 ----- kernel: nvidia: module license 'NVIDIA' taints kernel.
lug 02 09:47:59 ----- kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
lug 02 09:47:59 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
lug 02 09:47:59 ----- kernel: nvidia 0000:01:00.0: enabling device (0002 -> 0003)
lug 02 09:47:59 ----- kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
lug 02 09:47:59 ----- kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
lug 02 09:47:59 ----- kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  440.100  Fri May 29 08:14:04 UTC 2020
lug 02 09:47:59 ----- kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
lug 02 09:47:59 ----- kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
lug 02 09:47:59 ----- kernel: nvidia-uvm: Loaded the UVM driver, major device number 235.
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
lug 02 09:48:02 ----- kernel: audit: type=1400 audit(1593676082.870:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- kernel: audit: type=1400 audit(1593676082.870:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- audit[1113]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- audit[1113]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1113 comm="apparmor_parser"
lug 02 09:48:03 ----- systemd[1]: Starting NVIDIA Persistence Daemon...
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Verbose syslog connection opened
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Now running with user ID 125 and group ID 132
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Started (1231)
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 125 has read and write permissions for those files.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: PID file unlocked.
lug 02 09:48:03 ----- nvidia-persistenced[1228]: nvidia-persistenced failed to initialize. Check syslog for more details.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: PID file closed.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Shutdown (1231)
lug 02 09:48:03 ----- systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
lug 02 09:48:03 ----- systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
lug 02 09:48:03 ----- systemd[1]: Failed to start NVIDIA Persistence Daemon.
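Note that the daemon reports user ID 125 and group ID 132, whereas the `grep 125 /etc/group` check earlier looks up 125 as a group. A small sketch for extracting the two IDs from the log so they can be looked up in the right databases; `persistenced_ids` is a hypothetical helper name, and the log line format is assumed to match the one quoted above.

```shell
# persistenced_ids: hypothetical helper, not an existing tool.
# Reads nvidia-persistenced log text on stdin and prints "<uid> <gid>"
# for each "Now running with user ID ... and group ID ..." line.
persistenced_ids() {
    sed -nE 's/.*Now running with user ID ([0-9]+) and group ID ([0-9]+).*/\1 \2/p'
}
```

With the output in hand (for instance from `journalctl -b -u nvidia-persistenced | persistenced_ids`), `getent passwd 125` and `getent group 132` show which account and group the daemon actually runs as. Since the /dev/nvidia* nodes listed earlier are world-readable and writable, the failure is more likely tied to the GPU having been removed from the bus than to permissions.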
asked 18 June 2020 at 10:30