Random Freezes in Kubuntu 20.04

This is the first time I've had problems with Ubuntu on my machine. I recently replaced my PC's SSD with a brand-new one; it works really well in Windows and its firmware is up to date.

Hardware

  • Kingston A200 NVMe 500 GB (Btrfs and XFS)
  • Hybrid graphics (Intel HD 530, NVIDIA GeForce GTX 950M)

Software

  • NVIDIA driver 440 (from the official repositories, PRIME profile: On Demand)
  • CUDA driver (from the official repository)

Sometimes, while I'm using my laptop, KWin stops working: I can't open the Application Launcher, but I can still switch windows with Alt + Tab. After a few seconds the screen freezes completely. I can't move the mouse, the temperature starts to rise, and I can't switch to another console (Ctrl + Alt + F2) to check for errors. The only way to restart my PC is the Magic SysRq key + REISUB.

Relevant information about my system:

BIOS version

sudo dmidecode -s bios-version
E5CN63WW

RAM and SWAP Data:

free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       3,9Gi       7,0Gi       1,3Gi       4,5Gi        10Gi
Swap:         3,8Gi       1,8Gi       2,0Gi

Swappiness

sysctl vm.swappiness
vm.swappiness = 60

The kernel log from journalctl -k -b -1 didn't show anything relevant (to me), but I'm attaching the warning and alert messages below in case I'm missing something.

First log

aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 125: no longer affine to CPU1
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 140: no longer affine to CPU4
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 124: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 128: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 138: no longer affine to CPU7
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI: button: The lid device is not compliant to SW_LID.
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 20:49:23 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 20:49:29 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 43 callbacks suppressed
aug 11 20:49:55 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at /run/systemd/unit-root/var/cache/private/fwupdmgr supports timestamps until 2038 (0x7fffffff)

Second log

aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: [Firmware Bug]: TPM Final Events table missing or invalid
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel:  #5 #6 #7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PPC], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PCT], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.LPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.TPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.PSDF], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.HPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.SPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi MSFT0101:00: platform device creation failed: -16
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: usb: port power management may be unreliable
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvme nvme0: missing or invalid SUBNQN field.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: asus_wmi: ASUS Management GUID not found
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Realtek Extended Controls Unit was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Extension 4 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Processing 2 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Camera 1 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: loading out-of-tree module taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: module license 'NVIDIA' taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Disabling lock debugging due to kernel taint
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)

Third log

aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
aug 11 22:21:25 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 22:21:26 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 22:22:31 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 37 callbacks suppressed

Thanks for reading. What have I done wrong? Any comment is really appreciated!

Regards

0
asked 12 August 2020 at 11:55

2 answers

The problem was related to the SSD's features: Autonomous Power State Transitions (APST) were causing the freezes. To mitigate this until a fix is released, add nvme_core.default_ps_max_latency_us=0 to the GRUB_CMDLINE_LINUX_DEFAULT option in /etc/default/grub. For example:

GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
GRUB_CMDLINE_LINUX=""
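The edited file only takes effect after the GRUB configuration is regenerated with sudo update-grub and the machine is rebooted. A minimal sketch for confirming that the running kernel actually received the parameter follows; the helper name has_kernel_param is an illustrative assumption, not part of any standard tool.

```shell
#!/bin/sh
# has_kernel_param is a hypothetical helper (not a standard tool).
# It succeeds if parameter $1 appears as a whole word in the command
# line $2, which defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    cmdline="${2:-$(cat /proc/cmdline 2>/dev/null || true)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if has_kernel_param "nvme_core.default_ps_max_latency_us=0"; then
    echo "APST workaround is active on this boot"
else
    echo "Parameter not found: run 'sudo update-grub' and reboot"
fi
```

If the nvme_core module exposes its parameters, the effective value should also be readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us, where 0 means APST is disabled.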
3
answered 10 September 2020 at 08:25

BIOS

Your BIOS is current at version E5CN63WW.

MDS

You have MDS and TAA vulnerability warnings:

aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.

Mitigation control on the kernel command line

The kernel command line allows to control the MDS mitigations at boot time with the option “mds=”. The valid arguments for this option are:

full    

If the CPU is vulnerable, enable all available mitigations for the MDS vulnerability, CPU buffer clearing on exit to userspace and when entering a VM. Idle transitions are protected as well if SMT is enabled.

It does not automatically disable SMT.

full,nosmt

The same as mds=full, with SMT disabled on vulnerable CPUs. This is the complete mitigation.

off

Disables MDS mitigations completely.


sudo -H gedit /etc/default/grub

Change:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

To:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mds=full,nosmt"

Save the file and quit gedit.

sudo update-grub

reboot

Note: mds=full,nosmt disables SMT (Hyper-Threading), so expect a HUGE performance hit on multi-threaded workloads.

Note: If the performance hit is too great, try mds=full instead of mds=full,nosmt.
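After rebooting, it may help to check what the kernel itself reports. The sketch below assumes the usual sysfs locations on a 20.04-era kernel; the function name show_cpu_vulns is made up for illustration.

```shell
#!/bin/sh
# show_cpu_vulns is a hypothetical helper. It prints "name: status" for
# every regular file in the given directory, which defaults to the
# kernel's sysfs vulnerability reports.
show_cpu_vulns() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

show_cpu_vulns

# SMT state as seen by the kernel (for example on, off, forceoff).
if [ -r /sys/devices/system/cpu/smt/control ]; then
    cat /sys/devices/system/cpu/smt/control
fi
```

The mds and tsx_async_abort entries should change from a "Vulnerable" or "SMT vulnerable" status to a "Mitigation: ..." line once the option is active.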

NVMe

Kingston A200 NVME 500Gb

You may have a firmware problem:

aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvme nvme0: missing or invalid SUBNQN field.

Go to the manufacturer's web site and check for newer firmware.
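To know which firmware revision to compare against, the installed one can be read from sysfs. This is only a sketch: it assumes the drive is the first controller (nvme0), and nvme_fw_info is a hypothetical helper name.

```shell
#!/bin/sh
# nvme_fw_info is a hypothetical helper. It prints the model and firmware
# revision from an NVMe controller's sysfs directory (default: nvme0).
nvme_fw_info() {
    ctrl="${1:-/sys/class/nvme/nvme0}"
    if [ -r "$ctrl/model" ] && [ -r "$ctrl/firmware_rev" ]; then
        # These fields may be padded with trailing spaces; strip them.
        model=$(sed 's/ *$//' "$ctrl/model")
        fw=$(sed 's/ *$//' "$ctrl/firmware_rev")
        printf 'model=%s firmware=%s\n' "$model" "$fw"
    else
        echo "no NVMe controller found at $ctrl" >&2
        return 1
    fi
}

nvme_fw_info || true
```

If the nvme-cli package is installed, sudo nvme id-ctrl /dev/nvme0 should report the same revision in its fr field.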

TPM

You have TPM errors:

aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi MSFT0101:00: platform device creation failed: -16

Check to make sure your Software Updates are current, and that you're running the latest kernel.

Check your BIOS for a TPM setting, and disable TPM if possible.

Memory

Your swap and vm.swappiness settings look fine.

Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take many hours to complete.

Nvidia

You have Nvidia driver 440. There's a newer version available, 450.57, and it can be downloaded here.
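Before and after upgrading, the loaded driver version can be confirmed from the command line. The sketch below assumes the version string has the same layout as the NVRM line in your log; nvidia_module_version is a hypothetical helper name.

```shell
#!/bin/sh
# nvidia_module_version is a hypothetical helper. It reads text on stdin
# and prints the version number that follows "Kernel Module".
nvidia_module_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

if [ -r /proc/driver/nvidia/version ]; then
    nvidia_module_version < /proc/driver/nvidia/version
else
    echo "NVIDIA kernel module is not loaded" >&2
fi
```

The same filter can be applied to the kernel log, for example journalctl -k -b | nvidia_module_version.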


0
answered 21 August 2020 at 08:03
