[elrepo] Kernel Crash with wireguard module

Sat Jan 14 17:32:03 EST 2023

On 14/01/2023 21:12, Jens Kuehnel wrote:
> Hi all,
> 
> I have a strange problem. I run a server with multiple VMs with RHEL8 
> (Developer subscription) and some elrepo modules.
> 
> I run wireguard with this rpm:
> kmod-wireguard-1.0.20220627-3.el8_7.elrepo.x86_64
> 
> Everything works fine with 4.18.0-425.3.1.el8.x86_64, but after updating 
> to kernel-4.18.0-425.10.1.el8_7.x86_64 I get, after about 20 seconds:
> 
> * 100% CPU load with a kworker
> * cpu soft lockup
> 
> and about 30-60 seconds after that the system hangs: no ssh, no console, 
> only ping works.
> 
> The dmesg output is at the end of this mail.
> 
> When I disable wireguard everything works fine, so it is the wireguard 
> module. For the moment I am running the 425.3.1 kernel again, because 
> wireguard is important.
> 
> Could another recompile of the kernel help? Does anyone have the same 
> problem, or is this a problem unique to my hardware?
> 
> Thanks for the info.
> 
> Greetings from Frankfurt, Germany.
> CU
> Jens Kühnel
> 
> 
> ------------------------------------------------------------
> [   85.035861] wireguard: WireGuard 1.0.20220627 loaded. See 
> www.wireguard.com for information.
> [   85.035868] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld 
> <Jason at zx2c4.com>. All Rights Reserved.
> [  112.088711] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! 
> [kworker/5:1:81]
> [  112.088718] Modules linked in: xt_CHECKSUM wireguard ip6_udp_tunnel 
> udp_tunnel binfmt_misc br_netfilter bridge stp llc xt_physdev ipt_REJECT 
> nf_reject_ipv4 nft_counter xt_LOG nf_log_syslog ip6t_REJECT 
> nft_chain_nat nf_reject_ipv6 ipt_MASQUERADE nf_nat xt_conntrack 
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables 
> libcrc32c nfnetlink sunrpc vfat fat intel_rapl_msr intel_rapl_common 
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt 
> iTCO_wdt irqbypass iTCO_vendor_support rapl intel_cstate intel_uncore 
> pcspkr wmi mei_me intel_pch_thermal mei acpi_pad i2c_i801 ie31200_edac 
> intel_pmc_core ext4 mbcache jbd2 dm_crypt raid1 sd_mod t10_pi sg i915 
> i2c_algo_bit cec intel_gtt drm_buddy drm_dp_helper drm_kms_helper 
> syscopyarea sysfillrect sysimgblt crct10dif_pclmul fb_sys_fops ttm 
> crc32_pclmul crc32c_intel ahci libahci e1000e drm libata 
> ghash_clmulni_intel serio_raw video dm_mirror dm_region_hash dm_log 
> dm_mod ftsteutates(O) fuse
> [  112.088835] CPU: 5 PID: 81 Comm: kworker/5:1 Tainted: G          IO   
> --------- -  - 4.18.0-425.10.1.el8_7.x86_64 #1
> [  112.088839] Hardware name: FUJITSU D3417-B1/D3417-B1, BIOS V5.0.0.11 
> R1.28.0.SR.1 for D3417-B1x               07/25/2019
> [  112.088842] Workqueue: events_power_efficient 
> wg_ratelimiter_gc_entries [wireguard]
> [  112.088851] RIP: 0010:native_queued_spin_lock_slowpath+0x5f/0x1c0
> [  112.088856] Code: 71 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 
> 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 4b 85 c0 74 0e 8b 07 84 c0 74 08 f3 
> 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 e9 1e b7 aa 00 8b 37 81
> [  112.088861] RSP: 0018:ffffb780c657be58 EFLAGS: 00000202 ORIG_RAX: 
> ffffffffffffff13
> [  112.088864] RAX: 0000000000000101 RBX: ffffffffc0f05160 RCX: 
> ffffffffb07b9a40
> [  112.088866] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> ffffffffc0f05fb8
> [  112.088869] RBP: 00000013e2e12d11 R08: ffffffffb07b9ae0 R09: 
> 0000746e65696369
> [  112.088871] R10: 8080808080808080 R11: 0000000000000018 R12: 
> dead000000000200
> [  112.088873] R13: 0000000000000001 R14: ffff98bf06e6b780 R15: 
> 0000000000000001
> [  112.088876] FS:  0000000000000000(0000) GS:ffff98cdee540000(0000) 
> knlGS:0000000000000000
> [  112.088878] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  112.088881] CR2: 00007f211802e240 CR3: 00000004eec10001 CR4: 
> 00000000003706e0
> [  112.088883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [  112.088885] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000400
> [  112.088888] Call Trace:
> [  112.088890]  _raw_spin_lock+0x1e/0x30
> [  112.088894]  wg_ratelimiter_gc_entries+0x49/0x170 [wireguard]
> [  112.088901]  process_one_work+0x1a7/0x360
> [  112.088904]  ? create_worker+0x1a0/0x1a0
> [  112.088907]  worker_thread+0x30/0x390
> [  112.088909]  ? create_worker+0x1a0/0x1a0
> [  112.088911]  kthread+0x10b/0x130
> [  112.088915]  ? set_kthread_struct+0x50/0x50
> [  112.088918]  ret_from_fork+0x1f/0x40
> 
> Message from syslogd at vmhost at Jan 14 21:24:04 ...
>   kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
> [  140.088627] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! 
> [kworker/5:1:81]

Hi Jens,

Thanks for the report.

I have rebuilt kmod-wireguard against the latest 
4.18.0-425.10.1.el8_7.x86_64 RHEL kernel for you, and released updated 
packages to the testing repository. Updated packages should be available 
on our mirror sites shortly:

kmod-wireguard-1.0.20220627-4.el8_7.elrepo.x86_64.rpm

Please can you update, reboot to the latest kernel 
(4.18.0-425.10.1.el8_7.x86_64) and test to see if this fixes the issue 
for you.
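
One way to check the result after rebooting is to scan the kernel log for 
the soft-lockup signature from the report above. The following is only a 
minimal sketch in Python. The helper name find_soft_lockups is 
hypothetical and not part of any ELRepo or WireGuard tooling, and the 
pattern assumes the message format shown in the quoted dmesg output.

```python
import re

# Matches kernel log lines such as:
#   [  112.088711] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
# The bracketed task name may have been wrapped onto the next line by a mail
# client, so whitespace and "> " quote markers are tolerated before it.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"[\s>]*\[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per timestamped soft-lockup message found in log_text."""
    return [
        {
            "timestamp": float(m.group("ts")),  # seconds since boot
            "cpu": int(m.group("cpu")),         # CPU number that was stuck
            "seconds": int(m.group("secs")),    # reported stuck duration
            "task": m.group("task"),            # e.g. "kworker/5:1:81"
        }
        for m in LOCKUP_RE.finditer(log_text)
    ]
```

Passing the saved output of dmesg to find_soft_lockups() a few minutes 
after wireguard has been loaded should return an empty list if the 
lockup no longer occurs.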

Thanks,

Phil