[elrepo] Kernel Crash with wireguard module
Phil Perry
phil at elrepo.org
Sat Jan 14 17:32:03 EST 2023
On 14/01/2023 21:12, Jens Kuehnel wrote:
> Hi all,
>
> I have a strange problem. I run a server with multiple VMs on RHEL 8
> (Developer Subscription) and some ELRepo modules.
>
> I run WireGuard with this RPM:
> kmod-wireguard-1.0.20220627-3.el8_7.elrepo.x86_64
>
> Everything works fine with 4.18.0-425.3.1.el8.x86_64, but after updating
> to kernel-4.18.0-425.10.1.el8_7.x86_64 I get the following after about
> 20 seconds:
>
> * 100% CPU load from a kworker
> * CPU soft lockup
>
> and about 30-60 seconds after that the system hangs: no SSH, no console,
> only ping works.
>
> The dmesg output is at the end of this mail.
>
> When I disable WireGuard, everything works fine, so it is the wireguard
> module. At the moment I am running the 425.3.1 kernel again, because
> WireGuard is important.
>
> Would another recompile for the new kernel help? Does anyone else have
> the same problem, or is this unique to my hardware?
>
> Thanks for the info.
>
> Greetings from Frankfurt, Germany.
> CU
> Jens Kühnel
>
>
> ------------------------------------------------------------
> [ 85.035861] wireguard: WireGuard 1.0.20220627 loaded. See www.wireguard.com for information.
> [ 85.035868] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason at zx2c4.com>. All Rights Reserved.
> [ 112.088711] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
> [ 112.088718] Modules linked in: xt_CHECKSUM wireguard ip6_udp_tunnel
> udp_tunnel binfmt_misc br_netfilter bridge stp llc xt_physdev ipt_REJECT
> nf_reject_ipv4 nft_counter xt_LOG nf_log_syslog ip6t_REJECT
> nft_chain_nat nf_reject_ipv6 ipt_MASQUERADE nf_nat xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables
> libcrc32c nfnetlink sunrpc vfat fat intel_rapl_msr intel_rapl_common
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt
> iTCO_wdt irqbypass iTCO_vendor_support rapl intel_cstate intel_uncore
> pcspkr wmi mei_me intel_pch_thermal mei acpi_pad i2c_i801 ie31200_edac
> intel_pmc_core ext4 mbcache jbd2 dm_crypt raid1 sd_mod t10_pi sg i915
> i2c_algo_bit cec intel_gtt drm_buddy drm_dp_helper drm_kms_helper
> syscopyarea sysfillrect sysimgblt crct10dif_pclmul fb_sys_fops ttm
> crc32_pclmul crc32c_intel ahci libahci e1000e drm libata
> ghash_clmulni_intel serio_raw video dm_mirror dm_region_hash dm_log
> dm_mod ftsteutates(O) fuse
> [ 112.088835] CPU: 5 PID: 81 Comm: kworker/5:1 Tainted: G IO --------- - - 4.18.0-425.10.1.el8_7.x86_64 #1
> [ 112.088839] Hardware name: FUJITSU D3417-B1/D3417-B1, BIOS V5.0.0.11 R1.28.0.SR.1 for D3417-B1x 07/25/2019
> [ 112.088842] Workqueue: events_power_efficient wg_ratelimiter_gc_entries [wireguard]
> [ 112.088851] RIP: 0010:native_queued_spin_lock_slowpath+0x5f/0x1c0
> [ 112.088856] Code: 71 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2
> 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 4b 85 c0 74 0e 8b 07 84 c0 74 08 f3
> 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 e9 1e b7 aa 00 8b 37 81
> [ 112.088861] RSP: 0018:ffffb780c657be58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> [ 112.088864] RAX: 0000000000000101 RBX: ffffffffc0f05160 RCX: ffffffffb07b9a40
> [ 112.088866] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc0f05fb8
> [ 112.088869] RBP: 00000013e2e12d11 R08: ffffffffb07b9ae0 R09: 0000746e65696369
> [ 112.088871] R10: 8080808080808080 R11: 0000000000000018 R12: dead000000000200
> [ 112.088873] R13: 0000000000000001 R14: ffff98bf06e6b780 R15: 0000000000000001
> [ 112.088876] FS: 0000000000000000(0000) GS:ffff98cdee540000(0000) knlGS:0000000000000000
> [ 112.088878] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 112.088881] CR2: 00007f211802e240 CR3: 00000004eec10001 CR4: 00000000003706e0
> [ 112.088883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 112.088885] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 112.088888] Call Trace:
> [ 112.088890] _raw_spin_lock+0x1e/0x30
> [ 112.088894] wg_ratelimiter_gc_entries+0x49/0x170 [wireguard]
> [ 112.088901] process_one_work+0x1a7/0x360
> [ 112.088904] ? create_worker+0x1a0/0x1a0
> [ 112.088907] worker_thread+0x30/0x390
> [ 112.088909] ? create_worker+0x1a0/0x1a0
> [ 112.088911] kthread+0x10b/0x130
> [ 112.088915] ? set_kthread_struct+0x50/0x50
> [ 112.088918] ret_from_fork+0x1f/0x40
>
> Message from syslogd@vmhost at Jan 14 21:24:04 ...
> kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
> [ 140.088627] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
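When triaging a report like this, the two lines that matter most in the dump are the watchdog's soft-lockup message and the "Workqueue:" line naming the stuck work item (here wg_ratelimiter_gc_entries from the wireguard module, spinning in _raw_spin_lock). A minimal Python sketch of pulling both out of a dmesg capture is shown below. The regular expressions, function names and SAMPLE_LOG are hypothetical helpers, not part of any ELRepo or kernel tooling, and they assume unwrapped dmesg lines in the format shown above (the log quoted in this mail was line-wrapped by the mail client, so it would need rejoining first).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pattern for lines such as:
# [ 112.088711] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)

# Hypothetical pattern for lines such as:
# [ 112.088842] Workqueue: events_power_efficient wg_ratelimiter_gc_entries [wireguard]
# The trailing "[module]" is optional because built-in work items have none.
WORKQUEUE_RE = re.compile(
    r"Workqueue: (?P<wq>\S+) (?P<func>\S+)(?: \[(?P<module>[^\]]+)\])?"
)


@dataclass
class Lockup:
    timestamp: float     # seconds since boot, from the dmesg prefix
    cpu: int             # CPU the watchdog reported as stuck
    stuck_seconds: int   # how long the watchdog says it was stuck
    task: str            # task running on that CPU, e.g. a kworker


def parse_lockups(log: str) -> List[Lockup]:
    """Return every soft-lockup report found in a dmesg text dump."""
    return [
        Lockup(float(m["ts"]), int(m["cpu"]), int(m["secs"]), m["task"])
        for m in LOCKUP_RE.finditer(log)
    ]


def workqueue_functions(log: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (workqueue, work function, module or None) for each Workqueue: line."""
    return [(m["wq"], m["func"], m["module"]) for m in WORKQUEUE_RE.finditer(log)]


# Excerpt of the report above, with the mail client's line wrapping undone.
SAMPLE_LOG = """\
[ 112.088711] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
[ 112.088842] Workqueue: events_power_efficient wg_ratelimiter_gc_entries [wireguard]
[ 140.088627] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:81]
"""

if __name__ == "__main__":
    for lk in parse_lockups(SAMPLE_LOG):
        print(f"{lk.timestamp:.3f}s: CPU{lk.cpu} stuck {lk.stuck_seconds}s in {lk.task}")
    for wq, func, module in workqueue_functions(SAMPLE_LOG):
        print(f"workqueue {wq} ran {func} from {module or '(built-in)'}")
```

Repeated reports on the same CPU and task, with the work item attributed to one module, point at that module's work function rather than at general hardware trouble, which matches the observation that disabling WireGuard avoids the hang.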
Hi Jens,
Thanks for the report.
I have rebuilt kmod-wireguard against the latest
4.18.0-425.10.1.el8_7.x86_64 RHEL kernel for you, and released updated
packages to the testing repository. Updated packages should be available
on our mirror sites shortly:
kmod-wireguard-1.0.20220627-4.el8_7.elrepo.x86_64.rpm
Could you please update, reboot into the latest kernel
(4.18.0-425.10.1.el8_7.x86_64), and test whether this fixes the issue
for you?
Thanks,
Phil