[elrepo] hard lockups on CPUs with elrepo kernel 3.10.103 on CentOS 6

Akemi Yagi toracat at elrepo.org
Thu Oct 6 18:27:11 EDT 2016


On Thu, Oct 6, 2016 at 2:22 PM, Grigory Shamov
<Grigory.Shamov at umanitoba.ca> wrote:
> An update:
>
> Looks like the same issue was observed in RHEL 7 kernels, which are also
> based on 3.10. It pertains to a perf_event_overflow error with an increased
> kernel.watchdog_thresh:
>
>
> https://access.redhat.com/solutions/1354963
>
> ```
> * Red Hat Enterprise Linux (RHEL) 7
> * seen on several versions of the RHEL7 kernel (3.10.0-version.el7.x86_64)
> * the /proc/sys/kernel/watchdog_thresh parameter is set to a higher value
> than the default
> * Docker
> ```
>
> They report the panic under Docker; we see it with a normal application
> workload (but HPC applications are long-running and use a lot of memory, so
> they can be somewhat similar to a heavily used container).
>
> The Red Hat solution basically suggests updating to a later kernel of theirs.
> What would one do with the ELRepo one?

I'd like to track down the patch(es) Red Hat applied to fix the issue.
It is possible that, while kernel-lt does not have the patch,
kernel-ml may have it. At any rate the patch must be identified to
find that out.
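
In the meantime, one of the conditions listed in the Red Hat article can be
checked directly on an affected node: whether watchdog_thresh has been raised
above the default. The following is only a minimal sketch, assuming the
upstream default of 10 seconds; the function names are hypothetical, and a
raised value does not by itself confirm that a machine is hitting this bug.

```python
from pathlib import Path

# Assumption: 10 seconds is the upstream kernel default for watchdog_thresh.
DEFAULT_WATCHDOG_THRESH = 10


def parse_thresh(text):
    """Parse the contents of the watchdog_thresh file into an integer."""
    return int(text.strip())


def is_raised(value, default=DEFAULT_WATCHDOG_THRESH):
    """Return True if the threshold is higher than the assumed default."""
    return value > default


def read_thresh(path="/proc/sys/kernel/watchdog_thresh"):
    """Return the current threshold in seconds, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return parse_thresh(p.read_text())


if __name__ == "__main__":
    current = read_thresh()
    if current is None:
        print("watchdog_thresh not available on this system")
    elif is_raised(current):
        print(f"watchdog_thresh={current} is above the default "
              f"({DEFAULT_WATCHDOG_THRESH}); matches one listed condition")
    else:
        print(f"watchdog_thresh={current} is not above the default")
```

Run on each compute node, this would show which machines match that part of
the description.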

Akemi

