[elrepo] hard lockups on CPUs with elrepo kernel 3.10.103 on CentOS 6

Akemi Yagi toracat at elrepo.org
Thu Oct 6 18:47:24 EDT 2016


On Thu, Oct 6, 2016 at 3:27 PM, Akemi Yagi <toracat at elrepo.org> wrote:
> On Thu, Oct 6, 2016 at 2:22 PM, Grigory Shamov
> <Grigory.Shamov at umanitoba.ca> wrote:
>> An update:
>>
>> It looks like the same issue was observed in Red Hat 7 kernels, which are
>> also based on 3.10.
>> It pertains to a perf_event_overflow error with an increased
>> kernel.watchdog_thresh:
>>
>> https://access.redhat.com/solutions/1354963
>>
>> ```
>> * Red Hat Enterprise Linux (RHEL) 7
>> * seen on several versions of the RHEL7 kernel (3.10.0-version.el7.x86_64)
>> * the /proc/sys/kernel/watchdog_thresh parameter is set to a higher value
>> than the default
>> * Docker
>> ```
>>
>> They report a panic under Docker; we see it under a normal application
>> workload (but HPC applications are long-running and use a lot of memory,
>> so they can be somewhat similar to a heavily used container).
>>
>> The Red Hat solution basically suggests updating to a later Red Hat kernel.
>> What should one do with the ELRepo one?
>
> I'd like to track down the patch(es) Red Hat applied to fix the issue.
> It is possible that, while kernel-lt does not have the patch,
> kernel-ml may have it. At any rate the patch must be identified to
> find that out.

I now suspect the following patch was the one:

https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/9809b18fcf6b8d8ec4d3643677345907e6b50eca

It first appeared in kernel 3.12. Red Hat backported it to the RHEL 7.1/7.2 kernels.
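
To check whether a given node matches the conditions discussed in this thread (a kernel older than 3.12 together with a watchdog_thresh above the default), something like the following Python sketch might help. It is only a heuristic under stated assumptions: the default of 10 seconds is the upstream kernel default, the function names are made up for illustration, and the version comparison only makes sense for mainline-derived kernels such as kernel-lt. Vendor kernels like RHEL 7.1/7.2 still report 3.10 even though they carry the backport.

```python
import platform
import re
from pathlib import Path

DEFAULT_WATCHDOG_THRESH = 10   # upstream kernel default, in seconds
FIX_VERSION = (3, 12)          # first mainline release with the suspected patch
THRESH_PATH = Path("/proc/sys/kernel/watchdog_thresh")


def parse_kernel_version(release):
    """Return (major, minor) from a release string like '3.10.103-1.el6.elrepo.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release, watchdog_thresh):
    """Heuristic only: kernel predates 3.12 AND watchdog_thresh is above the default."""
    older_than_fix = parse_kernel_version(release) < FIX_VERSION
    raised_thresh = watchdog_thresh > DEFAULT_WATCHDOG_THRESH
    return older_than_fix and raised_thresh


def read_watchdog_thresh(path=THRESH_PATH):
    """Read the current threshold, or return None if the file is not available."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    release = platform.release()
    if thresh is None:
        print(f"{release}: could not read {THRESH_PATH}")
    else:
        print(f"{release}: watchdog_thresh={thresh}, "
              f"may be affected: {may_be_affected(release, thresh)}")
```

A result of True does not prove the lockups come from this bug; it only says the node fits the profile described in the Red Hat article.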

Akemi

