[elrepo] kernel-debuginfo-common-x86_64-$KERNEL.x86_64.rpm kernel-debug-$KERNEL.x86_64.rpm packages?
Pat Riehecky
riehecky at fnal.gov
Tue Aug 13 10:09:14 EDT 2013
Hi Mark,
I'm not sure what the overall impact of those patches is (the patch set from
Suresh Siddha is a total of 4 emails in that thread). I can assure you that
the latest 6x kernel (2.6.32-358.14.1.el6) does not have those patches applied.
I would encourage you to open a bug with the folks maintaining the 6x kernel.
They may be able to more accurately assess the risks of merging that patch set.
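
One rough way to tell whether a given kernel source tree still carries the assertion you mention is to search xsave.c for the BUG_ON line. Below is a minimal sketch in Python, assuming you have an unpacked kernel source tree on disk. The function names (find_bug_on, check_tree) and the tree location are hypothetical; only the file path and the BUG_ON text come from your report. Finding or not finding the line says nothing about whether the rest of the patch set is applied.

```python
import re
from pathlib import Path

# The assertion quoted in the original report; whitespace is matched loosely
# so minor formatting differences between trees do not hide it.
BUG_ON_PATTERN = re.compile(
    r"BUG_ON\s*\(\s*task_thread_info\s*\(\s*tsk\s*\)\s*->\s*status"
    r"\s*&\s*TS_USEDFPU\s*\)\s*;"
)


def find_bug_on(source_text):
    """Return the 1-based line numbers where the BUG_ON assertion appears."""
    return [
        lineno
        for lineno, line in enumerate(source_text.splitlines(), start=1)
        if BUG_ON_PATTERN.search(line)
    ]


def check_tree(tree_root):
    """Search arch/x86/kernel/xsave.c under tree_root.

    tree_root is assumed to be the top of an unpacked kernel source tree
    (a hypothetical location, chosen by whoever runs this).
    """
    path = Path(tree_root) / "arch" / "x86" / "kernel" / "xsave.c"
    return find_bug_on(path.read_text(errors="replace"))
```

Calling check_tree() on the top-level directory of the tree returns a list of line numbers. An empty list means the assertion was not found in that file.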
Pat
On 08/12/2013 03:47 PM, Mark Selby wrote:
> We are running 2.6.32-279.19.1.el6.x86_64 in production.
>
> I am not a developer, just a lowly sysadmin
>
> A good number of our machines are panicking in a custom app; the sys and
> bt output from crash is pasted below.
>
> From what I can tell from many Google searches, a change was made to
> xsave.c that modifies the __sanitize_i387_state function. I am not 100%
> sure, but I think these changes prevent the application crash from
> crashing the kernel. There may be other files that are patched as part
> of this bug fix as well.
>
> The main thing that I can see is the removal of the
> 'BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);' line.
>
> If I knew all of the patches that were involved in this fix I would apply
> them to the vendor kernel.
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02183.html may
> provide more insight
>
>
> Any and all help is greatly appreciated.
>
> KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux
> DUMPFILE: /mnt/data2/127.0.0.1-2013-08-04-21:32:45/vmcore [PARTIAL DUMP]
> CPUS: 32
> DATE: Sun Aug 4 21:32:40 2013
> UPTIME: 47 days, 00:08:51
> LOAD AVERAGE: 30.41, 37.01, 35.09
> TASKS: 2614
> NODENAME: mrs4066.sea1.qc
> RELEASE: 2.6.32-279.19.1.el6.x86_64
> VERSION: #1 SMP Wed Dec 19 07:05:20 UTC 2012
> MACHINE: x86_64 (2599 Mhz)
> MEMORY: 64 GB
> PANIC: "kernel BUG at arch/x86/kernel/xsave.c:45!"
> PID: 30634
> COMMAND: "qc-sawzall-mapp"
> TASK: ffff8808572c4040 [THREAD_INFO: ffff880629972000]
> CPU: 0
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 30634 TASK: ffff8808572c4040 CPU: 0 COMMAND: "qc-sawzall-mapp"
> #0 [ffff880629973740] machine_kexec at ffffffff81031f7b
> #1 [ffff8806299737a0] crash_kexec at ffffffff810b8c22
> #2 [ffff880629973870] oops_end at ffffffff814ed6b0
> #3 [ffff8806299738a0] die at ffffffff8100f19b
> #4 [ffff8806299738d0] do_trap at ffffffff814ecfa4
> #5 [ffff880629973930] do_invalid_op at ffffffff8100cdb5
> #6 [ffff8806299739d0] invalid_op at ffffffff8100be5b
> [exception RIP: __sanitize_i387_state+297]
> RIP: ffffffff81015a19 RSP: ffff880629973a88 RFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffff880801678800 RCX: 0000000000000200
> RDX: ffff8808572c4040 RSI: ffffffff81bf74f8 RDI: ffff8802da5d1540
> RBP: ffff880629973aa8 R8: ffff8808678f2600 R9: 0000000000000000
> R10: 0000000000000200 R11: ffff88039d8ac000 R12: ffff8808678f2600
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8
> #8 [ffff880629973ab0] xfpregs_get at ffffffff81015778
> #9 [ffff880629973b00] elf_core_dump at ffffffff811ccffc
> #10 [ffff880629973c40] do_coredump at ffffffff8117c594
> #11 [ffff880629973d90] get_signal_to_deliver at ffffffff8108428d
> #12 [ffff880629973e30] do_signal at ffffffff8100a265
> #13 [ffff880629973f30] do_notify_resume at ffffffff8100aa80
> #14 [ffff880629973f50] int_signal at ffffffff8100b341
> RIP: 00007f70e4fda885 RSP: 00007fffab935e48 RFLAGS: 00000206
> RAX: 0000000000000000 RBX: 000000000085b518 RCX: ffffffffffffffff
> RDX: 0000000000000006 RSI: 00000000000077aa RDI: 00000000000077aa
> RBP: 000000000409f2c8 R8: 000000000000000a R9: 00007f70e7256720
> R10: 0000000000000008 R11: 0000000000000206 R12: 000000000409dc70
> R13: 00007fffab936060 R14: 00007fffab936100 R15: 000000000409ac08
> ORIG_RAX: 00000000000000ea CS: 0033 SS: 002b
>
>
>
>
>
> On 8/12/13 12:27 PM, "Akemi Yagi" <toracat at elrepo.org> wrote:
>
>> On Mon, Aug 12, 2013 at 12:20 PM, Mark Selby <mselby at quantcast.com> wrote:
>>> All of the stock RHEL/CentOS kernels are built using a spec file that
>>> creates two debug packages that are required if you ever want to run
>>> crash against a kernel dump. We have a requirement to move to a 3.x
>>> kernel in our CentOS environment because of a kernel bug fix that has
>>> not yet been backported to the RHEL/CentOS branch.
>> Can you elaborate on the bug fix you are referring to? I wonder if this
>> is something that *can* be backported to the current EL kernel without
>> much difficulty. If so, that would be a better solution than running a
>> mainline kernel.
>>
>> Akemi
>> _______________________________________________
>> elrepo mailing list
>> elrepo at lists.elrepo.org
>> http://lists.elrepo.org/mailman/listinfo/elrepo
--
Pat Riehecky
Scientific Linux developer
http://www.scientificlinux.org/