[elrepo] kernel-debuginfo-common-x86_64-$KERNEL.x86_64.rpm kernel-debug-$KERNEL.x86_64.rpm packages?
Mark Selby
mselby at quantcast.com
Mon Aug 12 16:47:07 EDT 2013
We are running 2.6.32-279.19.1.el6.x86_64 in production.
I am not a developer, just a lowly sysadmin.
A good number of our machines are panicking in a custom app; the sys and
bt output from crash is pasted below.
From what I can tell from many Google searches, a change was made to
xsave.c that modifies the __sanitize_i387_state function. I am not 100%
sure, but I think these changes prevent the application crash from
crashing the kernel. Other files may have been patched as part of this
bug fix as well.
The main thing that I can see is the removal of the
'BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);' line.
If I knew all of the patches that were involved in this fix, I would apply
them to the vendor kernel.
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02183.html may
provide more insight.
Any and all help is greatly appreciated.
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux
DUMPFILE: /mnt/data2/127.0.0.1-2013-08-04-21:32:45/vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Sun Aug 4 21:32:40 2013
UPTIME: 47 days, 00:08:51
LOAD AVERAGE: 30.41, 37.01, 35.09
TASKS: 2614
NODENAME: mrs4066.sea1.qc
RELEASE: 2.6.32-279.19.1.el6.x86_64
VERSION: #1 SMP Wed Dec 19 07:05:20 UTC 2012
MACHINE: x86_64 (2599 Mhz)
MEMORY: 64 GB
PANIC: "kernel BUG at arch/x86/kernel/xsave.c:45!"
PID: 30634
COMMAND: "qc-sawzall-mapp"
TASK: ffff8808572c4040 [THREAD_INFO: ffff880629972000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 30634 TASK: ffff8808572c4040 CPU: 0 COMMAND: "qc-sawzall-mapp"
#0 [ffff880629973740] machine_kexec at ffffffff81031f7b
#1 [ffff8806299737a0] crash_kexec at ffffffff810b8c22
#2 [ffff880629973870] oops_end at ffffffff814ed6b0
#3 [ffff8806299738a0] die at ffffffff8100f19b
#4 [ffff8806299738d0] do_trap at ffffffff814ecfa4
#5 [ffff880629973930] do_invalid_op at ffffffff8100cdb5
#6 [ffff8806299739d0] invalid_op at ffffffff8100be5b
[exception RIP: __sanitize_i387_state+297]
RIP: ffffffff81015a19 RSP: ffff880629973a88 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880801678800 RCX: 0000000000000200
RDX: ffff8808572c4040 RSI: ffffffff81bf74f8 RDI: ffff8802da5d1540
RBP: ffff880629973aa8 R8: ffff8808678f2600 R9: 0000000000000000
R10: 0000000000000200 R11: ffff88039d8ac000 R12: ffff8808678f2600
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8
#8 [ffff880629973ab0] xfpregs_get at ffffffff81015778
#9 [ffff880629973b00] elf_core_dump at ffffffff811ccffc
#10 [ffff880629973c40] do_coredump at ffffffff8117c594
#11 [ffff880629973d90] get_signal_to_deliver at ffffffff8108428d
#12 [ffff880629973e30] do_signal at ffffffff8100a265
#13 [ffff880629973f30] do_notify_resume at ffffffff8100aa80
#14 [ffff880629973f50] int_signal at ffffffff8100b341
RIP: 00007f70e4fda885 RSP: 00007fffab935e48 RFLAGS: 00000206
RAX: 0000000000000000 RBX: 000000000085b518 RCX: ffffffffffffffff
RDX: 0000000000000006 RSI: 00000000000077aa RDI: 00000000000077aa
RBP: 000000000409f2c8 R8: 000000000000000a R9: 00007f70e7256720
R10: 0000000000000008 R11: 0000000000000206 R12: 000000000409dc70
R13: 00007fffab936060 R14: 00007fffab936100 R15: 000000000409ac08
ORIG_RAX: 00000000000000ea CS: 0033 SS: 002b
On 8/12/13 12:27 PM, "Akemi Yagi" <toracat at elrepo.org> wrote:
>On Mon, Aug 12, 2013 at 12:20 PM, Mark Selby <mselby at quantcast.com> wrote:
>> All of the stock RHEL/CentOS kernels are built using a spec file that
>> creates two debug packages that are required if you ever want to run crash
>> against a kernel dump. We have a requirement to move to a 3.x kernel in our
>> CentOS environment because of a kernel bug fix that has not yet been back
>> ported to RHEL/CentOS branch.
>
>Can you elaborate on the bug fix you are referring to? Wonder if this
>is something that *can* be backported to the current EL kernel without
>much difficulty. If so, that would be a better solution than running a
>mainline kernel.
>
>Akemi
>_______________________________________________
>elrepo mailing list
>elrepo at lists.elrepo.org
>http://lists.elrepo.org/mailman/listinfo/elrepo