[elrepo] kernel-debuginfo-common-x86_64-$KERNEL.x86_64.rpm kernel-debug-$KERNEL.x86_64.rpm packages?

Tue Aug 13 10:09:14 EDT 2013

Hi Mark,

I'm not sure what the overall impact of those patches is (the patch set from 
Suresh Siddha is a total of 4 emails in that thread).  I can assure you that 
the latest 6x kernel (2.6.32-358.14.1.el6) does not have those patches applied.
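
For anyone who wants to repeat that check against an unpacked kernel source tree, here is a minimal sketch. It only looks for the BUG_ON line quoted below in arch/x86/kernel/xsave.c (the file named in the panic message). The function names and the tree_root argument are hypothetical, and the presence or absence of that one line is only a rough hint, not proof that the whole patch set was or was not applied.

```python
import re
from pathlib import Path

# The assertion quoted in Mark's message; whitespace is ignored when matching.
BUG_ON_LINE = "BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);"


def _squash(text: str) -> str:
    """Remove all whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", "", text)


def has_bug_on(source_text: str) -> bool:
    """Return True if the quoted BUG_ON assertion appears in the text."""
    return _squash(BUG_ON_LINE) in _squash(source_text)


def check_tree(tree_root: str) -> bool:
    """Check xsave.c under tree_root (hypothetical path to an unpacked tree)."""
    path = Path(tree_root) / "arch/x86/kernel/xsave.c"
    return has_bug_on(path.read_text(errors="replace"))
```

Calling check_tree() on the directory where the kernel source was unpacked returns True if the assertion is still present in that tree.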

I would encourage you to open a bug with the folks maintaining the 6x kernel.  
They may be able to more accurately assess the risks of merging that patch set.

Pat

On 08/12/2013 03:47 PM, Mark Selby wrote:
> We are running 2.6.32-279.19.1.el6.x86_64 in production.
>
> I am not a developer, just a lowly sysadmin.
>
> A good number of our machines are panicking in a custom app; the crash
> sys and bt output is pasted below.
>
> From what I can tell from many Google searches, a change was made to
> xsave.c that modifies the __sanitize_i387_state function. I am not 100%
> sure, but I think these changes prevent the application crash from
> crashing the kernel. There may be other files that are patched as part
> of this bug fix as well.
>
> The main thing that I can see is the removal of the
> 'BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);' line.
>
> If I knew all of the patches that were involved in this fix, I would
> apply them to the vendor kernel.
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02183.html may
> provide more insight
>
>
> Any and all help is greatly appreciated.
>
>        KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux
>      DUMPFILE: /mnt/data2/127.0.0.1-2013-08-04-21:32:45/vmcore  [PARTIAL DUMP]
>          CPUS: 32
>          DATE: Sun Aug  4 21:32:40 2013
>        UPTIME: 47 days, 00:08:51
> LOAD AVERAGE: 30.41, 37.01, 35.09
>         TASKS: 2614
>      NODENAME: mrs4066.sea1.qc
>       RELEASE: 2.6.32-279.19.1.el6.x86_64
>       VERSION: #1 SMP Wed Dec 19 07:05:20 UTC 2012
>       MACHINE: x86_64  (2599 Mhz)
>        MEMORY: 64 GB
>         PANIC: "kernel BUG at arch/x86/kernel/xsave.c:45!"
>           PID: 30634
>       COMMAND: "qc-sawzall-mapp"
>          TASK: ffff8808572c4040  [THREAD_INFO: ffff880629972000]
>           CPU: 0
>         STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 30634  TASK: ffff8808572c4040  CPU: 0   COMMAND: "qc-sawzall-mapp"
>   #0 [ffff880629973740] machine_kexec at ffffffff81031f7b
>   #1 [ffff8806299737a0] crash_kexec at ffffffff810b8c22
>   #2 [ffff880629973870] oops_end at ffffffff814ed6b0
>   #3 [ffff8806299738a0] die at ffffffff8100f19b
>   #4 [ffff8806299738d0] do_trap at ffffffff814ecfa4
>   #5 [ffff880629973930] do_invalid_op at ffffffff8100cdb5
>   #6 [ffff8806299739d0] invalid_op at ffffffff8100be5b
>      [exception RIP: __sanitize_i387_state+297]
>      RIP: ffffffff81015a19  RSP: ffff880629973a88  RFLAGS: 00010202
>      RAX: 0000000000000001  RBX: ffff880801678800  RCX: 0000000000000200
>      RDX: ffff8808572c4040  RSI: ffffffff81bf74f8  RDI: ffff8802da5d1540
>      RBP: ffff880629973aa8   R8: ffff8808678f2600   R9: 0000000000000000
>      R10: 0000000000000200  R11: ffff88039d8ac000  R12: ffff8808678f2600
>      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000200
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8
>   #8 [ffff880629973ab0] xfpregs_get at ffffffff81015778
>   #9 [ffff880629973b00] elf_core_dump at ffffffff811ccffc
> #10 [ffff880629973c40] do_coredump at ffffffff8117c594
> #11 [ffff880629973d90] get_signal_to_deliver at ffffffff8108428d
> #12 [ffff880629973e30] do_signal at ffffffff8100a265
> #13 [ffff880629973f30] do_notify_resume at ffffffff8100aa80
> #14 [ffff880629973f50] int_signal at ffffffff8100b341
>      RIP: 00007f70e4fda885  RSP: 00007fffab935e48  RFLAGS: 00000206
>      RAX: 0000000000000000  RBX: 000000000085b518  RCX: ffffffffffffffff
>      RDX: 0000000000000006  RSI: 00000000000077aa  RDI: 00000000000077aa
>      RBP: 000000000409f2c8   R8: 000000000000000a   R9: 00007f70e7256720
>      R10: 0000000000000008  R11: 0000000000000206  R12: 000000000409dc70
>      R13: 00007fffab936060  R14: 00007fffab936100  R15: 000000000409ac08
>      ORIG_RAX: 00000000000000ea  CS: 0033  SS: 002b
>
>
> On 8/12/13 12:27 PM, "Akemi Yagi" <toracat at elrepo.org> wrote:
>
>> On Mon, Aug 12, 2013 at 12:20 PM, Mark Selby <mselby at quantcast.com> wrote:
>>> All of the stock RHEL/CentOS kernels are built using a spec file that
>>> creates two debug packages that are required if you ever want to run
>>> crash against a kernel dump. We have a requirement to move to a 3.x
>>> kernel in our CentOS environment because of a kernel bug fix that has
>>> not yet been backported to the RHEL/CentOS branch.
>> Can you elaborate on the bug fix you are referring to? I wonder if this
>> is something that *can* be backported to the current EL kernel without
>> much difficulty. If so, that would be a better solution than running a
>> mainline kernel.
>>
>> Akemi
>> _______________________________________________
>> elrepo mailing list
>> elrepo at lists.elrepo.org
>> http://lists.elrepo.org/mailman/listinfo/elrepo

-- 
Pat Riehecky

Scientific Linux developer
http://www.scientificlinux.org/