[elrepo] kernel-debuginfo-common-x86_64-$KERNEL.x86_64.rpm kernel-debug-$KERNEL.x86_64.rpm packages?

Mark Selby mselby at quantcast.com
Mon Aug 12 16:47:07 EDT 2013


We are running 2.6.32-279.19.1.el6.x86_64 in production.

I am not a developer, just a lowly sysadmin.

A good number of our machines are panicking in a custom app; the crash
sys and bt output is pasted below.

From what I can tell from many Google searches, a change was made to
xsave.c that modifies the __sanitize_i387_state function. I am not 100%
sure, but I think these changes prevent the application crash from
crashing the kernel. There may be other files that are patched as part of
this bug fix as well.

The main thing that I can see is the removal of the
'BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);' line.

If I knew all of the patches that were involved in this fix, I would apply
them to the vendor kernel.

http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02183.html may
provide more insight


Any and all help is greatly appreciated.

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux
    DUMPFILE: /mnt/data2/127.0.0.1-2013-08-04-21:32:45/vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Sun Aug  4 21:32:40 2013
      UPTIME: 47 days, 00:08:51
LOAD AVERAGE: 30.41, 37.01, 35.09
       TASKS: 2614
    NODENAME: mrs4066.sea1.qc
     RELEASE: 2.6.32-279.19.1.el6.x86_64
     VERSION: #1 SMP Wed Dec 19 07:05:20 UTC 2012
     MACHINE: x86_64  (2599 Mhz)
      MEMORY: 64 GB
       PANIC: "kernel BUG at arch/x86/kernel/xsave.c:45!"
         PID: 30634
     COMMAND: "qc-sawzall-mapp"
        TASK: ffff8808572c4040  [THREAD_INFO: ffff880629972000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 30634  TASK: ffff8808572c4040  CPU: 0   COMMAND: "qc-sawzall-mapp"
 #0 [ffff880629973740] machine_kexec at ffffffff81031f7b
 #1 [ffff8806299737a0] crash_kexec at ffffffff810b8c22
 #2 [ffff880629973870] oops_end at ffffffff814ed6b0
 #3 [ffff8806299738a0] die at ffffffff8100f19b
 #4 [ffff8806299738d0] do_trap at ffffffff814ecfa4
 #5 [ffff880629973930] do_invalid_op at ffffffff8100cdb5
 #6 [ffff8806299739d0] invalid_op at ffffffff8100be5b
    [exception RIP: __sanitize_i387_state+297]
    RIP: ffffffff81015a19  RSP: ffff880629973a88  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffff880801678800  RCX: 0000000000000200
    RDX: ffff8808572c4040  RSI: ffffffff81bf74f8  RDI: ffff8802da5d1540
    RBP: ffff880629973aa8   R8: ffff8808678f2600   R9: 0000000000000000
    R10: 0000000000000200  R11: ffff88039d8ac000  R12: ffff8808678f2600
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000200
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8
 #8 [ffff880629973ab0] xfpregs_get at ffffffff81015778
 #9 [ffff880629973b00] elf_core_dump at ffffffff811ccffc
#10 [ffff880629973c40] do_coredump at ffffffff8117c594
#11 [ffff880629973d90] get_signal_to_deliver at ffffffff8108428d
#12 [ffff880629973e30] do_signal at ffffffff8100a265
#13 [ffff880629973f30] do_notify_resume at ffffffff8100aa80
#14 [ffff880629973f50] int_signal at ffffffff8100b341
    RIP: 00007f70e4fda885  RSP: 00007fffab935e48  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: 000000000085b518  RCX: ffffffffffffffff
    RDX: 0000000000000006  RSI: 00000000000077aa  RDI: 00000000000077aa
    RBP: 000000000409f2c8   R8: 000000000000000a   R9: 00007f70e7256720
    R10: 0000000000000008  R11: 0000000000000206  R12: 000000000409dc70
    R13: 00007fffab936060  R14: 00007fffab936100  R15: 000000000409ac08
    ORIG_RAX: 00000000000000ea  CS: 0033  SS: 002b
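
For anyone comparing several of these dumps, the call path in the bt
output above can be pulled out mechanically. The following is only a rough
sketch in Python: the parse_frames helper and the regular expression are
illustrative assumptions, not part of the crash utility, and the sample
text is copied from frames #6 through #10 of the trace above.

```python
import re

# Matches crash "bt" frame lines such as:
#   " #7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8"
# (hypothetical pattern, written against the output pasted above)
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(bt_text):
    """Return a list of (frame_number, symbol, address) tuples from bt output."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # group(2) is the stack address in brackets; it is not needed here.
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return frames


SAMPLE = """\
 #6 [ffff8806299739d0] invalid_op at ffffffff8100be5b
    [exception RIP: __sanitize_i387_state+297]
 #7 [ffff880629973a80] __sanitize_i387_state at ffffffff810159a8
 #8 [ffff880629973ab0] xfpregs_get at ffffffff81015778
 #9 [ffff880629973b00] elf_core_dump at ffffffff811ccffc
#10 [ffff880629973c40] do_coredump at ffffffff8117c594
"""

if __name__ == "__main__":
    # Print callers first, so the path reads in execution order.
    for num, sym, addr in reversed(parse_frames(SAMPLE)):
        print(f"#{num:<2} {sym} ({addr:#x})")
```

Read in that order, the sample shows the core dump path (do_coredump,
elf_core_dump, xfpregs_get) reaching __sanitize_i387_state, where the
invalid_op trap is raised.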





On 8/12/13 12:27 PM, "Akemi Yagi" <toracat at elrepo.org> wrote:

>On Mon, Aug 12, 2013 at 12:20 PM, Mark Selby <mselby at quantcast.com> wrote:
>> All of the stock RHEL/CentOS kernels are built using a spec file that
>> creates two debug packages that are required if you ever want to run
>> crash against a kernel dump. We have a requirement to move to a 3.x
>> kernel in our CentOS environment because of a kernel bug fix that has
>> not yet been back ported to RHEL/CentOS branch.
>
>Can you elaborate on the bug fix you are referring to? Wonder if this
>is something that *can* be backported to the current EL kernel without
>much difficulty. If so, that would be a better solution than running a
>mainline kernel.
>
>Akemi


