[elrepo] RAID 0 bug

Thu Oct 29 17:42:42 EDT 2015

We hit the following kernel warning this week while running
kernel-ml-4.0.5-1.el7.elrepo.x86_64:

Oct 28 16:32:56 localhost kernel: WARNING: CPU: 21 PID: 619 at mm/backing-dev.c:372 bdi_unregister+0x36/0x40()
Oct 28 16:32:56 localhost kernel: Modules linked in: bridge(E) stp(E) llc(E) bonding(E) binfmt_misc(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) iTCO_wdt(E) iTCO_vendor_support(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) cryptd(E) edac_core(E) pcspkr(E) lpc_ich(E) mfd_core(E) shpchp(E) mei_me(E) mei(E) i2c_i801(E) 8250_fintek(E) joydev(E) ioatdma(E) ipmi_si(E) acpi_pad(E) ipmi_devintf(E) ipmi_msghandler(E) ext4(E) mbcache(E) jbd2(E) raid0(E) uas(E) usb_storage(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) isci(E) ttm(E) libsas(E) ahci(E) scsi_transport_sas(E) libahci(E) drm(E) ixgbe(E) igb(E) mdio(E) libata(E) ptp(E) i2c_algo_bit(E) pps_core(E)
Oct 28 16:32:56 localhost kernel: dca(E) megaraid_sas(E)
Oct 28 16:32:56 localhost kernel: CPU: 21 PID: 619 Comm: kworker/21:2 Tainted: G            E   4.0.5-1.el7.elrepo.x86_64 #1
Oct 28 16:32:56 localhost kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 1.0a 06/05/2012
Oct 28 16:32:56 localhost kernel: Workqueue: md_misc mddev_delayed_delete
Oct 28 16:32:56 localhost kernel: 0000000000000000 00000000847739b4 ffff881854a8fcb8 ffffffff816ac3e3
Oct 28 16:32:56 localhost kernel: 0000000000000000 0000000000000000 ffff881854a8fcf8 ffffffff8107860a
Oct 28 16:32:56 localhost kernel: 0000000000000000 ffff881036b35800 0000000000000000 0000000000000000
Oct 28 16:32:56 localhost kernel: Call Trace:
Oct 28 16:32:56 localhost kernel: [<ffffffff816ac3e3>] dump_stack+0x45/0x57
Oct 28 16:32:56 localhost kernel: [<ffffffff8107860a>] warn_slowpath_common+0x8a/0xc0
Oct 28 16:32:56 localhost kernel: [<ffffffff8107873a>] warn_slowpath_null+0x1a/0x20
Oct 28 16:32:56 localhost kernel: [<ffffffff81199f36>] bdi_unregister+0x36/0x40
Oct 28 16:32:56 localhost kernel: [<ffffffff813023a1>] del_gendisk+0x131/0x2b0
Oct 28 16:32:56 localhost kernel: [<ffffffff81536ee3>] md_free+0x43/0x60
Oct 28 16:32:56 localhost kernel: [<ffffffff8131eaeb>] kobject_cleanup+0x7b/0x1a0
Oct 28 16:32:56 localhost kernel: [<ffffffff8131e990>] kobject_put+0x30/0x70
Oct 28 16:32:56 localhost kernel: [<ffffffff81534f44>] mddev_delayed_delete+0x34/0x40
Oct 28 16:32:56 localhost kernel: [<ffffffff81091e1d>] process_one_work+0x14d/0x420
Oct 28 16:32:56 localhost kernel: [<ffffffff810925e2>] worker_thread+0x112/0x510
Oct 28 16:32:56 localhost kernel: [<ffffffff810924d0>] ? rescuer_thread+0x3e0/0x3e0
Oct 28 16:32:56 localhost kernel: [<ffffffff810979c8>] kthread+0xd8/0xf0
Oct 28 16:32:56 localhost kernel: [<ffffffff810978f0>] ? kthread_create_on_node+0x1b0/0x1b0
Oct 28 16:32:56 localhost kernel: [<ffffffff816b3918>] ret_from_fork+0x58/0x90
Oct 28 16:32:56 localhost kernel: [<ffffffff810978f0>] ? kthread_create_on_node+0x1b0/0x1b0
Oct 28 16:32:56 localhost kernel: ---[ end trace 2ff0e61b956ef0af ]---
Oct 28 16:32:56 localhost kernel: md: md1 stopped.

This is a RAID 0 issue (the warning fires while md1 is being stopped) and
appears to be identical to the bug described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1226621

Unfortunately, the fix mentioned in that ticket is for Fedora
(kernel-4.0.5-200.fc21) and is not readily usable on CentOS, at least
not without dealing with a lot of dependency issues. Is there a 4.0.x
kernel from ELRepo that includes this fix?

Peter