[elrepo] mpt3sas driver
Phil Perry
phil at elrepo.org
Mon Mar 7 15:40:04 EST 2016
On 07/03/16 17:36, Roman Serbski wrote:
> Hello,
>
> We're experiencing a weird issue with the mpt3sas driver under CentOS 7
> (7.2.1511) installed on a Lenovo System x3650 M5 with 12 SATA drives
> (2TB each). The server will be used as a data node for a Big Data
> cluster, hence no RAID, just JBOD. CentOS is installed on an embedded SD
> card (32GB) and the kernel version is 3.10.0-327.
>
> The installation of CentOS goes fine. Here is a snippet from dmesg:
>
> [ 3.608523] mpt3sas version 04.100.00.00 loaded
> [ 3.608746] mpt3sas 0000:15:00.0: enabling device (0140 -> 0142)
> [ 3.608840] mpt3sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total
> mem (131640904 kB)
> [ 3.664247] mpt3sas0: MSI-X vectors supported: 8, no of cores: 32,
> max_msix_vectors: 8
> [ 3.664362] mpt3sas 0000:15:00.0: irq 46 for MSI/MSI-X
> [ 3.664403] mpt3sas 0000:15:00.0: irq 47 for MSI/MSI-X
> [ 3.664416] mpt3sas 0000:15:00.0: irq 48 for MSI/MSI-X
> [ 3.664450] mpt3sas 0000:15:00.0: irq 49 for MSI/MSI-X
> [ 3.664483] mpt3sas 0000:15:00.0: irq 50 for MSI/MSI-X
> [ 3.664522] mpt3sas 0000:15:00.0: irq 51 for MSI/MSI-X
> [ 3.664564] mpt3sas 0000:15:00.0: irq 52 for MSI/MSI-X
> [ 3.664597] mpt3sas 0000:15:00.0: irq 53 for MSI/MSI-X
> [ 3.664893] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 46
> [ 3.664894] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 47
> [ 3.664894] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 48
> [ 3.664895] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 49
> [ 3.664895] mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 50
> [ 3.664896] mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 51
> [ 3.664896] mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 52
> [ 3.664897] mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 53
> [ 3.664898] mpt3sas0: iomem(0x0000000090cf0000),
> mapped(0xffffc9001c320000), size(65536)
> [ 3.664899] mpt3sas0: ioport(0x0000000000002e00), size(256)
> [ 3.809272] mpt3sas0: Allocated physical memory: size(17342 kB)
> [ 3.809275] mpt3sas0: Current Controller Queue Depth(10123),Max
> Controller Queue Depth(10240)
> [ 3.809277] mpt3sas0: Scatter Gather Elements per IO(128)
> [ 3.855056] mpt3sas0: LSISAS3008: FWVersion(03.00.07.00),
> ChipRevision(0x02), BiosVersion(06.00.01.00)
> [ 3.855059] mpt3sas0: Protocol=(
> [ 3.855646] mpt3sas0: sending port enable !!
> [ 5.649806] mpt3sas0: host_add: handle(0x0001),
> sas_addr(0x500605b00ab56830), phys(8)
> [ 5.651910] mpt3sas0: expander_add: handle(0x0009), parent(0x0001),
> sas_addr(0x500507606345287f), phys(25)
> [ 12.420259] mpt3sas0: port enable: SUCCESS
>
> However, if I reboot the server, scsi 0:0:0:0 gets blocked, all
> remaining scsi IDs get shifted and I see "device is not present
> handle" errors:
>
> [ 6.107015] scsi 0:0:0:0: device_blocked, handle(0x000a)
>
> [ 3.591144] mpt3sas version 04.100.00.00 loaded
> [ 3.591350] mpt3sas 0000:15:00.0: enabling device (0140 -> 0142)
> [ 3.591440] mpt3sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total
> mem (131640904 kB)
> [ 3.646675] mpt3sas0: MSI-X vectors supported: 8, no of cores: 32,
> max_msix_vectors: 8
> [ 3.646721] mpt3sas 0000:15:00.0: irq 46 for MSI/MSI-X
> [ 3.646729] mpt3sas 0000:15:00.0: irq 47 for MSI/MSI-X
> [ 3.646737] mpt3sas 0000:15:00.0: irq 48 for MSI/MSI-X
> [ 3.646745] mpt3sas 0000:15:00.0: irq 49 for MSI/MSI-X
> [ 3.646753] mpt3sas 0000:15:00.0: irq 50 for MSI/MSI-X
> [ 3.646769] mpt3sas 0000:15:00.0: irq 51 for MSI/MSI-X
> [ 3.646776] mpt3sas 0000:15:00.0: irq 52 for MSI/MSI-X
> [ 3.646784] mpt3sas 0000:15:00.0: irq 53 for MSI/MSI-X
> [ 3.646903] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 46
> [ 3.646905] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 47
> [ 3.646906] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 48
> [ 3.646907] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 49
> [ 3.646910] mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 50
> [ 3.646911] mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 51
> [ 3.646913] mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 52
> [ 3.646914] mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 53
> [ 3.646916] mpt3sas0: iomem(0x0000000090cf0000),
> mapped(0xffffc9001c8c0000), size(65536)
> [ 3.646917] mpt3sas0: ioport(0x0000000000002e00), size(256)
> [ 3.774651] mpt3sas0: Allocated physical memory: size(17342 kB)
> [ 3.774654] mpt3sas0: Current Controller Queue Depth(10123),Max
> Controller Queue Depth(10240)
> [ 3.774655] mpt3sas0: Scatter Gather Elements per IO(128)
> [ 3.820429] mpt3sas0: LSISAS3008: FWVersion(03.00.07.00),
> ChipRevision(0x02), BiosVersion(06.00.01.00)
> [ 3.820431] mpt3sas0: Protocol=(
> [ 3.821017] mpt3sas0: sending port enable !!
> [ 5.590029] mpt3sas0: host_add: handle(0x0001),
> sas_addr(0x500605b00ab56830), phys(8)
> [ 5.592233] mpt3sas0: expander_add: handle(0x0009), parent(0x0001),
> sas_addr(0x500507606345287f), phys(25)
> [ 5.906518] mpt3sas0: device is not present handle(0x04b)!!!
> [ 5.910632] mpt3sas0: device is not present handle(0x04c)!!!
> [ 5.913716] mpt3sas0: device is not present handle(0x04d)!!!
> [ 5.913836] mpt3sas0: device is not present handle(0x04e)!!!
> [ 5.913948] mpt3sas0: device is not present handle(0x04f)!!!
> [ 5.914059] mpt3sas0: device is not present handle(0x0410)!!!
> [ 5.914170] mpt3sas0: device is not present handle(0x0411)!!!
> [ 5.914281] mpt3sas0: device is not present handle(0x0412)!!!
> [ 5.914392] mpt3sas0: device is not present handle(0x0413)!!!
> [ 5.914504] mpt3sas0: device is not present handle(0x0414)!!!
> [ 5.914631] mpt3sas0: device is not present handle(0x0415)!!!
> [ 11.846267] mpt3sas0: port enable: SUCCESS
>
> If I reboot the server one more time everything is back to normal
> until the next reboot.
>
> I've just tried the latest kernel from ELRepo (4.4.4-1), which includes
> version 09.102.00.00 of the mpt3sas driver, and it works without any
> issues: rebooting the server doesn't change the scsi IDs.
>
> I'm not sure how doable it is, but would somebody be so kind as to build
> (or help me to build) a kmod package with the 09.102.00.00 mpt3sas driver
> for the 3.10.0-327 kernel?
>
> Many thanks in advance.

Hi,

I've had a look at the possibility of backporting a newer version of the
driver from a more recent kernel. Unfortunately, due to ABI changes,
this is simply not possible in this case.

My first suggestion is that you file a bug report with Red Hat.
Hopefully they can fix the issue, and the fix will then flow downstream
to CentOS. You have already demonstrated that the issue is fixed in a
later kernel.
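
To make the bug report easier to triage, it may help to pull the relevant details out of the dmesg output for both the good and the bad boot. The following is only a minimal sketch: the function name `summarize_dmesg` and the sample text are illustrative, and the patterns simply match the message formats shown in the logs quoted above (with the lines unwrapped).

```python
import re

# Patterns based on the message formats seen in the quoted dmesg output.
VERSION_RE = re.compile(r"mpt3sas version (\S+) loaded")
FIRMWARE_RE = re.compile(r"FWVersion\(([^)]+)\)")
MISSING_RE = re.compile(r"device is not present handle\((0x[0-9a-fA-F]+)\)")
BLOCKED_RE = re.compile(r"scsi (\d+:\d+:\d+:\d+): device_blocked")


def summarize_dmesg(text):
    """Collect the mpt3sas details a bug report would likely need."""
    version = VERSION_RE.search(text)
    firmware = FIRMWARE_RE.search(text)
    return {
        "driver_version": version.group(1) if version else None,
        "firmware_version": firmware.group(1) if firmware else None,
        "missing_handles": MISSING_RE.findall(text),
        "blocked_devices": BLOCKED_RE.findall(text),
    }


# A few lines taken from the failing boot quoted above (unwrapped).
SAMPLE = """\
[    3.591144] mpt3sas version 04.100.00.00 loaded
[    3.820429] mpt3sas0: LSISAS3008: FWVersion(03.00.07.00), ChipRevision(0x02), BiosVersion(06.00.01.00)
[    5.906518] mpt3sas0: device is not present handle(0x04b)!!!
[    5.910632] mpt3sas0: device is not present handle(0x04c)!!!
[    6.107015] scsi 0:0:0:0: device_blocked, handle(0x000a)
"""

summary = summarize_dmesg(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running the same function over the output of a good boot and a bad boot gives a compact side-by-side comparison to paste into the report.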

Second, I would encourage you to review the patches submitted for the
4.4 kernel driver, which you have confirmed works, and see if you can
identify the patch(es) that fix the issue (this information would also
be extremely useful for the above bug report):
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/scsi/mpt3sas?h=v4.4.4
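
If you have a local clone of the stable tree, the same history can be listed and searched from a script. This is a rough sketch under stated assumptions: the helper names are made up for illustration, and `v3.10` is only an approximate starting point, since the RHEL 3.10.0-327 kernel carries its own backports and does not correspond exactly to any upstream tag. Only `v4.4.4` and the driver path come from the link above.

```python
import subprocess


def build_log_command(old_ref, new_ref, path="drivers/scsi/mpt3sas"):
    """Build a `git log` command for commits after old_ref up to new_ref that touch path."""
    return ["git", "log", "--oneline", "--no-merges",
            f"{old_ref}..{new_ref}", "--", path]


def list_driver_commits(repo_dir, old_ref, new_ref):
    """Run the command inside a local kernel clone and return one line per commit."""
    result = subprocess.run(
        build_log_command(old_ref, new_ref),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def filter_commits(lines, keywords):
    """Keep commit lines whose subject mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


# Show the command that would be run; v3.10 is an assumed starting point.
print(" ".join(build_log_command("v3.10", "v4.4.4")))
```

Passing the result of `list_driver_commits` to `filter_commits` with keywords such as "handle" or "not present" is one way to narrow the list, though the subject lines alone may not reveal the fix.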

There is then the possibility that we may be able to backport just those
patches to the current RHEL driver and build you an updated kmod driver
until Red Hat is able to release a fix.

If you need help with that, you could try emailing the driver maintainer:
describe your problem and see if (s)he can point you towards the correct
patch.

Hope that helps.