[elrepo] mpt3sas driver

Roman Serbski mefystofel at gmail.com
Tue Mar 8 06:15:48 EST 2016


On Tue, Mar 8, 2016 at 8:10 AM, Phil Perry <phil at elrepo.org> wrote:
> On 07/03/16 22:05, Roman Serbski wrote:
>>
>> On Mon, Mar 7, 2016 at 10:13 PM, Akemi Yagi <amyagi at gmail.com> wrote:
>>>
>>> On Mon, Mar 7, 2016 at 12:40 PM, Phil Perry <phil at elrepo.org> wrote:
>>>>
>>>>
>>>> On 07/03/16 17:36, Roman Serbski wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> We're experiencing a weird issue with the mpt3sas driver under CentOS 7
>>>>> (7.2.1511) installed on a Lenovo System x3650 M5 with 12 SATA drives
>>>>> (2TB each). The server will be used as a data node for a Big Data
>>>>> cluster, hence no RAID, just JBOD. CentOS is installed on an embedded SD
>>>>> card (32GB) and the kernel version is 3.10.0-327.
>>>>>
>>>>> If I reboot the server one more time everything is back to normal
>>>>> until the next reboot.
>>>>>
>>>>> I've just tried the latest kernel from ELRepo (4.4.4-1), which includes
>>>>> version 09.102.00.00 of the mpt3sas driver. It works without any issues,
>>>>> and rebooting the server doesn't change the SCSI IDs.
>>>>>
>>>>> I'm not sure how doable it is, but would somebody be so kind as to build
>>>>> (or help me to build) a kmod package with the 09.102.00.00 mpt3sas driver
>>>>> for the 3.10.0-327 kernel?
>>>>>
>>>>> Many thanks in advance.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I've had a look at the possibility of backporting a newer version of the
>>>> driver from a more recent kernel, and unfortunately due to ABI changes this
>>>> is simply not possible in this case.
>>>>
>>>> My first suggestion is that you file a bug report with Red Hat - hopefully
>>>> they can fix the issue which will then flow downstream to CentOS. You have
>>>> already demonstrated the issue is fixed in a later kernel.
>>>>
>>>> Second, I would encourage you to review the patches submitted for the 4.4
>>>> kernel driver that you have confirmed works, and see if you can identify the
>>>> patch(es) that fix the issue (this information would also be extremely
>>>> useful for the above bug report):
>>>>
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/scsi/mpt3sas?h=v4.4.4
>>>>
>>>> There is then the possibility that we may be able to backport just those
>>>> patches to the current RHEL driver to build you an updated kmod driver until
>>>> RH is able to release a fix.
>>>>
>>>> If you need help with that you could try emailing the driver maintainer,
>>>> describe your problem and see if (s)he can point you towards the correct
>>>> patch.
>>>>
>>>> Hope that helps.
>>>
>>>
>>> Yet another suggestion is to file a case with IBM and see if they can help.
>>> See for example:
>>>
>>>
>>> https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5099120
>>>
>>> Akemi
>>
>>
>> Thank you very much!  I'm going to try all three suggestions.
>>
>> After looking at
>>
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/scsi/mpt3sas?h=v4.4.4
>> I think that
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/scsi/mpt3sas?h=v4.4.4&id=e4bc7f5c21a18cab9acd30940df0ee791fcd7b9e
>> might be the one that fixed the issue.
>>
>> Thanks.
>
> OK, let's see if we can test that hypothesis by building an updated driver
> with that patch backported. Give me a day or two and I'll try to knock you
> up a package to test.
>
> Phil

Thank you Phil!

I managed to get feedback from the maintainer. After some debugging, he
suspects that something is wrong with the enclosure, which goes into an
unresponsive state for some time during the initial topology
discovery. He suggested trying two patches: the one that I emailed
yesterday and one more:

1. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/scsi/mpt3sas?h=v4.4.4&id=e4bc7f5c21a18cab9acd30940df0ee791fcd7b9e
2. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/scsi/mpt3sas?h=v4.4.4&id=df838f92f3f5240dca54e1629e8547818e8ea646
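
To check whether a patched driver actually keeps the SCSI IDs stable
across reboots, something like the following Python sketch could be
used. It is only an illustration: it assumes the usual sysfs layout
(/sys/class/scsi_device/<H:C:T:L>/device/block/<name> and
/sys/module/mpt3sas/version), and the function names (read_scsi_map,
driver_version, diff_maps) are my own, not part of any existing tool.

```python
#!/usr/bin/env python3
"""Snapshot the SCSI address -> block device mapping from sysfs (sketch)."""
import json
from pathlib import Path


def read_scsi_map(sysfs_root="/sys"):
    """Return {"H:C:T:L": [block device names]} for every SCSI device found.

    Assumes the common sysfs layout; devices without a block node
    (e.g. enclosure devices) get an empty list.
    """
    mapping = {}
    base = Path(sysfs_root) / "class" / "scsi_device"
    if not base.is_dir():
        return mapping
    for entry in sorted(base.iterdir()):
        block_dir = entry / "device" / "block"
        if block_dir.is_dir():
            names = sorted(p.name for p in block_dir.iterdir())
        else:
            names = []
        mapping[entry.name] = names
    return mapping


def driver_version(sysfs_root="/sys", module="mpt3sas"):
    """Return the loaded module's version string, or None if unavailable."""
    path = Path(sysfs_root) / "module" / module / "version"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def diff_maps(before, after):
    """Return {address: (before_names, after_names)} for entries that differ."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changes[key] = (before.get(key), after.get(key))
    return changes


if __name__ == "__main__":
    # Print the current state as JSON so it can be saved and compared later.
    snapshot = {
        "driver_version": driver_version(),
        "scsi_map": read_scsi_map(),
    }
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Run it once before and once after a reboot, redirecting the output to
two files. Loading the "scsi_map" values from both files with json.load
and passing them to diff_maps should give an empty dict if the IDs did
not change.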

In the meantime, I'm going to raise a bug with Red Hat.

Thank you very much for your time.

