[elrepo] Nvidia Driver Performance: El Repo vs. NVidia's .run file
Phil Perry
phil at elrepo.org
Tue Apr 26 13:00:33 EDT 2016
On 26/04/16 16:58, Steve Cleveland wrote:
> On 4/25/2016 12:21 PM, Phil Perry wrote:
>> On 25/04/16 18:09, EBradley at williams-int.com wrote:
>>> Hello Manuel and Phil,
>>>
>>> Thank you both for the prompt replies! Before I provide the info
>>> requested by Manuel I wanted to share something I discovered in the
>>> interim. On my test machine (still using the el repo drivers) I
>>> realized that the terminal window from which Ansys Mechanical was
>>> launched contained an error message I hadn't seen earlier:
>>>
>>> libGL error: failed to load driver: swrast
>>>
>>> This error message is not present when launching the program from the
>>> user's machine mentioned earlier, as it is now using the .run driver
>>> from Nvidia.com. Googling led me to a few different sites, one of
>>> which recommended checking out the symbolic links in /usr/lib64;
>>> specifically those regarding libGL.so.1. Upon doing so, and in
>>> comparing my machine to the user's machine, I realized there were some
>>> pretty obvious differences between the two. To start,
>>> /usr/lib64/libGL.so.1 on the user's machine is a symbolic link to
>>> libGL.so.361.42 whereas on my machine it links to libGL.so.1.2.0. The
>>> libGL.so.361.42 file on my machine is contained in /usr/lib64/nvidia,
>>> a folder which is not present on the user's machine. This folder
>>> contains, as one would expect, many Nvidia-specific library files and
>>> symbolic links that appear to be contained directly in /usr/lib64 on
>>> the user's machine. When I modified my /usr/lib64/libGL.so.1 symbolic
>>> link to point to /usr/lib64/nvidia/libGL.so.361.42, the part was
>>> displayed as expected in Ansys Mechanical and the error message re:
>>> swrast was also absent from the terminal window.
>>>
>>> While this is good, I have to assume that my fix of reconfiguring the
>>> symbolic link for libGL.so.1 isn't suitable for long-term production
>>> use as the next update to the el repo driver or kernel will most
>>> likely overwrite my changes. I am also a bit concerned that there are
>>> other applications waiting to call upon some other library or symbolic
>>> link that is missing/misconfigured. The differences between /usr/lib64
>>> on my machine and the user's seem too great to be ignored. I'm not
>>> sure if this gives you enough to go on in identifying a bug so please
>>> let me know if there's anything else I can provide, or if you'd still
>>> like to see the info originally requested by Manuel.
>>>
>>> Thanks,
>>>
>>>
>>> Evan
>>
>> Ah, the way we install and handle libGL is one major difference between
>> the elrepo driver and the nvidia installer.
>>
>> The nvidia drivers use their own libGL library. The nvidia installer
>> backs up the original distro file in /usr/lib{64}/ and replaces it with
>> the nvidia version of libGL. This approach works fine until the distro
>> updates the mesa-libGL package thus overwriting the nvidia library and
>> breaking the installation.
>>
>> To better solve this issue we install all the nvidia libs to a separate
>> nvidia dir /usr/lib{64}/nvidia/ and then update the lib path in
>> /etc/ld.so.conf.d/nvidia.conf
>>
>> cat /etc/ld.so.conf.d/nvidia.conf
>> /usr/lib64/nvidia
>> /usr/lib64/vdpau
>> /usr/lib/nvidia
>> /usr/lib/vdpau
>>
>> So your system should be using the nvidia copy of libGL in
>> /usr/lib{64}/nvidia/, not the distro copy in /usr/lib{64}/ (assuming you
>> have the elrepo drivers installed).
>>
>> You can confirm this by using the ldd command to see which version a
>> program is linked against. For example,
>>
>>
>> # ldd /usr/bin/glxgears | grep libGL
>> libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003496200000)
>>
>> and we see it's correctly linked against the nvidia libGL.
>>
>> Please try running the above on a few programs including the program you
>> are having issues with and let us know which copy of libGL is being used.
>>
>> Given your workaround fixes the issue, this looks like a bug in your
>> application, which appears to be using the wrong libGL (the above
>> should confirm this).
>>
>> As you correctly identify, your workaround will work fine until the
>> distro updates the libGL.so.1 lib and symlinks, which is exactly the
>> case for the nvidia .run installer and the very reason we don't package
>> the files that way.
>
> We have an engineering application that tries to use the system libGL.so
> as well. We're now using puppet to soft link /usr/lib64/libGL.so to
> /usr/lib64/nvidia/libGL.so. I think we're doing the same for 32bit as
> well. Not ideal, but puppet does fix the link when the system libraries
> are updated. A cron job could accomplish the same thing. Unfortunately,
> we have no option to push on the developer to fix it as they don't exist
> anymore.
>
> - Steve
>
Thanks for the input, Steve.
We could potentially fix this with a %triggerin script in our rpm
package to create/fix the symlink every time mesa-libGL is updated, but
it's not a very elegant way to fix the issue, and it's not really our
bug to fix (it should really be fixed in the offending application).
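
For illustration only, here is a rough sketch of what such a trigger (or
Steve's puppet/cron equivalent) might run. It is not taken from the elrepo
spec file: the function name fix_libgl_link is made up for this example,
and wiring it under a "%triggerin -- mesa-libGL" section is an assumption
about how it could be packaged. It only re-points libGL.so.1 at the nvidia
copy when that copy actually exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of any elrepo package).
# Re-point <libdir>/libGL.so.1 at the nvidia copy in <libdir>/nvidia/.
# In an rpm spec this body could sit under "%triggerin -- mesa-libGL"
# (an assumption for illustration); it works equally well from cron.
fix_libgl_link() {
    libdir="$1"
    target="$libdir/nvidia/libGL.so.1"
    link="$libdir/libGL.so.1"

    # Nothing to do if the nvidia library is not installed in this libdir.
    [ -e "$target" ] || return 0

    # Replace the link only if it does not already point at the nvidia copy.
    if [ "$(readlink "$link" 2>/dev/null)" != "$target" ]; then
        ln -sf "$target" "$link"
    fi
}
```

It would be called once per library directory, i.e. "fix_libgl_link
/usr/lib64" and "fix_libgl_link /usr/lib". Note that this overwrites a
file owned by mesa-libGL, so rpm verification of that package will report
the change.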