[elrepo] Nvidia Driver Performance: El Repo vs. NVidia's .run file

Steve Cleveland stevec at engr.oregonstate.edu
Tue Apr 26 11:58:04 EDT 2016


On 4/25/2016 12:21 PM, Phil Perry wrote:
> On 25/04/16 18:09, EBradley at williams-int.com wrote:
>> Hello Manuel and Phil,
>>
>> Thank you both for the prompt replies! Before I provide the info
>> requested by Manuel I wanted to share something I discovered in the
>> interim. On my test machine (still using the el repo drivers) I
>> realized that the terminal window from which Ansys Mechanical was
>> launched contained an error message I hadn't seen earlier:
>>
>> libGL error: failed to load driver: swrast
>>
>> This error message is not present when launching the program from the
>> user's machine mentioned earlier, as it is now using the .run driver
>> from Nvidia.com. Googling led me to a few different sites, one of
>> which recommended checking out the symbolic links in /usr/lib64;
>> specifically those regarding libGL.so.1. Upon doing so, and in
>> comparing my machine to the user's machine, I realized there were some
>> pretty obvious differences between the two. To start,
>> /usr/lib64/libGL.so.1 on the user's machine is a symbolic link to
>> libGL.so.361.42 whereas on my machine it links to libGL.so.1.2.0. The
>> libGL.so.361.42 file on my machine is contained in /usr/lib64/nvidia,
>> a folder which is not present on the user's machine. This folder
>> contains, as one would expect, many Nvidia-specific library files and
>> symbolic links that appear to be contained directly in /usr/lib64 on
>> the user's machine. When I modified my /usr/lib64/libGL.so.1 symbolic
>> link to point to /usr/lib64/nvidia/libGL.so.361.42, the part
>> was displayed as expected in Ansys Mechanical and the error message re:
>> swrast was also absent from the terminal window.
>>
>> While this is good, I have to assume that my fix of reconfiguring the
>> symbolic link for libGL.so.1 isn't suitable for long-term production
>> use as the next update to the el repo driver or kernel will most
>> likely overwrite my changes. I am also a bit concerned that there are
>> other applications waiting to call upon some other library or symbolic
>> link that is missing/misconfigured. The differences between /usr/lib64
>> on my machine and the user's seem too great to be ignored. I'm not
>> sure if this gives you enough to go on in identifying a bug so please
>> let me know if there's anything else I can provide, or if you'd still
>> like to see the info originally requested by Manuel.
>>
>> Thanks,
>>
>>
>> Evan
>
> Ah, the way we install and handle libGL is one major difference between
> the elrepo driver and the nvidia installer.
>
> The nvidia drivers use their own libGL library. The nvidia installer
> backs up the original distro file in /usr/lib{64}/ and replaces it with
> the nvidia version of libGL. This approach works fine until the distro
> updates the mesa-libGL package thus overwriting the nvidia library and
> breaking the installation.
>
> To better solve this issue we install all the nvidia libs to a separate
> nvidia dir /usr/lib{64}/nvidia/ and then update the lib path in
> /etc/ld.so.conf.d/nvidia.conf
>
> cat /etc/ld.so.conf.d/nvidia.conf
> /usr/lib64/nvidia
> /usr/lib64/vdpau
> /usr/lib/nvidia
> /usr/lib/vdpau
>
> So your system should be using the nvidia copy of libGL in
> /usr/lib{64}/nvidia/, not the distro copy in /usr/lib{64}/ (assuming you
> have the elrepo drivers installed)
>
> You can confirm this by using the ldd command to see which version a
> program is linked against. For example,
>
>
> # ldd /usr/bin/glxgears | grep libGL
>         libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003496200000)
>
> and we see it's correctly linked against the nvidia libGL.
>
> Please try running the above on a few programs including the program you
> are having issues with and let us know which copy of libGL is being used.
>
> Given your workaround fixes the issue, it would appear to be a bug in
> your application which appears to be using the wrong libGL (the above
> should confirm this).
>
> As you correctly identify, your workaround will work fine until the
> distro updates the libGL.so.1 lib and symlinks, which is exactly the
> case for the nvidia .run installer and the very reason we don't package
> the files that way.
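
For checking several programs at once, as Phil suggests above, something
like the following might save some typing. It is only a rough sketch:
libgl_from_ldd and report_libgl are names made up for illustration, and
it assumes ldd prints lines in the "name => path (address)" form shown
in Phil's example.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the path
# that each libGL entry resolves to.
libgl_from_ldd() {
    awk '$1 ~ /^libGL\.so/ && $2 == "=>" { print $3 }'
}

# Hypothetical helper: for each binary given as an argument, print the
# binary followed by the libGL path that ldd reports for it.
report_libgl() {
    for bin in "$@"; do
        printf '%s: %s\n' "$bin" "$(ldd "$bin" 2>/dev/null | libgl_from_ldd)"
    done
}
```

Usage would be along the lines of "report_libgl /usr/bin/glxgears
/path/to/your/app". Keep in mind that ldd only reports libraries the
binary is linked against, so a library loaded at run time via dlopen()
will not show up here.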

We have an engineering application that tries to use the system libGL.so
as well.  We're now using puppet to soft link /usr/lib64/libGL.so to
/usr/lib64/nvidia/libGL.so.  I think we do the same for the 32-bit
library.  Not ideal, but puppet does fix the link whenever the system
libraries are updated.  A cron job could accomplish the same thing.
Unfortunately, we can't push the developer to fix it, as they no longer
exist.
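
For the cron variant, a minimal sketch of what the job could run is
below. This is not our puppet manifest; ensure_link is a made-up helper
name, and the paths you pass it are whatever applies on your system.

```shell
#!/bin/sh
# ensure_link LINK TARGET (hypothetical helper): make LINK a symlink to
# TARGET, touching it only when it does not already point there.
ensure_link() {
    link=$1
    target=$2
    # Refuse to create a dangling link if the driver library is missing.
    [ -e "$target" ] || { echo "missing target: $target" >&2; return 1; }
    if [ "$(readlink "$link" 2>/dev/null)" != "$target" ]; then
        # -f replaces whatever is at LINK; -n avoids descending into an
        # existing symlink to a directory.
        ln -sfn "$target" "$link"
        echo "relinked $link -> $target"
    fi
}
```

Run as root, "ensure_link /usr/lib64/libGL.so /usr/lib64/nvidia/libGL.so"
would restore the link described above after a mesa-libGL update. It
prints nothing when the link is already correct, so cron would only
send mail when something actually changed.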

  - Steve


