[elrepo] Nvidia Driver Performance: El Repo vs. NVidia's .run file

EBradley at williams-int.com EBradley at williams-int.com
Tue Apr 26 14:08:51 EDT 2016


Hi Manuel, Phil, and Steve,

Thank you all for the replies! And let me also apologize for my email
client not following the thread structure in the list. I'll reply to
everyone in this message so as not to disrupt the flow any more than
necessary.

Manuel:  I do now see why you requested the Xorg.0.logs and it appears
that I took the long way around. I just wasn't sure if the libGL
discussion was going to take us in another direction rendering your
request moot, so I figured I'd post that info first. 

Phil:  Interesting, thanks for the info. Per your request, the ldd
command shows that the el repo drivers are functioning as intended for
glxgears and the system in general:

[224 root at system]ldd /usr/bin/glxgears | grep libGL
        libGLU.so.1 => /usr/lib64/libGLU.so.1 (0x0000003953800000)
        libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003951600000)

I did try the above on a few other applications (firefox, glchess,
baobab) but either they weren't dynamic executables or they didn't link
against libGL.so.1. I also wasn't able to get a clear result from the
above for Ansys Mechanical, as it isn't clear exactly how it's launched
when the user double-clicks on a results cell. I tried a
few binaries within the installation directory, but either they too
weren't dynamic executables or there's some other vendor-specific voodoo
at work as Ansys likes to launch their Windows .exe files on Linux via
mono. At any rate, I'm glad to see I was on the right track and not
really surprised to find that it's the fault of the application.
Unfortunately, I'm in the same boat as Steve in that it probably won't
do much good for me to ask the developers of the app to fix the linking
issue. I'll probably have to decide if it's easier for me to create a
script that fixes the link after each kernel upgrade or go back to the
nvidia.com drivers and explore installing them with the --dkms option so
DKMS will recreate the kernel modules for the newly-installed kernel.
Since neither of those options falls within the scope of this list, I'll
consider this issue resolved as far as el repo is concerned.
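
Since ldd only reports libraries recorded as link-time dependencies, it
cannot show a libGL that a program loads at run time (for example via
dlopen, which may be the case for a mono-hosted application). A possible
way around that, sketched below, is to inspect /proc/PID/maps for the
already-running process. The function name libgl_in_maps and the PID in
the usage comment are placeholders, not anything from this thread.

```shell
#!/bin/sh
# Print each distinct libGL shared object that appears in a maps file.
# Intended input: /proc/PID/maps of the running application (Linux only).
libgl_in_maps() {
    # In /proc/PID/maps the sixth whitespace-separated field is the mapped
    # file's path; keep only paths containing "libGL.so" and drop duplicates.
    awk '$6 ~ /libGL\.so/ { print $6 }' "$1" | sort -u
}

# Hypothetical usage, with 12345 standing in for the application's PID:
#   libgl_in_maps /proc/12345/maps
```

A path under /usr/lib64/nvidia/ in the output would suggest the process
is using the nvidia library, while a path directly under /usr/lib64/
would point to the distro copy.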

Steve:  Thanks for the post re: Puppet. I'll take a look at that as a
possible option going forward.

Thanks again for the support on this issue and for the el repo project
in general!


Evan

-----Original Message-----
From: Phil Perry [mailto:phil at elrepo.org] 
Sent: Monday, April 25, 2016 3:22 PM
To: elrepo at lists.elrepo.org
Subject: Re: [elrepo] Nvidia Driver Performance: El Repo vs. NVidia's
.run file

On 25/04/16 18:09, EBradley at williams-int.com wrote:
> Hello Manuel and Phil,
>
> Thank you both for the prompt replies! Before I provide the info
> requested by Manuel I wanted to share something I discovered in the
> interim. On my test machine (still using the el repo drivers) I realized
> that the terminal window from which Ansys Mechanical was launched
> contained an error message I hadn't seen earlier:
>
> libGL error: failed to load driver: swrast
>
> This error message is not present when launching the program from the 
> user's machine mentioned earlier, as it is now using the .run driver 
> from Nvidia.com. Googling led me to a few different sites, one of 
> which recommended checking out the symbolic links in /usr/lib64; 
> specifically those regarding libGL.so.1. Upon doing so, and in 
> comparing my machine to the user's machine, I realized there were some
> pretty obvious differences between the two. To start, 
> /usr/lib64/libGL.so.1 on the user's machine is a symbolic link to 
> libGL.so.361.42 whereas on my machine it links to libGL.so.1.2.0. The 
> libGL.so.361.42 file on my machine is contained in /usr/lib64/nvidia, 
> a folder which is not present on the user's machine. This folder 
> contains, as one would expect, many Nvidia-specific library files and 
> symbolic links that appear to be contained directly in /usr/lib64 on 
> the user's machine. When I modified my /usr/lib64/libGL.so.1 symbolic 
> link to point to /usr/lib64/nvidia/libGL.so.361.42, the part was
> displayed as expected in Ansys Mechanical and the error message re:
> swrast was also absent from the terminal window.
>
> While this is good, I have to assume that my fix of reconfiguring the
> symbolic link for libGL.so.1 isn't suitable for long-term production use
> as the next update to the el repo driver or kernel will most likely
> overwrite my changes. I am also a bit concerned that there are other
> applications waiting to call upon some other library or symbolic link
> that is missing/misconfigured. The differences between /usr/lib64 on my
> machine and the user's seem too great to be ignored. I'm not sure if
> this gives you enough to go on in identifying a bug so please let me
> know if there's anything else I can provide, or if you'd still like to
> see the info originally requested by Manuel.
>
> Thanks,
>
>
> Evan

Ah, the way we install and handle libGL is one major difference between
the elrepo driver and the nvidia installer.

The nvidia drivers use their own libGL library. The nvidia installer
backs up the original distro file in /usr/lib{64}/ and replaces it with
the nvidia version of libGL. This approach works fine until the distro
updates the mesa-libGL package, thus overwriting the nvidia library and
breaking the installation.

To better solve this issue we install all the nvidia libs to a separate
nvidia dir /usr/lib{64}/nvidia/ and then update the lib path in
/etc/ld.so.conf.d/nvidia.conf:

cat /etc/ld.so.conf.d/nvidia.conf
/usr/lib64/nvidia
/usr/lib64/vdpau
/usr/lib/nvidia
/usr/lib/vdpau

So your system should be using the nvidia copy of libGL in
/usr/lib{64}/nvidia/, not the distro copy in /usr/lib{64}/ (assuming you
have the elrepo drivers installed).
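
To see what the linker cache actually holds for libGL.so.1, the output of
ldconfig -p can be filtered. The following is only a small sketch; the
helper name cached_paths is made up for illustration.

```shell
#!/bin/sh
# Read "ldconfig -p" output on stdin and print every cached path for the
# soname given as the first argument.
cached_paths() {
    # Cache lines look like:
    #   libGL.so.1 (libc6,x86-64) => /usr/lib64/nvidia/libGL.so.1
    # so the first field is the soname and the last field is the path.
    awk -v lib="$1" '$1 == lib { print $NF }'
}

# Typical usage (ldconfig usually lives in /sbin):
#   /sbin/ldconfig -p | cached_paths libGL.so.1
```

With the nvidia.conf entries above in place, one would expect a path
under /usr/lib64/nvidia/ to be listed alongside the distro copy. ldd
remains the direct check of what a given binary resolves to.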

You can confirm this by using the ldd command to see which version a
program is linked against. For example,


# ldd /usr/bin/glxgears | grep libGL
         libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003496200000)

and we see it's correctly linked against the nvidia libGL.

Please try running the above on a few programs including the program you
are having issues with and let us know which copy of libGL is being
used.

Given your workaround fixes the issue, it would appear to be a bug in 
your application which appears to be using the wrong libGL (the above 
should confirm this).

As you correctly identify, your workaround will work fine until the 
distro updates the libGL.so.1 lib and symlinks, which is exactly the 
case for the nvidia .run installer and the very reason we don't package 
the files that way.





