[elrepo] Problem with CUDA since 331.67.elrepo
Phil Perry
phil at elrepo.org
Fri May 2 18:03:10 EDT 2014
On 02/05/14 18:00, Michael Lampe wrote:
> Phil Perry wrote:
>
>> I've merged your patches, and built some testing packages (331.67-2)
>> which I can release to the testing repo, but I've come across a small
>> issue whilst doing some quick pre-release testing.
>>
>> On RHEL6, when running glxgears the animation noticeably stutters, it is
>> no longer smooth. The fps count is still reported as ~60fps, apparently
>> linked to the refresh rate of my panel, but the animation "looks" more
>> like 5-10 fps!
>>
>> Downgrading to 331.67-1 confirmed we appear to have introduced a glitch.
>>
>> Unloading the nvidia-uvm module had no effect so that does not appear to
>> be the cause.
>>
>> Commenting out the 'NVreg_ModifyDeviceFiles=0' in
>> /etc/modprobe.d/nvidia.conf fixed the issue.
>>
>> Are you able to observe similar behaviour?
>>
>> I don't observe any issues on RHEL5 where glxgears reports ~11,000fps
>> with or without 'NVreg_ModifyDeviceFiles=0'.
>
> Well, I admit to have tested mostly with el5, which works like I
> described, see
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/s1-pam-console.html.
>
>
> The only el6 machine with nvidia hardware I have available here at work
> is a GPU-Server. It has no video-out, so I cannot login at the console.
> It also uses another mechanism for device file permissions (actually
> none: r/w for everyone).
>
> El6 doesn't have /etc/security/console.perms.d/50-default.perms, it
> would want to use something like /lib/udev/rules.d/70-acl.rules to add
> an acl for _all_ locally logged in users.
>
> Options/ideas:
>
> 1) Write a udev rule for nvidia's modules. I'm 99% sure this won't work,
> because the nvidia stuff doesn't populate sysfs and never creates udev
> events.
>
> 2) Create /etc/security/console.perms.d/50-nvidia.perms with a line like
> this:
>
> <console> 0600 /dev/nvidia* 0600 root
>
> Then permissions should be handled as in el5. Multiple logins via X
> won't work I guess, because permissions cannot be accumulated like
> entries to an acl.
>
> 3) Create all devices once r/w for everyone and stick to that.
>
> 4) Admit defeat. Remove NVreg_ModifyDeviceFiles=0, put in suid root
> nvidia-modprobe, and let nvidia have their bloody way.
>
> Better ideas?
>
> -Michael

ATM I'm inclined to go with option 4 for the following reasons:

1. My goal is to package the NVIDIA driver to replicate as closely as
possible (and where appropriate) the behaviour of the NVIDIA installer
package, whilst providing a consistent packaged solution that addresses
issues such as library conflicts (e.g., libGL).
2. I'm not particularly keen to reinvent the wheel. If nvidia-modprobe
works then I see no reason to craft another solution for a problem that
is already solved. It may not be the way we would have gone about
solving the problem, but it's the solution nvidia have given us.
3. I'm also really not keen on having the nvidia-uvm module loaded by
default. My understanding is that on a default NVIDIA installer
installation only the nvidia module is loaded by default. CUDA
applications trigger the nvidia-uvm module to load (if not already
loaded) at run time by forking nvidia-modprobe. So I don't think it is
appropriate to load the nvidia-uvm module by default on all
installations, as a) it deviates from upstream default behaviour and b)
it is not particularly efficient for the 90%-plus of users who don't use
CUDA and don't need the nvidia-uvm module loaded.

I prefer this to the alternative of creating the device files and
loading the module on all installations. Another option would be to
split nvidia-uvm out into a separate package (e.g., kmod-nvidia-uvm) that
installs the nvidia-uvm kernel module, loads it and creates the
necessary device files. Then CUDA users can install this extra package
without encumbering the rest of the nvidia user-base. However, this
wouldn't be my preferred option as it creates more work for me having to
maintain and build an extra package for every driver release (over
multiple arches / distro releases). I already have to manually update
and build 12 packages for each new nvidia release (soon to be 15 with
the release of RHEL7) so you'll understand why I'm not keen to add 4-5 more.
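
For anyone wanting to confirm the on-demand loading described in point 3
on their own machine, here is a minimal sketch. It assumes a Linux
/proc/modules listing, where the kernel reports module names with
underscores (nvidia_uvm rather than nvidia-uvm). The helper name
module_loaded is made up for illustration and is not part of any
package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped anywhere): report whether a kernel
# module appears in a /proc/modules-style listing.
# Usage: module_loaded NAME [FILE]   (FILE defaults to /proc/modules)
module_loaded() {
    # The kernel lists module names with underscores, so map '-' to '_'.
    name=$(printf '%s' "$1" | tr '-' '_')
    file=${2:-/proc/modules}
    [ -r "$file" ] || return 1
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

for mod in nvidia nvidia-uvm; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

Running it before and after starting a CUDA application should show
whether nvidia-uvm was pulled in at run time.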

So I'd propose dropping the /etc/modprobe.d/nvidia.conf settings, adding
nvidia-modprobe to the package, and seeing how that works. BTW, I believe
this is all rpmfusion has done for their Fedora packages, and likewise
Debian.

I'm happy to ship an /etc/modprobe.d/nvidia.conf file, and mark it as
noreplace so users can add their personal configurations as required.
I'm also happy to populate it with the options you've provided, as
examples, but commented out by default. We can provide a brief
description of these options and allow users to uncomment them if they
wish on a case-by-case basis.
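
As a rough idea of what such a shipped file could look like, here is a
minimal sketch that writes an example to a scratch path rather than to
/etc, so it is safe to run. Only NVreg_ModifyDeviceFiles=0 comes from
this thread. The comment wording and the write_sample_conf helper name
are illustrative assumptions, and the "options <module> <param>=<value>"
line is just the usual modprobe.d syntax.

```shell
#!/bin/sh
# Hypothetical helper: write an example nvidia.conf with the option
# commented out, so the driver's default behaviour is left unchanged.
write_sample_conf() {
    cat > "$1" <<'EOF'
# /etc/modprobe.d/nvidia.conf
# Optional settings for the nvidia kernel module. Everything is commented
# out by default; remove the leading '#' to enable a setting.

# Stop the driver from creating or modifying the /dev/nvidia* device files
# itself (illustrative description; check the NVIDIA driver README).
#options nvidia NVreg_ModifyDeviceFiles=0
EOF
}

# Write the example to a temporary file and show it.
conf=$(mktemp)
write_sample_conf "$conf"
cat "$conf"
rm -f "$conf"
```

With the line left commented out, the module loads with its defaults,
which matches the proposal above.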

I'm hoping that will at least give us a minimalistic working base.

Thoughts?

Phil