[elrepo] Problem with CUDA since 331.67.elrepo

Michael Lampe mlampe0 at googlemail.com
Fri May 2 13:00:55 EDT 2014


Phil Perry wrote:

> I've merged your patches, and built some testing packages (331.67-2)
> which I can release to the testing repo, but I've come across a small
> issue whilst doing some quick pre-release testing.
>
> On RHEL6, when running glxgears the animation noticeably stutters, it is
> no longer smooth. The fps count is still reported as ~60fps, apparently
> linked to the refresh rate of my panel, but the animation "looks" more
> like 5-10 fps!
>
> Downgrading to 331.67-1 confirmed we appear to have introduced a glitch.
>
> Unloading the nvidia-uvm module had no effect so that does not appear to
> be the cause.
>
> Commenting out the 'NVreg_ModifyDeviceFiles=0' in
> /etc/modprobe.d/nvidia.conf fixed the issue.
>
> Are you able to observe similar behaviour?
>
> I don't observe any issues on RHEL5 where glxgears reports ~11,000fps
> with or without 'NVreg_ModifyDeviceFiles=0'.

Well, I admit to having tested mostly with el5, which works as I 
described; see 
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/s1-pam-console.html.

The only el6 machine with nvidia hardware I have available here at work 
is a GPU server. It has no video-out, so I cannot log in at the console. 
It also uses another mechanism for device file permissions (actually 
none: r/w for everyone).

El6 doesn't have /etc/security/console.perms.d/50-default.perms; instead 
it would want to use something like /lib/udev/rules.d/70-acl.rules to add 
an acl for _all_ locally logged in users.

Options/ideas:

1) Write a udev rule for nvidia's modules. I'm 99% sure this won't work, 
because the nvidia stuff doesn't populate sysfs and never creates udev 
events.

2) Create /etc/security/console.perms.d/50-nvidia.perms with a line like 
this:

<console> 0600 /dev/nvidia* 0600 root

Then permissions should be handled as in el5. Multiple logins via X 
won't work, I guess, because permissions cannot be accumulated like 
entries in an acl.

3) Create all devices once r/w for everyone and stick to that.

4) Admit defeat. Remove NVreg_ModifyDeviceFiles=0, put in suid root 
nvidia-modprobe, and let nvidia have their bloody way.
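
For reference, option 3 could be sketched roughly as below. This is only 
an illustration, not a tested packaging script: it assumes the character 
major 195 and the minor 255 for /dev/nvidiactl that nvidia's README 
documents, the function names and the gpu_count parameter are made up 
for the example, and /dev/nvidia-uvm is left out because its major is 
assigned dynamically.

```python
import os
import stat

NVIDIA_MAJOR = 195      # assumed: character major of the nvidia kernel module
NVIDIACTL_MINOR = 255   # assumed: minor of /dev/nvidiactl


def nvidia_device_plan(gpu_count, mode=0o666):
    """Return (path, major, minor, mode) per node without touching /dev."""
    plan = [("/dev/nvidia%d" % i, NVIDIA_MAJOR, i, mode)
            for i in range(gpu_count)]
    plan.append(("/dev/nvidiactl", NVIDIA_MAJOR, NVIDIACTL_MINOR, mode))
    return plan


def create_nodes(plan):
    """Create the planned nodes; must run as root."""
    for path, major, minor, mode in plan:
        if not os.path.exists(path):
            os.mknod(path, mode | stat.S_IFCHR, os.makedev(major, minor))
        # mknod is subject to the umask, so set the mode explicitly.
        os.chmod(path, mode)


if __name__ == "__main__":
    # Dry run: only print what would be created for a two-GPU machine.
    for entry in nvidia_device_plan(gpu_count=2):
        print(entry)
```

Something like create_nodes(nvidia_device_plan(2)) would have to run as 
root once per boot (e.g. from an init script), since /dev is recreated 
at startup.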

Better ideas?

-Michael

