[elrepo] nvidia plugin slow

Phil Perry phil at elrepo.org
Fri May 31 16:18:00 EDT 2019


On 31/05/2019 00:18, Steve Cleveland wrote:
> I've been seeing lately that yum has been running very slow on some 
> machines.  I did some testing disabling repos and then plugins.  The 
> combination of elrepo being enabled, nvidia plugin being enabled and an 
> nvidia GPU being present seems to be causing the problem.  That makes 
> sense as the plugin needs the repo and the hardware to function.
> 
> Example:
> 
> # time yum search kmod-nvidia --disableplugin=nvidia
> Loaded plugins: fastestmirror
> Loading mirror speeds from cached hostfile
> =========================== N/S matched: kmod-nvidia 
> ===========================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
> 
>    Name and summary matches only, use "search all" for everything.
> 
> real    0m0.569s
> user    0m0.396s
> sys    0m0.172s
> 
> # time yum search kmod-nvidia
> Loaded plugins: fastestmirror, nvidia
> Loading mirror speeds from cached hostfile
> =========================== N/S matched: kmod-nvidia 
> ===========================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
> 
>    Name and summary matches only, use "search all" for everything.
> 
> real    0m41.644s
> user    0m40.973s
> sys    0m0.564s
> 
>  From half a second to 40 seconds.  Is there a way to for me to debug 
> what's happening?  I assume this is not expected behavior? I haven't 
> noticed it in the past.
> 
> Thanks,
> 
>   - Steve
> 

Hi Steve,

For reference, here's what I see on my system with nvidia hardware:

# time yum search kmod-nvidia --disableplugin=nvidia
Loaded plugins: fastestmirror, langpacks, product-id, 
search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
  * elrepo: mirrors.coreix.net
  * epel: fedora.cu.be
  * nux-dextop: mirror.li.nux.ro
============================== N/S matched: kmod-nvidia 
==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)

   Name and summary matches only, use "search all" for everything.

real	0m2.651s
user	0m0.958s
sys	0m0.176s


and with nvidia plugin enabled it is a little slower, but nowhere near 
as slow as for you:

# time yum search kmod-nvidia
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, 
search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
  * elrepo: mirrors.coreix.net
  * epel: fedora.cu.be
  * nux-dextop: mirror.li.nux.ro
============================== N/S matched: kmod-nvidia 
==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)

   Name and summary matches only, use "search all" for everything.

real	0m7.791s
user	0m5.785s
sys	0m0.485s


I assume you have the latest version installed:

# rpm -q yum-plugin-nvidia
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch

Please can you try running yum at debuglevel 3. Most of the additional 
time for me seems to come during pkgsack time:

# time yum -d 3 search kmod-nvidia --disableplugin=nvidia
Not loading "rhnplugin" plugin, as it is disabled
Loaded plugins: fastestmirror, langpacks, product-id, 
search-disabled-repos, subscription-manager
Adding en_GB.UTF-8 to language list
Updating Subscription Management repositories.
Config time: 2.079
Adding en_GB to language list
Yum version: 3.4.3
Setting up Package Sacks
Loading mirror speeds from cached hostfile
  * elrepo: mirrors.coreix.net
  * epel: fedora.cu.be
  * nux-dextop: mirror.li.nux.ro
pkgsack time: 0.025
rpmdb time: 0.000
tags time: 0.000
============================== N/S matched: kmod-nvidia 
==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)

   Name and summary matches only, use "search all" for everything.

real	0m2.690s
user	0m0.962s
sys	0m0.183s


# time yum -d 3 search kmod-nvidia
Not loading "rhnplugin" plugin, as it is disabled
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, 
search-disabled-repos, subscription-manager
Adding en_GB.UTF-8 to language list
Updating Subscription Management repositories.
Config time: 2.083
Adding en_GB to language list
[nvidia]: device found: 
pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00
Yum version: 3.4.3
Setting up Package Sacks
Loading mirror speeds from cached hostfile
  * elrepo: mirrors.coreix.net
  * epel: fedora.cu.be
  * nux-dextop: mirror.li.nux.ro
rpmdb time: 0.000
pkgsack time: 5.137
tags time: 0.000
============================== N/S matched: kmod-nvidia 
==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)

   Name and summary matches only, use "search all" for everything.

real	0m7.803s
user	0m5.772s
sys	0m0.491s


Running in debuglevel 5 may start shedding some light on the possible 
reason for the slowdown.

The plugin is written in python, which is a relatively slow language. 
See here for the code:

https://github.com/elrepo/packages/blob/master/yum-plugin-nvidia/nvidia.py

The plugin first retrieves the pci ID string for the installed device 
from /sys/bus/ directory structure (init_hook(conduit)) - this should be 
relatively quick and should show up under 'Config time'.

The plug then checks each package to see if it provides a blacklist 
string of blacklisted device IDs (exclude_hook(conduit)). Our nvidia 
packages provide these special blacklist provides.

For each package found to contain a blacklist provides, we loop through 
each individual blacklisted pci ID provide to see if it matches our 
installed hardware pci ID, and if it does we exclude the package from 
the package sack for consideration. I'm guessing this whole process is 
relatively slow, and becomes slower the more provides get added (as more 
devices become legacy) and the more packages there are in the repository.

The last point comes into play here. Recently (el7.6) we started 
releasing an additional nvidia-x11-drv-libs package which took the 
overall package count from 3 to 4 for each driver release (33.3% 
increase). Currently, debuglevel 5 shows us 20 packages are considered 
over 5 driver releases:

Searching 20 packages
searching package kmod-nvidia-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.x86_64
searching in provides entries

I could probably prune that back to the last 2 driver releases which 
should save considerable time if this is the cause.

I'll leave it alone for now so you can have a play at different 
debuglevels and share any observations, and then we can try trimming the 
number of packages to see the results.

Other than that, we would need to look at optimising the python code. 
I'm not much of a python expert, but I don't see much that can be 
optimised. The only thing that springs to mind is that we are searching 
all packages for 'blacklist' provides, when we could maybe limit that to 
only searching the elrepo repository. I've no idea if this is relevant - 
do you have a huge number of additional packages available to yum 
through any large repositories? In theory it should be easy enough to 
test with some debug code to show us what's going on at each stage so we 
can see exactly what's adding to the time.

Phil

PS - the other option is to disable the plugin if it's causing 
significant issues for you. It's only really useful when nvidia 
deprecates older hardware to a legacy release, at which point the plugin 
could be enabled again if required (we always get decent warning when 
nvidia are going to deprecate older hardware).



More information about the elrepo mailing list