[elrepo] nvidia plugin slow
Phil Perry
phil at elrepo.org
Fri May 31 16:18:00 EDT 2019
On 31/05/2019 00:18, Steve Cleveland wrote:
> I've been seeing lately that yum has been running very slow on some
> machines. I did some testing disabling repos and then plugins. The
> combination of elrepo being enabled, nvidia plugin being enabled and an
> nvidia GPU being present seems to be causing the problem. That makes
> sense as the plugin needs the repo and the hardware to function.
>
> Example:
>
> # time yum search kmod-nvidia --disableplugin=nvidia
> Loaded plugins: fastestmirror
> Loading mirror speeds from cached hostfile
> =========================== N/S matched: kmod-nvidia ===========================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m0.569s
> user 0m0.396s
> sys 0m0.172s
>
> # time yum search kmod-nvidia
> Loaded plugins: fastestmirror, nvidia
> Loading mirror speeds from cached hostfile
> =========================== N/S matched: kmod-nvidia ===========================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m41.644s
> user 0m40.973s
> sys 0m0.564s
>
> From half a second to 40 seconds. Is there a way for me to debug
> what's happening? I assume this is not expected behavior? I haven't
> noticed it in the past.
>
> Thanks,
>
> - Steve
>
Hi Steve,
For reference, here's what I see on my system with nvidia hardware:
# time yum search kmod-nvidia --disableplugin=nvidia
Loaded plugins: fastestmirror, langpacks, product-id, search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
* elrepo: mirrors.coreix.net
* epel: fedora.cu.be
* nux-dextop: mirror.li.nux.ro
============================== N/S matched: kmod-nvidia ==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
Name and summary matches only, use "search all" for everything.
real 0m2.651s
user 0m0.958s
sys 0m0.176s
With the nvidia plugin enabled it is a little slower, but nowhere near
as slow as on your system:
# time yum search kmod-nvidia
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
* elrepo: mirrors.coreix.net
* epel: fedora.cu.be
* nux-dextop: mirror.li.nux.ro
============================== N/S matched: kmod-nvidia ==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
Name and summary matches only, use "search all" for everything.
real 0m7.791s
user 0m5.785s
sys 0m0.485s
I assume you have the latest version of the plugin installed:
# rpm -q yum-plugin-nvidia
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch
Could you try running yum at debug level 3 (-d 3)? For me, most of the
additional time is spent setting up the package sacks ('pkgsack time'):
# time yum -d 3 search kmod-nvidia --disableplugin=nvidia
Not loading "rhnplugin" plugin, as it is disabled
Loaded plugins: fastestmirror, langpacks, product-id, search-disabled-repos, subscription-manager
Adding en_GB.UTF-8 to language list
Updating Subscription Management repositories.
Config time: 2.079
Adding en_GB to language list
Yum version: 3.4.3
Setting up Package Sacks
Loading mirror speeds from cached hostfile
* elrepo: mirrors.coreix.net
* epel: fedora.cu.be
* nux-dextop: mirror.li.nux.ro
pkgsack time: 0.025
rpmdb time: 0.000
tags time: 0.000
============================== N/S matched: kmod-nvidia ==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
Name and summary matches only, use "search all" for everything.
real 0m2.690s
user 0m0.962s
sys 0m0.183s
# time yum -d 3 search kmod-nvidia
Not loading "rhnplugin" plugin, as it is disabled
Loaded plugins: fastestmirror, langpacks, nvidia, product-id, search-disabled-repos, subscription-manager
Adding en_GB.UTF-8 to language list
Updating Subscription Management repositories.
Config time: 2.083
Adding en_GB to language list
[nvidia]: device found: pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00
Yum version: 3.4.3
Setting up Package Sacks
Loading mirror speeds from cached hostfile
* elrepo: mirrors.coreix.net
* epel: fedora.cu.be
* nux-dextop: mirror.li.nux.ro
rpmdb time: 0.000
pkgsack time: 5.137
tags time: 0.000
============================== N/S matched: kmod-nvidia ==============================
kmod-nvidia.x86_64 : nvidia kernel module(s)
kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
Name and summary matches only, use "search all" for everything.
real 0m7.803s
user 0m5.772s
sys 0m0.491s
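To compare two runs without eyeballing them, the per-phase timings that yum prints at -d 3 ('Config time', 'pkgsack time' and so on) can be pulled out of the saved output. This is a minimal standalone Python 3 sketch, assuming the output of each run has been captured to a string; parse_timings and timing_diff are hypothetical helper names, not part of yum.

```python
import re

# Matches the per-phase timing lines printed by "yum -d 3",
# e.g. "Config time: 2.079" or "pkgsack time: 5.137".
TIMING_RE = re.compile(r"^\s*(\w+) time: ([0-9.]+)\s*$")


def parse_timings(output):
    """Return {phase: seconds} for every 'PHASE time: N' line in the output."""
    timings = {}
    for line in output.splitlines():
        match = TIMING_RE.match(line)
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


def timing_diff(without_plugin, with_plugin):
    """Return {phase: extra seconds} spent when the plugin is enabled."""
    base = parse_timings(without_plugin)
    slow = parse_timings(with_plugin)
    return {phase: round(slow[phase] - base.get(phase, 0.0), 3) for phase in slow}


if __name__ == "__main__":
    # Timing lines copied from the two -d 3 runs above.
    without_plugin = "Config time: 2.079\npkgsack time: 0.025\nrpmdb time: 0.000\ntags time: 0.000"
    with_plugin = "Config time: 2.083\nrpmdb time: 0.000\npkgsack time: 5.137\ntags time: 0.000"
    diff = timing_diff(without_plugin, with_plugin)
    print(max(diff, key=diff.get), diff)
```

Saving each run with something like `yum -d 3 search kmod-nvidia > run.txt 2>&1` and feeding the files in would point straight at the phase that grew.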
Running at debug level 5 (-d 5) may shed some light on the reason for
the slowdown.
The plugin is written in Python, which is a relatively slow language.
See here for the code:
https://github.com/elrepo/packages/blob/master/yum-plugin-nvidia/nvidia.py
The plugin first retrieves the PCI ID string for the installed device
from the /sys/bus/ directory structure (init_hook(conduit)). This should be
relatively quick, and it should show up under 'Config time'.
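For illustration, the device string in the -d 3 output (pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00) has the layout of a Linux PCI modalias: vendor (v), device (d), subvendor (sv), subdevice (sd) and base class (bc, where 03 is a display controller). Below is a minimal standalone Python 3 sketch of finding such strings, assuming devices are listed under /sys/bus/pci/devices/*/modalias. It is not the plugin's code (nvidia.py shows how init_hook really does it), and the helper names are hypothetical.

```python
import os
import re

# Linux PCI modalias layout: pci:v<vendor>d<device>sv<subvendor>sd<subdevice>bc<class>sc<subclass>i<prog-if>
MODALIAS_RE = re.compile(
    r"^pci:v([0-9A-F]{8})d([0-9A-F]{8})sv[0-9A-F]{8}sd[0-9A-F]{8}bc([0-9A-F]{2})"
)
NVIDIA_VENDOR = "000010DE"  # PCI vendor ID 0x10de, zero-padded to 8 digits
DISPLAY_CLASS = "03"        # PCI base class 0x03: display controller


def parse_modalias(modalias):
    """Return (vendor, device, base_class) hex strings, or None if not a PCI modalias."""
    match = MODALIAS_RE.match(modalias.strip())
    return match.groups() if match else None


def find_nvidia_gpus(devices_dir="/sys/bus/pci/devices"):
    """Return the modalias strings of NVIDIA display devices found under devices_dir."""
    found = []
    if not os.path.isdir(devices_dir):
        return found
    for entry in sorted(os.listdir(devices_dir)):
        try:
            with open(os.path.join(devices_dir, entry, "modalias")) as f:
                modalias = f.read().strip()
        except OSError:
            continue  # skip entries without a readable modalias file
        parsed = parse_modalias(modalias)
        if parsed and parsed[0] == NVIDIA_VENDOR and parsed[2] == DISPLAY_CLASS:
            found.append(modalias)
    return found


if __name__ == "__main__":
    print(find_nvidia_gpus())
```

Reading a handful of small sysfs files like this is cheap, which is consistent with 'Config time' being almost identical with and without the plugin.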
The plugin then checks each package to see whether it provides a blacklist
string of blacklisted device IDs (exclude_hook(conduit)). Our nvidia
packages carry these special blacklist provides.
For each package found to carry a blacklist provide, we loop through
each individual blacklisted PCI ID to see whether it matches the PCI ID
of the installed hardware; if it does, we exclude the package from the
package sack so it is no longer considered. I'm guessing this whole
process is relatively slow, and that it gets slower as more provides are
added (as more devices become legacy) and as the number of packages in
the repository grows.
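The matching step described above boils down to a nested loop over packages and their provides. This standalone Python 3 sketch makes two loud assumptions: the Package class is a hypothetical stand-in for yum's package objects, and the provide format nvidia-blacklist(DEVICEID) is invented for illustration (the real provides are defined in the elrepo packages and read by nvidia.py). The sample sack is made up. It shows why the work grows with both the number of packages and the number of provides per package.

```python
import re
from dataclasses import dataclass, field

# HYPOTHETICAL provide format for this sketch only: "nvidia-blacklist(<4-digit hex device ID>)".
BLACKLIST_RE = re.compile(r"^nvidia-blacklist\(([0-9A-Fa-f]{4})\)$")


@dataclass
class Package:
    """Hypothetical stand-in for a yum package object."""
    name: str
    provides: list = field(default_factory=list)


def device_id(modalias):
    """Return the 4-digit device ID from a PCI modalias, e.g. '1287'."""
    # In "pci:v<8 hex>d<8 hex>...", the device field is at (0-based) indices 14-21.
    return modalias[14:22][-4:].upper()


def packages_to_exclude(packages, installed_id):
    """Return packages with a blacklist provide matching the installed device ID."""
    excluded = []
    for pkg in packages:                 # outer loop: every package in the sack
        for prov in pkg.provides:        # inner loop: every provide of that package
            match = BLACKLIST_RE.match(prov)
            if match and match.group(1).upper() == installed_id:
                excluded.append(pkg)     # the installed device is blacklisted here
                break                    # one match is enough to exclude it
    return excluded


if __name__ == "__main__":
    gpu = device_id("pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00")
    sack = [
        Package("kmod-example-current", ["kernel-module", "nvidia-blacklist(1287)"]),
        Package("kmod-example-legacy", ["kernel-module"]),
    ]
    print([p.name for p in packages_to_exclude(sack, gpu)])
```

Every provide of every package goes through a Python-level regex match, so adding packages or blacklist entries adds work roughly in proportion.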
The number of packages in the repository is what comes into play here.
Recently (el7.6) we started releasing an additional nvidia-x11-drv-libs
package, which took the overall package count from 3 to 4 for each
driver release (a 33.3% increase). Currently, debug level 5 shows that
20 packages are considered across 5 driver releases:
Searching 20 packages
searching package kmod-nvidia-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package kmod-nvidia-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.i686
searching in provides entries
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
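To check how that count breaks down per driver release, the "searching package" lines from the debug level 5 output can be grouped by version. A small standalone Python 3 sketch, assuming each line has the form "searching package NAME-VERSION-RELEASE.ARCH" as in the listing above; packages_per_release is a hypothetical helper name.

```python
import re
from collections import defaultdict

# "searching package kmod-nvidia-410.93-1.el7_6.elrepo.x86_64" -> name, version, release, arch
LINE_RE = re.compile(r"^searching package (.+)-([0-9.]+)-([^-]+)\.(\w+)$")


def packages_per_release(debug_output):
    """Group the packages searched by the plugin under their driver version."""
    groups = defaultdict(list)
    for line in debug_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, version, _release, arch = match.groups()
            groups[version].append(f"{name}.{arch}")
    return dict(groups)


if __name__ == "__main__":
    sample = """searching package kmod-nvidia-430.14-1.el7_6.elrepo.x86_64
searching in provides entries
searching package nvidia-x11-drv-430.14-1.el7_6.elrepo.x86_64
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.i686
searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.x86_64"""
    for version, pkgs in packages_per_release(sample).items():
        print(version, len(pkgs), pkgs)
```

Run over the full listing, this gives 5 versions with 4 packages each (kmod-nvidia, nvidia-x11-drv, and nvidia-x11-drv-libs for i686 and x86_64).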
I could probably prune the repository back to the last 2 driver releases
(8 packages instead of 20), which should save considerable time if this
is the cause.
I'll leave it alone for now so you can have a play at different
debug levels and share any observations; then we can try trimming the
number of packages and compare the results.
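Working through the package counts (4 packages per driver release, as in the debug level 5 listing), and assuming purely as a rough, unmeasured estimate that the plugin's extra time scales linearly with the number of packages it scans:

```latex
\begin{aligned}
\text{increase per release} &= \frac{4 - 3}{3} \approx 33.3\% \\
\text{packages now} &= 5 \text{ releases} \times 4 = 20 \\
\text{packages after pruning} &= 2 \text{ releases} \times 4 = 8 \\
\text{reduction} &= \frac{20 - 8}{20} = 60\%
\end{aligned}
```

Under that assumption, pruning to 2 releases would cut the scanning work on elrepo's nvidia packages by roughly 60%.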
Other than that, we would need to look at optimising the Python code.
I'm not much of a Python expert, but I don't see much that can be
optimised. The only thing that springs to mind is that we search all
packages for 'blacklist' provides, when we could perhaps limit the
search to the elrepo repository. I've no idea whether this is relevant:
do you have a huge number of additional packages available to yum
through any large repositories? In theory it should be easy enough to
add some debug code showing what's going on at each stage, so we can
see exactly what's adding to the time.
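Both ideas (restricting the search to the elrepo repository, and doing less Python-level work per package) can be sketched in standalone Python 3. The Package class, its repo_id field and the nvidia-blacklist(DEVICEID) provide format are all hypothetical, used only for illustration; a real change would go through yum's plugin conduit in nvidia.py, which is not shown here.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL provide format for this sketch only: "nvidia-blacklist(<4-digit hex device ID>)",
# assumed to spell hex digits in uppercase.
BLACKLIST_FMT = "nvidia-blacklist({})"


@dataclass
class Package:
    """Hypothetical stand-in for a yum package object."""
    name: str
    repo_id: str
    provides: list = field(default_factory=list)


def packages_to_exclude(packages, installed_id, repo_id="elrepo"):
    """Return packages from repo_id that blacklist the installed device ID."""
    # Build the one provide string we care about once, instead of parsing every provide.
    wanted = BLACKLIST_FMT.format(installed_id.upper())
    excluded = []
    for pkg in packages:
        if pkg.repo_id != repo_id:
            continue  # assumption: only elrepo packages carry blacklist provides
        # A single membership test replaces the per-provide Python loop
        # (still a linear scan, but done inside the interpreter's C code).
        if wanted in pkg.provides:
            excluded.append(pkg)
    return excluded


if __name__ == "__main__":
    sack = [
        Package("kmod-example-current", "elrepo", ["nvidia-blacklist(1287)"]),
        Package("kmod-example-legacy", "elrepo", []),
        Package("unrelated-package", "epel", ["something-else"]),
    ]
    print([p.name for p in packages_to_exclude(sack, "1287")])
```

With a large third-party repository enabled, most packages in the sack would be skipped by the repo_id check before any of their provides are examined, which is where the bigger saving would come from if that is what is slowing things down.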
Phil
PS: the other option is to disable the plugin (for example with
--disableplugin=nvidia, as in your test) if it's causing significant
issues for you. It's only really useful when nvidia deprecates older
hardware to a legacy release, at which point the plugin could be enabled
again if required (we always get decent warning before nvidia
deprecates older hardware).