[elrepo] nvidia plugin slow
Steve Cleveland
Steve.Cleveland at oregonstate.edu
Fri May 31 19:18:40 EDT 2019
On 5/31/19 1:18 PM, Phil Perry wrote:
> On 31/05/2019 00:18, Steve Cleveland wrote:
>> I've been seeing lately that yum has been running very slow on some
>> machines. I did some testing disabling repos and then plugins. The
>> combination of elrepo being enabled, nvidia plugin being enabled and
>> an nvidia GPU being present seems to be causing the problem. That
>> makes sense as the plugin needs the repo and the hardware to function.
>>
>> Example:
>>
>> # time yum search kmod-nvidia --disableplugin=nvidia
>> Loaded plugins: fastestmirror
>> Loading mirror speeds from cached hostfile
>> =========================== N/S matched: kmod-nvidia
>> ===========================
>> kmod-nvidia.x86_64 : nvidia kernel module(s)
>> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
>> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>>
>> Name and summary matches only, use "search all" for everything.
>>
>> real 0m0.569s
>> user 0m0.396s
>> sys 0m0.172s
>>
>> # time yum search kmod-nvidia
>> Loaded plugins: fastestmirror, nvidia
>> Loading mirror speeds from cached hostfile
>> =========================== N/S matched: kmod-nvidia
>> ===========================
>> kmod-nvidia.x86_64 : nvidia kernel module(s)
>> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
>> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>>
>> Name and summary matches only, use "search all" for everything.
>>
>> real 0m41.644s
>> user 0m40.973s
>> sys 0m0.564s
>>
>> From half a second to 40 seconds. Is there a way to for me to debug
>> what's happening? I assume this is not expected behavior? I haven't
>> noticed it in the past.
>>
>> Thanks,
>>
>> - Steve
>>
>
> Hi Steve,
>
> For reference, here's what I see on my system with nvidia hardware:
Thank you for the thorough reply! Responses are inline.
>
> # time yum search kmod-nvidia --disableplugin=nvidia
> Loaded plugins: fastestmirror, langpacks, product-id,
> search-disabled-repos, subscription-manager
> Loading mirror speeds from cached hostfile
> * elrepo: mirrors.coreix.net
> * epel: fedora.cu.be
> * nux-dextop: mirror.li.nux.ro
> ============================== N/S matched: kmod-nvidia
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m2.651s
> user 0m0.958s
> sys 0m0.176s
>
>
> and with nvidia plugin enabled it is a little slower, but nowhere near
> as slow as for you:
>
> # time yum search kmod-nvidia
> Loaded plugins: fastestmirror, langpacks, nvidia, product-id,
> search-disabled-repos, subscription-manager
> Loading mirror speeds from cached hostfile
> * elrepo: mirrors.coreix.net
> * epel: fedora.cu.be
> * nux-dextop: mirror.li.nux.ro
> ============================== N/S matched: kmod-nvidia
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m7.791s
> user 0m5.785s
> sys 0m0.485s
>
>
> I assume you have the latest version installed:
>
> # rpm -q yum-plugin-nvidia
> yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch
Yes, I am on the current version:
# rpm -q yum-plugin-nvidia
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch
>
> Please can you try running yum at debuglevel 3. Most of the additional
> time for me seems to come during pkgsack time:
>
> # time yum -d 3 search kmod-nvidia --disableplugin=nvidia
> Not loading "rhnplugin" plugin, as it is disabled
> Loaded plugins: fastestmirror, langpacks, product-id,
> search-disabled-repos, subscription-manager
> Adding en_GB.UTF-8 to language list
> Updating Subscription Management repositories.
> Config time: 2.079
> Adding en_GB to language list
> Yum version: 3.4.3
> Setting up Package Sacks
> Loading mirror speeds from cached hostfile
> * elrepo: mirrors.coreix.net
> * epel: fedora.cu.be
> * nux-dextop: mirror.li.nux.ro
> pkgsack time: 0.025
> rpmdb time: 0.000
> tags time: 0.000
> ============================== N/S matched: kmod-nvidia
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m2.690s
> user 0m0.962s
> sys 0m0.183s
>
>
> # time yum -d 3 search kmod-nvidia
> Not loading "rhnplugin" plugin, as it is disabled
> Loaded plugins: fastestmirror, langpacks, nvidia, product-id,
> search-disabled-repos, subscription-manager
> Adding en_GB.UTF-8 to language list
> Updating Subscription Management repositories.
> Config time: 2.083
> Adding en_GB to language list
> [nvidia]: device found:
> pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00
> Yum version: 3.4.3
> Setting up Package Sacks
> Loading mirror speeds from cached hostfile
> * elrepo: mirrors.coreix.net
> * epel: fedora.cu.be
> * nux-dextop: mirror.li.nux.ro
> rpmdb time: 0.000
> pkgsack time: 5.137
> tags time: 0.000
> ============================== N/S matched: kmod-nvidia
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
> Name and summary matches only, use "search all" for everything.
>
> real 0m7.803s
> user 0m5.772s
> sys 0m0.491s
>
>
> Running in debuglevel 5 may start shedding some light on the possible
> reason for the slowdown.
>
> The plugin is written in python, which is a relatively slow language.
> See here for the code:
>
> https://github.com/elrepo/packages/blob/master/yum-plugin-nvidia/nvidia.py
>
>
> The plugin first retrieves the pci ID string for the installed device
> from /sys/bus/ directory structure (init_hook(conduit)) - this should
> be relatively quick and should show up under 'Config time'.
>
> The plug then checks each package to see if it provides a blacklist
> string of blacklisted device IDs (exclude_hook(conduit)). Our nvidia
> packages provide these special blacklist provides.
>
> For each package found to contain a blacklist provides, we loop
> through each individual blacklisted pci ID provide to see if it
> matches our installed hardware pci ID, and if it does we exclude the
> package from the package sack for consideration. I'm guessing this
> whole process is relatively slow, and becomes slower the more provides
> get added (as more devices become legacy) and the more packages there
> are in the repository.
>
> The last point comes into play here. Recently (el7.6) we started
> releasing an additional nvidia-x11-drv-libs package which took the
> overall package count from 3 to 4 for each driver release (33.3%
> increase). Currently, debuglevel 5 shows us 20 packages are considered
> over 5 driver releases:
>
> Searching 20 packages
> searching package kmod-nvidia-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries
debug 5 was definitely the most useful. The 'searching in provides
entries' is taking almost two seconds for each entry.
pkgsack time is between 40 and 60 seconds.
rpmdb time: 0.000
pkgsack time: 42.327
tags time: 0.000
That is using the official elrepo repository. I have a local mirror
that includes some additional older versions. So the problem is much
more severe in our production systems. But I have cleaned up a bunch
of older versions. And now that I know what's going on, I can find ways
to work around it.
I will take a look at your code repository to see I can come up with any
suggestions, but I am certainly not a python coder myself. Looping
through an array shouldn't take that long though
>
> I could probably prune that back to the last 2 driver releases which
> should save considerable time if this is the cause.
>
> I'll leave it alone for now so you can have a play at different
> debuglevels and share any observations, and then we can try trimming
> the number of packages to see the results.
>
> Other than that, we would need to look at optimising the python code.
> I'm not much of a python expert, but I don't see much that can be
> optimised. The only thing that springs to mind is that we are
> searching all packages for 'blacklist' provides, when we could maybe
> limit that to only searching the elrepo repository. I've no idea if
> this is relevant - do you have a huge number of additional packages
> available to yum through any large repositories? In theory it should
> be easy enough to test with some debug code to show us what's going on
> at each stage so we can see exactly what's adding to the time.
We don't use any repos out of the ordinary. EPEL, elrepo and a few
other small ones.
If I'm reading things right, the search for blacklist provides (where
you seach all packages) only happens the one time? Then the 'loop'
happens on the packages with the 'blacklist' provides, correct?
>
> Phil
>
> PS - the other option is to disable the plugin if it's causing
> significant issues for you. It's only really useful when nvidia
> deprecates older hardware to a legacy release, at which point the
> plugin could be enabled again if required (we always get decent
> warning when nvidia are going to deprecate older hardware).
I agree, I can work around it. I'd have to disable it by default. We
use puppet for maintaining systems and it uses yum commands VERY liberally.
Thanks again,
- Steve
More information about the elrepo
mailing list