[elrepo] nvidia plugin slow

Steve Cleveland Steve.Cleveland at oregonstate.edu
Fri May 31 19:18:40 EDT 2019


On 5/31/19 1:18 PM, Phil Perry wrote:
> On 31/05/2019 00:18, Steve Cleveland wrote:
>> I've been seeing lately that yum has been running very slow on some 
>> machines.  I did some testing disabling repos and then plugins.  The 
>> combination of elrepo being enabled, nvidia plugin being enabled and 
>> an nvidia GPU being present seems to be causing the problem.  That 
>> makes sense as the plugin needs the repo and the hardware to function.
>>
>> Example:
>>
>> # time yum search kmod-nvidia --disableplugin=nvidia
>> Loaded plugins: fastestmirror
>> Loading mirror speeds from cached hostfile
>> =========================== N/S matched: kmod-nvidia 
>> ===========================
>> kmod-nvidia.x86_64 : nvidia kernel module(s)
>> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
>> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>>
>>    Name and summary matches only, use "search all" for everything.
>>
>> real    0m0.569s
>> user    0m0.396s
>> sys    0m0.172s
>>
>> # time yum search kmod-nvidia
>> Loaded plugins: fastestmirror, nvidia
>> Loading mirror speeds from cached hostfile
>> =========================== N/S matched: kmod-nvidia 
>> ===========================
>> kmod-nvidia.x86_64 : nvidia kernel module(s)
>> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
>> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>>
>>    Name and summary matches only, use "search all" for everything.
>>
>> real    0m41.644s
>> user    0m40.973s
>> sys    0m0.564s
>>
>>  From half a second to 40 seconds.  Is there a way to for me to debug 
>> what's happening?  I assume this is not expected behavior? I haven't 
>> noticed it in the past.
>>
>> Thanks,
>>
>>   - Steve
>>
>
> Hi Steve,
>
> For reference, here's what I see on my system with nvidia hardware:

Thank you for the thorough reply!  Responses are inline.

>
> # time yum search kmod-nvidia --disableplugin=nvidia
> Loaded plugins: fastestmirror, langpacks, product-id, 
> search-disabled-repos, subscription-manager
> Loading mirror speeds from cached hostfile
>  * elrepo: mirrors.coreix.net
>  * epel: fedora.cu.be
>  * nux-dextop: mirror.li.nux.ro
> ============================== N/S matched: kmod-nvidia 
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
>   Name and summary matches only, use "search all" for everything.
>
> real    0m2.651s
> user    0m0.958s
> sys    0m0.176s
>
>
> and with nvidia plugin enabled it is a little slower, but nowhere near 
> as slow as for you:
>
> # time yum search kmod-nvidia
> Loaded plugins: fastestmirror, langpacks, nvidia, product-id, 
> search-disabled-repos, subscription-manager
> Loading mirror speeds from cached hostfile
>  * elrepo: mirrors.coreix.net
>  * epel: fedora.cu.be
>  * nux-dextop: mirror.li.nux.ro
> ============================== N/S matched: kmod-nvidia 
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
>   Name and summary matches only, use "search all" for everything.
>
> real    0m7.791s
> user    0m5.785s
> sys    0m0.485s
>
>
> I assume you have the latest version installed:
>
> # rpm -q yum-plugin-nvidia
> yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch

Yes, I am on the current version:

# rpm -q yum-plugin-nvidia
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch

>
> Please can you try running yum at debuglevel 3. Most of the additional 
> time for me seems to come during pkgsack time:
>
> # time yum -d 3 search kmod-nvidia --disableplugin=nvidia
> Not loading "rhnplugin" plugin, as it is disabled
> Loaded plugins: fastestmirror, langpacks, product-id, 
> search-disabled-repos, subscription-manager
> Adding en_GB.UTF-8 to language list
> Updating Subscription Management repositories.
> Config time: 2.079
> Adding en_GB to language list
> Yum version: 3.4.3
> Setting up Package Sacks
> Loading mirror speeds from cached hostfile
>  * elrepo: mirrors.coreix.net
>  * epel: fedora.cu.be
>  * nux-dextop: mirror.li.nux.ro
> pkgsack time: 0.025
> rpmdb time: 0.000
> tags time: 0.000
> ============================== N/S matched: kmod-nvidia 
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
>   Name and summary matches only, use "search all" for everything.
>
> real    0m2.690s
> user    0m0.962s
> sys    0m0.183s
>
>
> # time yum -d 3 search kmod-nvidia
> Not loading "rhnplugin" plugin, as it is disabled
> Loaded plugins: fastestmirror, langpacks, nvidia, product-id, 
> search-disabled-repos, subscription-manager
> Adding en_GB.UTF-8 to language list
> Updating Subscription Management repositories.
> Config time: 2.083
> Adding en_GB to language list
> [nvidia]: device found: 
> pci:v000010DEd00001287sv00001043sd00008501bc03sc00i00
> Yum version: 3.4.3
> Setting up Package Sacks
> Loading mirror speeds from cached hostfile
>  * elrepo: mirrors.coreix.net
>  * epel: fedora.cu.be
>  * nux-dextop: mirror.li.nux.ro
> rpmdb time: 0.000
> pkgsack time: 5.137
> tags time: 0.000
> ============================== N/S matched: kmod-nvidia 
> ==============================
> kmod-nvidia.x86_64 : nvidia kernel module(s)
> kmod-nvidia-340xx.x86_64 : nvidia-340xx kernel module(s)
> kmod-nvidia-390xx.x86_64 : nvidia-390xx kernel module(s)
>
>   Name and summary matches only, use "search all" for everything.
>
> real    0m7.803s
> user    0m5.772s
> sys    0m0.491s
>
>
> Running in debuglevel 5 may start shedding some light on the possible 
> reason for the slowdown.
>
> The plugin is written in python, which is a relatively slow language. 
> See here for the code:
>
> https://github.com/elrepo/packages/blob/master/yum-plugin-nvidia/nvidia.py 
>
>
> The plugin first retrieves the pci ID string for the installed device 
> from /sys/bus/ directory structure (init_hook(conduit)) - this should 
> be relatively quick and should show up under 'Config time'.
>
> The plug then checks each package to see if it provides a blacklist 
> string of blacklisted device IDs (exclude_hook(conduit)). Our nvidia 
> packages provide these special blacklist provides.
>
> For each package found to contain a blacklist provides, we loop 
> through each individual blacklisted pci ID provide to see if it 
> matches our installed hardware pci ID, and if it does we exclude the 
> package from the package sack for consideration. I'm guessing this 
> whole process is relatively slow, and becomes slower the more provides 
> get added (as more devices become legacy) and the more packages there 
> are in the repository.
>
> The last point comes into play here. Recently (el7.6) we started 
> releasing an additional nvidia-x11-drv-libs package which took the 
> overall package count from 3 to 4 for each driver release (33.3% 
> increase). Currently, debuglevel 5 shows us 20 packages are considered 
> over 5 driver releases:
>
> Searching 20 packages
> searching package kmod-nvidia-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package kmod-nvidia-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-410.93-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.43-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.56-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-418.74-1.el7_6.elrepo.x86_64
> searching in provides entries
> searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.i686
> searching in provides entries
> searching package nvidia-x11-drv-libs-430.14-1.el7_6.elrepo.x86_64
> searching in provides entries


debug 5 was definitely the most useful.  The 'searching in provides 
entries' is taking almost two seconds for each entry.

pkgsack time is between 40 and 60 seconds.

rpmdb time: 0.000
pkgsack time: 42.327
tags time: 0.000

That is using the official elrepo repository.  I have a local mirror 
that includes some additional older versions. So the problem is much 
more  severe in our production systems.  But I have cleaned up a bunch 
of older versions.  And now that I know what's going on, I can find ways 
to work around it.

I will take a look at your code repository to see I can come up with any 
suggestions, but I am certainly not a python coder myself.  Looping 
through an array shouldn't take that long though

>
> I could probably prune that back to the last 2 driver releases which 
> should save considerable time if this is the cause.
>
> I'll leave it alone for now so you can have a play at different 
> debuglevels and share any observations, and then we can try trimming 
> the number of packages to see the results.
>
> Other than that, we would need to look at optimising the python code. 
> I'm not much of a python expert, but I don't see much that can be 
> optimised. The only thing that springs to mind is that we are 
> searching all packages for 'blacklist' provides, when we could maybe 
> limit that to only searching the elrepo repository. I've no idea if 
> this is relevant - do you have a huge number of additional packages 
> available to yum through any large repositories? In theory it should 
> be easy enough to test with some debug code to show us what's going on 
> at each stage so we can see exactly what's adding to the time.


We don't use any repos out of the ordinary.  EPEL, elrepo and a few 
other small ones.

If I'm reading things right, the search for blacklist provides (where 
you seach all packages) only happens the one time?  Then the 'loop' 
happens on the packages with the 'blacklist' provides, correct?


>
> Phil
>
> PS - the other option is to disable the plugin if it's causing 
> significant issues for you. It's only really useful when nvidia 
> deprecates older hardware to a legacy release, at which point the 
> plugin could be enabled again if required (we always get decent 
> warning when nvidia are going to deprecate older hardware).

I agree, I can work around it. I'd have to disable it by default.  We 
use puppet for maintaining systems and it uses yum commands VERY liberally.

Thanks again,

  - Steve



More information about the elrepo mailing list