Checking if installed packages are up to date is slow and uses lots of CPU
I've been trying to migrate a project to Pipenv, but I'm somewhat blocked by how much longer Pipenv takes to check whether the installed dependencies are up to date, compared to pointing pip install at a requirements file.
In our setup we run tests inside Docker containers. The image we run the tests on comes pre-installed with the dependencies our project had at the time the image was built. Before running any tests we make sure the dependencies are up to date, in case new code or tests need new dependencies. For this we have just been using pip install -r requirements.txt, which normally completes in around 30 seconds when there are no new dependencies to install.
I then tried to switch this to Pipenv and pre-installed the dependencies in the image using a Pipfile and Pipfile.lock, running pipenv install --deploy --dev --system against the files. That works fine and I got an image created, but the problem comes when we want to run tests and first check whether the dependencies are up to date. I've done this using the same pipenv install --deploy --dev --system command, and instead of 30 seconds it now takes 5 minutes and 30 seconds! On top of that, the CPU usage is much, much higher.
I've made a small test with the Pipfile and Pipfile.lock we are using (only slightly modified): https://github.com/Tenzer/pipenv-test. A simple test is to first install the dependencies, then check that they are up to date in the local environment, and see how much time and CPU the second operation takes:
$ docker run -it --rm -v $(pwd):/test python bash
root@9f6ecaf12cf8:/# cd /test
root@9f6ecaf12cf8:/test# pip install pipenv
[...]
root@9f6ecaf12cf8:/test# pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:02:25
root@9f6ecaf12cf8:/test# time pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:01:04
real 1m7.166s
user 1m49.520s
sys 0m15.000s
Note that this was run on my laptop rather than our CI system, and with a slightly simpler Pipfile, hence it's faster than what I described above. It can, however, be compared to checking whether all packages are installed with pip:
root@9f6ecaf12cf8:/test# pip freeze > requirements.txt
root@9f6ecaf12cf8:/test# pip install -r requirements.txt
[...]
real 0m1.836s
user 0m1.610s
sys 0m0.130s
So according to this non-scientific test, Pipenv takes about 36 times as long as pip and, counting user plus sys time, uses roughly 72 times as much CPU.
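The ratios can be recomputed from the time output above with a short script. This is only a sketch: it assumes "CPU usage" means user plus sys time, and the helper names are illustrative rather than part of any tool mentioned here.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '1m7.166s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def summarise(real: str, user: str, sys_time: str) -> dict:
    """Return wall-clock seconds and total CPU seconds (assumed: user + sys)."""
    return {
        "wall": parse_time(real),
        "cpu": parse_time(user) + parse_time(sys_time),
    }


# Figures copied from the two timed runs in the post.
pipenv_run = summarise("1m7.166s", "1m49.520s", "0m15.000s")
pip_run = summarise("0m1.836s", "0m1.610s", "0m0.130s")

wall_ratio = pipenv_run["wall"] / pip_run["wall"]
cpu_ratio = pipenv_run["cpu"] / pip_run["cpu"]

print(f"wall-clock ratio: {wall_ratio:.1f}x")
print(f"CPU time ratio:   {cpu_ratio:.1f}x")
```

A different definition of CPU usage (for example user time only) would give a different second ratio.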
I know that thereβs a big difference between whatβs going on under the hood, but my point here is that the vastly longer time and resource usage may be a deal breaker for some with lots of dependencies.
While digging into this, I noticed that Pipenv spawns one pip process for each package, and I wonder how much of a slowdown that is compared to pip doing everything inside one process. Would it make sense to split the list of dependencies into 16 batches (or whatever PIPENV_MAX_SUBPROCESS is set to), in order to avoid having to spawn 212 pip processes, as is the case here?
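To make the batching idea concrete, here is a minimal sketch of splitting a list of pinned requirements into a fixed number of groups and building one pip command per group. It is not how Pipenv is implemented: split_into_batches, build_pip_command and the placeholder package names are all hypothetical, and the commands are only printed, not executed.

```python
import sys
from typing import List, Sequence


def split_into_batches(requirements: Sequence[str], max_batches: int) -> List[List[str]]:
    """Distribute requirements round-robin over at most max_batches groups."""
    if max_batches < 1:
        raise ValueError("max_batches must be at least 1")
    count = min(max_batches, len(requirements))
    batches: List[List[str]] = [[] for _ in range(count)]
    for index, requirement in enumerate(requirements):
        batches[index % count].append(requirement)
    return batches


def build_pip_command(batch: Sequence[str]) -> List[str]:
    """Build a single pip invocation that installs a whole batch.

    --no-deps is used on the assumption that the lock file already
    lists every transitive dependency.
    """
    return [sys.executable, "-m", "pip", "install", "--no-deps", *batch]


# Placeholder pins standing in for the 212 locked packages.
pins = [f"package{i}==1.0" for i in range(212)]
batches = split_into_batches(pins, max_batches=16)

print(f"{len(batches)} pip processes instead of {len(pins)}")
print("first batch size:", len(batches[0]))
print("example command prefix:", " ".join(build_pip_command(batches[0])[:6]))
```

Each resulting command could then be handed to a worker, so the number of spawned pip processes is bounded by the batch count rather than the package count.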
It might also be that this is all down to pip, and that the fix is to make pip faster for the operations that Pipenv runs. I just thought I would start here and see whether there are possible optimisations on the Pipenv side of things.
Issue Analytics
- Created 5 years ago
- Reactions: 14
- Comments: 18 (12 by maintainers)
Top GitHub Comments
Would adding a flag or environment variable to change the behaviour be acceptable? Meaning that the current behaviour is kept as the default, and then people who want the speed boost instead of the progress bar can switch to using a batched behaviour instead.
It could be thought of as a feature flag and perhaps help assess how big a difference it makes to the package installation speed.
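As a rough illustration of such an opt-in switch, the sketch below reads an environment variable and picks an install strategy, keeping the current behaviour as the default. The variable name PIPENV_BATCH_INSTALL and both function names are made up for this example; they are not existing Pipenv settings.

```python
import os
from typing import Mapping

# Hypothetical variable name, used only for illustration.
FLAG_NAME = "PIPENV_BATCH_INSTALL"


def batch_install_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when the opt-in flag is set to a truthy value."""
    value = environ.get(FLAG_NAME, "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def choose_strategy(environ: Mapping[str, str] = os.environ) -> str:
    """Default to per-package installs; batch only when explicitly requested."""
    return "batched" if batch_install_enabled(environ) else "per-package"


print(choose_strategy({}))                 # flag unset: default behaviour
print(choose_strategy({FLAG_NAME: "1"}))   # flag set: batched behaviour
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.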
Funny story: I recently came to a similar conclusion as @Tenzer without realizing it. I worked on a performance optimization for batch_install which cut our install time in the Python package manager shootout benchmark in half. https://github.com/pypa/pipenv/pull/5301 Ref: https://lincolnloop.github.io/python-package-manager-shootout/
It did require dropping the progress bar, but the actual progress can be observed with the --verbose flag, and cutting our install time roughly in half appears to be well worth it.
Just a reminder: feel free to open new issues and discussions as they pertain to pipenv. I had locked this conversation at the time because it involves many people who are no longer active on the project, and these legacy conversations are sometimes hard to parse given how much has changed in the last couple of years. On top of that, people come and comment on an issue from 2018, which makes their comments easy to miss and leave unanswered while we still have over 330 open issues to sort through.