Checking if installed packages are up to date is slow and uses lots of CPU

See original GitHub issue

I’ve been trying to migrate a project to use Pipenv but I’m slightly blocked on how much longer it takes for Pipenv to check if the installed dependencies are up to date, compared to pointing pip install at a requirements file.

In our setup we run tests inside Docker containers. The image we run the tests on comes pre-installed with the dependencies our project has at the time the image is built. Before we run any tests we make sure the dependencies are up to date, in case any new dependencies are needed for new code or tests that have been added since. For this we have just been using pip install -r requirements.txt, which normally completes in around 30 seconds when there are no new dependencies to install.
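Concretely, the flow with pip boils down to running the same command in two places; a simplified sketch (image build and test invocation details left out):

# at image build time: pre-install whatever requirements.txt contains right now
pip install -r requirements.txt
# before each test run: sync against the current requirements.txt,
# which is close to a no-op (around 30 seconds) when nothing has changed
pip install -r requirements.txt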

I then tried to switch this to Pipenv: I pre-installed the dependencies in the image using a Pipfile and Pipfile.lock and ran pipenv install --deploy --dev --system against those files. That works fine and I got an image created, but the problem comes when we want to run tests and need to check that the dependencies are up to date first. I did this using the same pipenv install --deploy --dev --system command, and instead of 30 seconds it now takes 5 minutes and 30 seconds! On top of that, the CPU usage is much, much higher.
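The Pipenv equivalent looks roughly like this (again just a sketch):

# at image build time: install the locked dependencies system-wide
pip install pipenv
pipenv install --deploy --dev --system
# before each test run: re-run the same command to pick up any lock file changes;
# this is the step that now takes 5 minutes 30 seconds even with nothing new to install
pipenv install --deploy --dev --system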

I’ve made a small test with the Pipfile and Pipfile.lock we are using (only slightly modified): https://github.com/Tenzer/pipenv-test. A simple test is to first install the dependencies, then check that they are up to date in the local environment, and see how much time and CPU that second operation takes:

$ docker run -it --rm -v $(pwd):/test python bash
root@9f6ecaf12cf8:/# cd /test
root@9f6ecaf12cf8:/test# pip install pipenv
[...]
root@9f6ecaf12cf8:/test# pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:02:25
root@9f6ecaf12cf8:/test# time pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:01:04

real	1m7.166s
user	1m49.520s
sys	0m15.000s

Note that this was run on my laptop rather than our CI system, and with a slightly simpler Pipfile, hence it’s faster than what I described above. It can however be compared to checking if all packages are installed with pip:

root@9f6ecaf12cf8:/test# pip freeze > requirements.txt
root@9f6ecaf12cf8:/test# time pip install -r requirements.txt
[...]
real	0m1.836s
user	0m1.610s
sys	0m0.130s

So according to this non-scientific test, Pipenv is taking 36 times as long and using 94 times more CPU than pip.

I know that there’s a big difference between what’s going on under the hood, but my point here is that the vastly longer time and resource usage may be a deal breaker for some with lots of dependencies.

While digging into this, I noticed that Pipenv spawns one pip process for each package, and I wonder how much of a slowdown that is compared to pip doing everything inside one process. Would it potentially make sense to split the list of dependencies into 16 batches (or however many PIPENV_MAX_SUBPROCESS is set to), in order to avoid spawning 212 pip processes, as is the case here?
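To make the idea concrete, here is a rough shell sketch of the difference, using a plain requirements file for illustration (pipenv itself installs from Pipfile.lock, and requirement lines containing environment markers would need extra quoting):

# one pip invocation per package, roughly what happens today (212 processes):
xargs -n 1 pip install < requirements.txt
# one pip invocation per batch, here roughly 16 batches of up to 14 packages:
xargs -n 14 pip install < requirements.txt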

It might also be that this all comes down to pip, and to making pip faster for the operations that Pipenv runs. I just thought I would start here and see if there could be some possible optimisations on the Pipenv side of things.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 14
  • Comments: 18 (12 by maintainers)

Top GitHub Comments

3 reactions
Tenzer commented, Jul 12, 2018

Would adding a flag or environment variable to change the behaviour be acceptable? The current behaviour would be kept as the default, and people who want the speed boost instead of the progress bar could switch to the batched behaviour.

It could be thought of as a feature flag and perhaps help assess how big a difference it makes to the package installation speed.
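Usage could then look something like this (the variable name is purely hypothetical, just to illustrate the opt-in):

# default: current per-package behaviour with the progress bar
pipenv install --deploy --dev --system
# opt in to batched installs for speed (hypothetical variable name)
PIPENV_BATCH_INSTALL=1 pipenv install --deploy --dev --system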

0 reactions
matteius commented, Sep 3, 2022

Funny story, I came to a similar conclusion as @Tenzer recently without realizing it – I worked on a performance optimization for batch_install which cut our install time in the python package manager shootout benchmark in half. https://github.com/pypa/pipenv/pull/5301 Ref: https://lincolnloop.github.io/python-package-manager-shootout/

It did require dropping the progress bar, but the actual progress can still be observed with the --verbose flag, and cutting our install time roughly in half seems well worth it. Just a reminder – feel free to open new issues and discussions as they pertain to pipenv. I had locked this conversation at the time because it involves many people who are no longer active on the project, and these legacy conversations are hard to parse and make sense of given how much has changed in the last couple of years. On top of that, people come and comment on an issue from 2018, which makes it easy to miss and not respond to them while we still have over 330 open issues to sort through.
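For anyone who wants to watch the per-package progress with the batched installer, it can be surfaced with the --verbose flag, for example:

pipenv install --dev --verbose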
