Collecting from cache is surprisingly slow
See original GitHub issueWhat’s the problem this feature will solve?
Workflows that require a frequent re-creation of virtual envs from scratch are very slow, and a large amount of time is spend collecting wheels despite having all the artefacts in the local cache already.
To demonstrate, let’s consider installing from this (pip-compiled) requirements.txt
:
requirements.txt
#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
#
# pip-compile --no-emit-index-url requirements.in
#
ansible==4.5.0
# via -r requirements.in
ansible-core==2.11.4
# via ansible
asgiref==3.4.1
# via django
attrs==21.2.0
# via pytest
backcall==0.2.0
# via ipython
black==21.8b0
# via -r requirements.in
bokeh==2.3.3
# via -r requirements.in
brotli==1.0.9
# via flask-compress
cashier==1.3
# via -r requirements.in
certifi==2021.5.30
# via requests
cffi==1.14.6
# via cryptography
charset-normalizer==2.0.4
# via requests
click==8.0.1
# via
# -r requirements.in
# black
# flask
cloudpickle==2.0.0
# via dask
cryptography==3.4.8
# via ansible-core
cycler==0.10.0
# via matplotlib
dash==2.0.0
# via -r requirements.in
dash-core-components==2.0.0
# via dash
dash-html-components==2.0.0
# via dash
dash-table==5.0.0
# via dash
dask==2021.9.0
# via -r requirements.in
decorator==5.1.0
# via ipython
django==3.2.7
# via -r requirements.in
docker==5.0.2
# via -r requirements.in
flake8==3.9.2
# via -r requirements.in
flask==2.0.1
# via
# -r requirements.in
# dash
# flask-compress
flask-compress==1.10.1
# via dash
fsspec==2021.8.1
# via dask
idna==3.2
# via requests
importlib-resources==5.2.2
# via yachalk
iniconfig==1.1.1
# via pytest
ipython==7.27.0
# via -r requirements.in
itsdangerous==2.0.1
# via flask
jedi==0.18.0
# via ipython
jinja2==3.0.1
# via
# -r requirements.in
# ansible-core
# bokeh
# flask
joblib==1.0.1
# via scikit-learn
kiwisolver==1.3.2
# via matplotlib
locket==0.2.1
# via partd
markupsafe==2.0.1
# via jinja2
matplotlib==3.4.3
# via -r requirements.in
matplotlib-inline==0.1.3
# via ipython
mccabe==0.6.1
# via flake8
mypy==0.910
# via -r requirements.in
mypy-extensions==0.4.3
# via
# black
# mypy
numpy==1.21.2
# via
# -r requirements.in
# bokeh
# matplotlib
# pandas
# pyarrow
# scikit-learn
# scipy
packaging==21.0
# via
# ansible-core
# bokeh
# dask
# pytest
pandas==1.3.3
# via -r requirements.in
parso==0.8.2
# via jedi
partd==1.2.0
# via dask
pathspec==0.9.0
# via black
pexpect==4.8.0
# via ipython
pickleshare==0.7.5
# via ipython
pillow==8.3.2
# via
# bokeh
# matplotlib
platformdirs==2.3.0
# via black
plotly==5.3.1
# via dash
pluggy==1.0.0
# via pytest
prompt-toolkit==3.0.20
# via ipython
ptyprocess==0.7.0
# via pexpect
py==1.10.0
# via pytest
pyarrow==5.0.0
# via -r requirements.in
pycodestyle==2.7.0
# via flake8
pycparser==2.20
# via cffi
pyflakes==2.3.1
# via flake8
pygments==2.10.0
# via ipython
pyparsing==2.4.7
# via
# matplotlib
# packaging
pytest==6.2.5
# via -r requirements.in
python-dateutil==2.8.2
# via
# bokeh
# matplotlib
# pandas
pytz==2021.1
# via
# django
# pandas
pyyaml==5.4.1
# via
# -r requirements.in
# ansible-core
# bokeh
# dask
regex==2021.8.28
# via black
requests==2.26.0
# via
# -r requirements.in
# docker
resolvelib==0.5.5
# via ansible-core
scikit-learn==0.24.2
# via sklearn
scipy==1.7.1
# via
# -r requirements.in
# scikit-learn
six==1.16.0
# via
# cycler
# plotly
# python-dateutil
sklearn==0.0
# via -r requirements.in
sqlitedict==1.7.0
# via -r requirements.in
sqlparse==0.4.2
# via django
tenacity==8.0.1
# via plotly
threadpoolctl==2.2.0
# via scikit-learn
toml==0.10.2
# via
# mypy
# pytest
tomli==1.2.1
# via black
toolz==0.11.1
# via
# dask
# partd
tornado==6.1
# via bokeh
traitlets==5.1.0
# via
# ipython
# matplotlib-inline
typing-extensions==3.10.0.2
# via
# black
# bokeh
# mypy
urllib3==1.26.6
# via requests
wcwidth==0.2.5
# via prompt-toolkit
websocket-client==1.2.1
# via docker
werkzeug==2.0.1
# via flask
yachalk==0.1.4
# via -r requirements.in
zipp==3.5.0
# via importlib-resources
Benchmark script for reproduction:
# cleanup tmp venv if it exists
rm -rf ./tmp_venv
# re-create tmp venv
virtualenv ./tmp_venv
# activate tmp venv
. ./tmp_venv/bin/activate
# install requiremts
time pip install -r requirements.txt --no-deps -i https://pypi.python.org/simple
Starting with the second execution of the script, pip can fully rely on its local cache. But even with 100% cache hits, the time it takes to run all the Collecting ... Using cached ...
operations takes ~42 seconds. The total run time is ~92 seconds, so 45% of the time is spend just collecting from the cache. This time seems excessive. The disk is a fast SSD and the total sum of data to be collected should be < 1GB. So in terms of I/O it should be possible to collect the artefacts much faster from an SSD based cache.
In terms of raw I/O loading the wheels from an SSD based cache should be in the order of few seconds. Thus, bringing down the collection time could speed up venv creation in many cases by almost a factor of 2. This could e.g. significantly speed up CI pipelines that require creation of multiple similar venvs (in fact, venv creation is becoming an increasing bottleneck in complex CI pipelines for us).
Used pip version: 21.2.4
Describe the solution you’d like
Perhaps it is possible to revisit why collecting artefacts from the cache is so slow.
Alternative Solutions
Alternative solutions probably doesn’t apply in case of performance improvements.
Additional context
All information given above?
Code of Conduct
- I agree to follow the PSF Code of Conduct.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:2
- Comments:7 (5 by maintainers)
Top GitHub Comments
@bluenote10 thanks for the clear reproduction instructions, makes it so much easier to look at something like this.
I took a profile on my machine and found that it was spending 25s on 92 calls of
_get_html_response
. I’m guessing this is necessary to verify that there hasn’t been a newer version of a given package published & that a previous version hasn’t been yanked.On the one hand this is a good opportunity to thread the http calls, because they’re all independent. But… right now criteria resolution happens on a sort of incremental basis and that will need to be moved a little.
I applied approximately this diff and tried again. Total time for
commands.run
down from ~95s to ~65s and time forresolver.resolve
down from ~38s to ~9s. Now I’m thinking the easiest approach might be to “pre-warm” an http cache in parallel and then not change the structure here?Great findings, thanks a lot for looking into it!
That sounds indeed like a low hanging fruit, with a very welcome speedup 👍
I’m wondering if it would be feasible to store the http request results themselves in the cache as well for an extra speed-up. For
dependency==x.y.z
it would be possible to look in the cached http file for a possible match. If none is found, the http cache is invalidated, running a real http request. However, unversioned dependencies or<
,<=,
>=,>
operators probably must never use the cache. Also there can be subtle issues, e.g. if versionx.y.z
is first published as a source tarball, and only later as a wheel. If in this case the http request caches the state where only the tarball is available, it would continue to install versionx.y.z
from source instead of the preferred wheel. Not sure if there is a good solution to this problem, may cause too much headache.Does that refer to the compilation of
.py
files to.pyc
? So perhaps it would make sense to have an--dont-compile-pyc
option? This particular venv contains 31048.pyc
files totalling a size of 280 MB. In fact this would also help with another issue of virtual envs becoming very large, which can be an issue when trying to create small-ish docker images. If it is indeed the.pyc
creation and it could be skipped, it would mean a massive speed-up with a ~20% size reduction in this example. EDIT: Stupid me, this option already exists, just under the name--no-compile
(always thought it refers to e.g. Cython), and it indeed gives the expected speed-up.If I do the math correctly, the parallel requests combined with skipping
.pyc
creation would bring the time down to ~15 sec, which would be awesome.