Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Collecting from cache is surprisingly slow

See original GitHub issue

What’s the problem this feature will solve?

Workflows that require a frequent re-creation of virtual envs from scratch are very slow, and a large amount of time is spend collecting wheels despite having all the artefacts in the local cache already.

To demonstrate, let’s consider installing from this (pip-compiled) requirements.txt:

requirements.txt

#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
#
#    pip-compile --no-emit-index-url requirements.in
#
ansible==4.5.0
    # via -r requirements.in
ansible-core==2.11.4
    # via ansible
asgiref==3.4.1
    # via django
attrs==21.2.0
    # via pytest
backcall==0.2.0
    # via ipython
black==21.8b0
    # via -r requirements.in
bokeh==2.3.3
    # via -r requirements.in
brotli==1.0.9
    # via flask-compress
cashier==1.3
    # via -r requirements.in
certifi==2021.5.30
    # via requests
cffi==1.14.6
    # via cryptography
charset-normalizer==2.0.4
    # via requests
click==8.0.1
    # via
    #   -r requirements.in
    #   black
    #   flask
cloudpickle==2.0.0
    # via dask
cryptography==3.4.8
    # via ansible-core
cycler==0.10.0
    # via matplotlib
dash==2.0.0
    # via -r requirements.in
dash-core-components==2.0.0
    # via dash
dash-html-components==2.0.0
    # via dash
dash-table==5.0.0
    # via dash
dask==2021.9.0
    # via -r requirements.in
decorator==5.1.0
    # via ipython
django==3.2.7
    # via -r requirements.in
docker==5.0.2
    # via -r requirements.in
flake8==3.9.2
    # via -r requirements.in
flask==2.0.1
    # via
    #   -r requirements.in
    #   dash
    #   flask-compress
flask-compress==1.10.1
    # via dash
fsspec==2021.8.1
    # via dask
idna==3.2
    # via requests
importlib-resources==5.2.2
    # via yachalk
iniconfig==1.1.1
    # via pytest
ipython==7.27.0
    # via -r requirements.in
itsdangerous==2.0.1
    # via flask
jedi==0.18.0
    # via ipython
jinja2==3.0.1
    # via
    #   -r requirements.in
    #   ansible-core
    #   bokeh
    #   flask
joblib==1.0.1
    # via scikit-learn
kiwisolver==1.3.2
    # via matplotlib
locket==0.2.1
    # via partd
markupsafe==2.0.1
    # via jinja2
matplotlib==3.4.3
    # via -r requirements.in
matplotlib-inline==0.1.3
    # via ipython
mccabe==0.6.1
    # via flake8
mypy==0.910
    # via -r requirements.in
mypy-extensions==0.4.3
    # via
    #   black
    #   mypy
numpy==1.21.2
    # via
    #   -r requirements.in
    #   bokeh
    #   matplotlib
    #   pandas
    #   pyarrow
    #   scikit-learn
    #   scipy
packaging==21.0
    # via
    #   ansible-core
    #   bokeh
    #   dask
    #   pytest
pandas==1.3.3
    # via -r requirements.in
parso==0.8.2
    # via jedi
partd==1.2.0
    # via dask
pathspec==0.9.0
    # via black
pexpect==4.8.0
    # via ipython
pickleshare==0.7.5
    # via ipython
pillow==8.3.2
    # via
    #   bokeh
    #   matplotlib
platformdirs==2.3.0
    # via black
plotly==5.3.1
    # via dash
pluggy==1.0.0
    # via pytest
prompt-toolkit==3.0.20
    # via ipython
ptyprocess==0.7.0
    # via pexpect
py==1.10.0
    # via pytest
pyarrow==5.0.0
    # via -r requirements.in
pycodestyle==2.7.0
    # via flake8
pycparser==2.20
    # via cffi
pyflakes==2.3.1
    # via flake8
pygments==2.10.0
    # via ipython
pyparsing==2.4.7
    # via
    #   matplotlib
    #   packaging
pytest==6.2.5
    # via -r requirements.in
python-dateutil==2.8.2
    # via
    #   bokeh
    #   matplotlib
    #   pandas
pytz==2021.1
    # via
    #   django
    #   pandas
pyyaml==5.4.1
    # via
    #   -r requirements.in
    #   ansible-core
    #   bokeh
    #   dask
regex==2021.8.28
    # via black
requests==2.26.0
    # via
    #   -r requirements.in
    #   docker
resolvelib==0.5.5
    # via ansible-core
scikit-learn==0.24.2
    # via sklearn
scipy==1.7.1
    # via
    #   -r requirements.in
    #   scikit-learn
six==1.16.0
    # via
    #   cycler
    #   plotly
    #   python-dateutil
sklearn==0.0
    # via -r requirements.in
sqlitedict==1.7.0
    # via -r requirements.in
sqlparse==0.4.2
    # via django
tenacity==8.0.1
    # via plotly
threadpoolctl==2.2.0
    # via scikit-learn
toml==0.10.2
    # via
    #   mypy
    #   pytest
tomli==1.2.1
    # via black
toolz==0.11.1
    # via
    #   dask
    #   partd
tornado==6.1
    # via bokeh
traitlets==5.1.0
    # via
    #   ipython
    #   matplotlib-inline
typing-extensions==3.10.0.2
    # via
    #   black
    #   bokeh
    #   mypy
urllib3==1.26.6
    # via requests
wcwidth==0.2.5
    # via prompt-toolkit
websocket-client==1.2.1
    # via docker
werkzeug==2.0.1
    # via flask
yachalk==0.1.4
    # via -r requirements.in
zipp==3.5.0
    # via importlib-resources

Benchmark script for reproduction:

# cleanup tmp venv if it exists
rm -rf ./tmp_venv

# re-create tmp venv
virtualenv ./tmp_venv

# activate tmp venv
. ./tmp_venv/bin/activate

# install requiremts
time pip install -r requirements.txt --no-deps -i https://pypi.python.org/simple

Starting with the second execution of the script, pip can fully rely on its local cache. But even with 100% cache hits, the time it takes to run all the Collecting ... Using cached ... operations takes ~42 seconds. The total run time is ~92 seconds, so 45% of the time is spend just collecting from the cache. This time seems excessive. The disk is a fast SSD and the total sum of data to be collected should be < 1GB. So in terms of I/O it should be possible to collect the artefacts much faster from an SSD based cache.

In terms of raw I/O loading the wheels from an SSD based cache should be in the order of few seconds. Thus, bringing down the collection time could speed up venv creation in many cases by almost a factor of 2. This could e.g. significantly speed up CI pipelines that require creation of multiple similar venvs (in fact, venv creation is becoming an increasing bottleneck in complex CI pipelines for us).

Used pip version: 21.2.4

Describe the solution you’d like

Perhaps it is possible to revisit why collecting artefacts from the cache is so slow.

Alternative Solutions

Alternative solutions probably doesn’t apply in case of performance improvements.

Additional context

All information given above?

Code of Conduct

I agree to follow the PSF Code of Conduct.

Issue Analytics

State:
Created 2 years ago
Reactions:2
Comments:7 (5 by maintainers)

Top GitHub Comments

1reaction

jbylundcommented, Sep 16, 2021

@bluenote10 thanks for the clear reproduction instructions, makes it so much easier to look at something like this.

I took a profile on my machine and found that it was spending 25s on 92 calls of _get_html_response. I’m guessing this is necessary to verify that there hasn’t been a newer version of a given package published & that a previous version hasn’t been yanked.

On the one hand this is a good opportunity to thread the http calls, because they’re all independent. But… right now criteria resolution happens on a sort of incremental basis and that will need to be moved a little.

I applied approximately this diff and tried again. Total time for commands.run down from ~95s to ~65s and time for resolver.resolve down from ~38s to ~9s. Now I’m thinking the easiest approach might be to “pre-warm” an http cache in parallel and then not change the structure here?

--- a/src/pip/_vendor/resolvelib/resolvers.py
+++ b/src/pip/_vendor/resolvelib/resolvers.py
@@ -1,5 +1,6 @@
 import collections
 import operator
+import multiprocessing.dummy
 
 from .providers import AbstractResolver
 from .structs import DirectedGraph, IteratorMapping, build_iter_view
@@ -336,11 +337,16 @@ class Resolution(object):
 
         # Initialize the root state.
         self._states = [State(mapping=collections.OrderedDict(), criteria={})]
-        for r in requirements:
+
+        def inner(r):
             try:
                 self._add_to_criteria(self.state.criteria, r, parent=None)
             except RequirementsConflicted as e:
                 raise ResolutionImpossible(e.criterion.information)
+            return r
+        with multiprocessing.dummy.Pool() as tp:
+            for r_out in tp.imap_unordered(inner, requirements):
+                pass
 
         # The root state is saved as a sentinel so the first ever pin can have

0reactions

bluenote10commented, Sep 16, 2021

Great findings, thanks a lot for looking into it!

Of that 90s, ~40s were spent fetching pages from pypi, that can be parallelized which decreases the wall clock time, but it still takes ~10s.

That sounds indeed like a low hanging fruit, with a very welcome speedup 👍

I’m wondering if it would be feasible to store the http request results themselves in the cache as well for an extra speed-up. For dependency==x.y.z it would be possible to look in the cached http file for a possible match. If none is found, the http cache is invalidated, running a real http request. However, unversioned dependencies or <, <=, >=, > operators probably must never use the cache. Also there can be subtle issues, e.g. if version x.y.z is first published as a source tarball, and only later as a wheel. If in this case the http request caches the state where only the tarball is available, it would continue to install version x.y.z from source instead of the preferred wheel. Not sure if there is a good solution to this problem, may cause too much headache.

~50s are spent on “install_wheel”, of which only ~5-6s is spent in shutil.copyfileobj. A huge fraction of the remainder under install_wheel is spent in py_compile.

Does that refer to the compilation of .py files to .pyc? So perhaps it would make sense to have an --dont-compile-pyc option? This particular venv contains 31048 .pyc files totalling a size of 280 MB. In fact this would also help with another issue of virtual envs becoming very large, which can be an issue when trying to create small-ish docker images. If it is indeed the .pyc creation and it could be skipped, it would mean a massive speed-up with a ~20% size reduction in this example. EDIT: Stupid me, this option already exists, just under the name --no-compile (always thought it refers to e.g. Cython), and it indeed gives the expected speed-up.

If I do the math correctly, the parallel requests combined with skipping .pyc creation would bring the time down to ~15 sec, which would be awesome.