Pipenv PEP 503 Improvement: Pipenv downloads PyTorch for all versions of Python, grabbing 16GB of data instead of just 1.7GB.
I recently posted the correct way to install PyTorch as a PEP 503 repository in Pipenv:
https://github.com/pypa/pipenv/issues/4961#issuecomment-1045679643
There’s just one annoying issue in Pipenv: It downloads PyTorch for every version of CPython.
So let’s say my project is based on `pipenv install --python=3.9`. And I then run the command to install PyTorch (see guide above for details): `pipenv install --extra-index-url https://download.pytorch.org/whl/cu113/ "torch==1.10.1+cu113"`.
Well, Pipenv then downloads all versions of PyTorch into `~/.cache/pipenv`: cp36, cp37, cp38, cp39 and probably a few more. And then it finally installs the intended architecture (`torch-1.10.1+cu113-cp39`).
This means that the download took 16 GB and 30 minutes instead of 1.7 GB and 4 minutes, wasting a ton of disk space and time on extra copies of the library for old Python versions that I’ll never use.
I confirmed that the extra downloaded data is versions for old Python releases, because I went into the Pipenv cache and looked inside the hashed archives to check their WHEEL metadata. It was stuff like the “Python 3.6” torch version etc.
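For anyone who wants to repeat that check, here is a minimal sketch of reading the `Tag:` lines from a wheel's `WHEEL` metadata file. It assumes the cached archive is an ordinary wheel (a zip file containing a `*.dist-info/WHEEL` entry); the function name `wheel_tags` is made up for illustration, and the exact layout of the Pipenv cache directory is not assumed here.

```python
import zipfile


def wheel_tags(path_or_file):
    """Return the 'Tag:' values from the *.dist-info/WHEEL file in a wheel.

    `path_or_file` may be a filesystem path or a binary file object.
    """
    tags = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            # The metadata file lives at <name>-<version>.dist-info/WHEEL
            if not name.endswith(".dist-info/WHEEL"):
                continue
            text = archive.read(name).decode("utf-8")
            for line in text.splitlines():
                if line.startswith("Tag:"):
                    tags.append(line.split(":", 1)[1].strip())
    return tags
```

A tag such as `cp36-cp36m-linux_x86_64` means the archive was built for CPython 3.6, whatever the cached file happens to be called.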
I’m using pipenv `2022.1.8`.
My guess is that Pipenv’s current algorithm just searches PEP 503 repos for packages whose names start with `torch-*`, downloads them ALL, and then looks at the embedded “wheel metadata” in all downloaded archives to figure out which one matches the installed Python version.
Can Pipenv be improved to detect the “cp39” filename hints in PEP 503 repos and only download the version that matches the installed Python version?
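To illustrate that the filename hints are available before anything is downloaded, here is a rough sketch that lists the file names linked from a PEP 503 project page. It is not Pipenv's code; `SimpleIndexLinks` and `list_filenames` are hypothetical names, and the sketch assumes the page is plain HTML with one `<a href=...>` per file, where `+` may be percent-encoded as `%2B` and a `#sha256=...` fragment may follow the file name.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class SimpleIndexLinks(HTMLParser):
    """Collect the file names referenced by <a href=...> on a project page."""

    def __init__(self):
        super().__init__()
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Drop any "#sha256=..." fragment, keep only the last path
            # segment, and decode "%2B" back to "+".
            path = urlsplit(href).path
            self.filenames.append(unquote(posixpath.basename(path)))


def list_filenames(index_html):
    """Return the file names found in the HTML of a PEP 503 project page."""
    parser = SimpleIndexLinks()
    parser.feed(index_html)
    return parser.filenames
```

Each returned name already carries the `cp##` part, so a client could decide which links to fetch from the index page alone.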
Top GitHub Comments
@Bananaman Thanks for your feedback. I am still pretty new to this code base, but from what I gather about the dependency resolution, this may require an upstream change somewhere. I think this is a good discussion and could lead to some improvements.
@matteius
Yeah, my card requires PyTorch built for CUDA Toolkit 11.x, which can only be found at the PyTorch repository.
Well there’s 2 issues here:
The best fix would be to do “if running under CPython, look for matching identifier in package filenames such as ‘cp39’ and only download that/those if such an identifier is found”.
As far as I have heard, the `-cp39-` stuff is standardized or at least “the way everyone does it”. The pattern is `packagename-packageversion-cp##-morestuffandCPUarch`. So if filenames follow the `packagename-packageversion-cp##-` pattern, we can strongly assume that it’s an indicator “this is the CPython 3.9 version” and thereby instantly know which packages we can skip from PEP 503 repos.
There’s lots of room for improvement of Pipenv’s PEP 503 support. Phase 1 could be “Skip every `-cp##-` version that doesn’t match ours”. Phase 2 would be to skip every `packagename-version` that wasn’t requested (no need to download `1.10.2` if `1.10.2+cu113` was requested). Phase 3 would be to skip every `-architecture` (i.e. Linux, Mac, etc) that your system doesn’t have.
The most important thing would be to skip the other `-cp##-` versions because that’s a huuuuge amount of data to download.
How feasible is it that Pipenv can be extended to filter out useless downloads? Hopefully the internal code isn’t too rigid.
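The three phases could be sketched roughly like this. This is only an illustration of the idea, not how Pipenv or pip actually select candidates: `parse_wheel_filename` and `select_downloads` are hypothetical helpers, the matching is deliberately strict (exact string comparison), and a real resolver would also have to accept pure-Python wheels such as `py3-none-any`, `abi3` builds, and several compatible platform tags.

```python
from collections import namedtuple

Wheel = namedtuple("Wheel", "name version python_tag abi_tag platform_tag")


def parse_wheel_filename(filename):
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel filename: " + filename)
    return Wheel(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def select_downloads(filenames, wanted_version, python_tag, platform_tag):
    """Keep only the files that pass all three filename-based phases."""
    selected = []
    for filename in filenames:
        wheel = parse_wheel_filename(filename)
        # Phase 1: skip builds for other interpreters (e.g. cp36 on cp39).
        # Tags can be compressed sets like "py2.py3", hence the split.
        if python_tag not in wheel.python_tag.split("."):
            continue
        # Phase 2: skip versions that were not requested.
        if wheel.version != wanted_version:
            continue
        # Phase 3: skip builds for other platforms.
        if platform_tag not in wheel.platform_tag.split("."):
            continue
        selected.append(filename)
    return selected
```

With illustrative filenames, `select_downloads(names, "1.10.1+cu113", "cp39", "linux_x86_64")` would leave only the single cp39 Linux build in the list, so only that one archive would need to be fetched.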