optimize package installation for space and speed by using copy-on-write file clones ("reflinks") and storing wheel cache unpacked
What’s the problem this feature will solve?
Creating a new virtual environment in a modern Python project can be quite slow, sometimes on the order of tens of seconds even on very high-end hardware, once you have a lot of dependencies. It also takes up a lot of space; my ~/.virtualenvs/ is almost 3 gigabytes, and this is a relatively new machine; and that isn’t even counting my ~/.local/pipx, which is another 434M.
Describe the solution you’d like
Rather than unpacking and duplicating all the data in wheels, pip could store the cache unpacked, so all the files are already on the filesystem, and then clone them into place on copy-on-write filesystems rather than copying them. While there may be other bottlenecks, this would also reduce disk usage by an order of magnitude. (My ~/Library/Caches/pip is only 256M, and presumably all those virtualenvs contain multiple full, uncompressed copies of it!)
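To make the "clone rather than copy" idea concrete, here is a minimal sketch, not pip's implementation. It assumes Linux, where a whole-file clone is requested with the `FICLONE` ioctl; the helper name `clone_or_copy` is hypothetical, and the request number shown is the value used on x86 and ARM. On any platform or filesystem where cloning is not available, it falls back to an ordinary copy.

```python
import shutil
import sys

try:
    import fcntl  # POSIX only; missing on Windows
except ImportError:
    fcntl = None

# Linux ioctl request for a whole-file clone (value for x86/ARM; assumption:
# other architectures may encode it differently).
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Reflink src to dst where possible, else copy. Returns True if cloned."""
    if fcntl is not None and sys.platform.startswith("linux"):
        try:
            with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
                # Ask the filesystem to share src's data blocks with dst.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            shutil.copystat(src, dst)  # carry over mode and timestamps
            return True
        except OSError:
            # Filesystem without reflink support, cross-device copy, etc.
            pass
    shutil.copy2(src, dst)  # regular copy, rewriting dst from scratch
    return False
```

The boolean return value only reports which path was taken; either way the destination ends up with the same bytes as the source.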
Alternative Solutions
You could get a similar reduction effect by setting up an import hook, using zipimport, or doing some kind of .pth file shenanigans, but I feel like those all have significant drawbacks.
Additional context
Given that platforms generally use shared memory-maps for shared object files, if it’s done right this could additionally reduce the memory footprint of python interpreters in different virtualenvs with large C extensions loaded.
Code of Conduct
- I agree to follow the PSF Code of Conduct.
Issue Analytics
- State:
- Created a year ago
- Comments:30 (20 by maintainers)
Top GitHub Comments
I don’t think you can solve this in the virtual environment abstraction? At least I’m not sure how you’re envisioning that working? The virtual environment abstraction largely is just setting up sys.path; how things get installed onto that sys.path isn’t really its concern, unless you have something else in mind that I’m not thinking of? Solving it there also doesn’t solve it for cases that aren’t inside of a virtual environment.
I think the only reasonable path here is pretty straightforward:
- … shutil.copytree to copy out of the wheel cache.
This has some immediate benefits:
With some immediate downsides:
Then it also has some longer term benefits:
- … os.copy_file_range that’s an additional speed up.
- … copy_function to shutil.copytree.
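The comment above names shutil.copytree, its copy_function parameter, and os.copy_file_range. A rough sketch of how those pieces could fit together follows; it is an illustration under stated assumptions, not what pip does. The names `fast_copy` and `install_from_cache` are hypothetical, and `os.copy_file_range` only exists on Linux with Python 3.8 or newer, so the sketch falls back to `shutil.copy2` everywhere else.

```python
import os
import shutil


def fast_copy(src, dst):
    """Copy one file, preferring os.copy_file_range where it is available."""
    if not hasattr(os, "copy_file_range"):
        return shutil.copy2(src, dst)
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            remaining = os.fstat(fsrc.fileno()).st_size
            while remaining > 0:
                # The kernel may copy fewer bytes than requested, so loop.
                copied = os.copy_file_range(
                    fsrc.fileno(), fdst.fileno(), remaining
                )
                if copied == 0:
                    break
                remaining -= copied
    except OSError:
        # Unsupported filesystem or kernel: redo it as a plain copy.
        return shutil.copy2(src, dst)
    shutil.copystat(src, dst)  # keep mode and timestamps, like copy2
    return dst


def install_from_cache(cache_dir, target_dir):
    """Copy an unpacked cache entry into place (hypothetical helper)."""
    return shutil.copytree(
        cache_dir, target_dir, copy_function=fast_copy, dirs_exist_ok=True
    )
```

Because the per-file strategy lives entirely in the copy_function, a different one (for example a reflink-based copy) could be swapped in without touching the tree walk.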
I did work on a proof of concept that tries to solve this issue, just in a slightly different way: it uses installer to implement a basic wheel installer that installs packages to multi-site-packages/{package_name}/{package_version}, but instead of putting reflinks/symlinks to packages inside the site-packages directory of a venv, it relies on using a custom importlib finder which reads a lockfile and inserts the path to the requested version of the package into sys.path before importing.
Made a post on the Python forums here if anybody would like to join the discussion.
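For readers unfamiliar with the finder approach described in this comment, the following is a minimal sketch of the general idea, not the proof of concept's actual code. The lockfile format (a JSON object mapping top-level package name to version), the class name `LockfileFinder`, and the helper `install_finder` are all assumptions made for illustration; only the multi-site-packages/{package_name}/{package_version} layout comes from the comment.

```python
import importlib.abc
import json
import sys
from pathlib import Path


class LockfileFinder(importlib.abc.MetaPathFinder):
    """Adds the pinned version's directory to sys.path on first import."""

    def __init__(self, store_root, lockfile):
        self.store_root = Path(store_root)
        # Assumed lockfile format: {"package_name": "package_version", ...}
        self.pins = json.loads(Path(lockfile).read_text())
        self._added = set()

    def find_spec(self, fullname, path=None, target=None):
        top_level = fullname.partition(".")[0]
        version = self.pins.get(top_level)
        if version is not None and top_level not in self._added:
            self._added.add(top_level)
            sys.path.insert(0, str(self.store_root / top_level / version))
        # Returning None hands the import to the normal path-based finder,
        # which now sees the directory that was just added.
        return None


def install_finder(store_root, lockfile):
    """Register the finder ahead of the default ones (hypothetical helper)."""
    finder = LockfileFinder(store_root, lockfile)
    sys.meta_path.insert(0, finder)
    return finder
```

The finder never loads anything itself; it only adjusts sys.path and then defers, which keeps the standard import machinery responsible for the actual loading.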