Pipenv git repository too large due to stored wheels
See original GitHub issue
I noticed the pipenv git repository takes a long time to clone because of its size. This is because a number of wheels (binary data) are stored directly in the repo rather than via git LFS or other means.
The problem will only grow over time as different versions of the wheels get committed: the old ones remain part of the repo forever, and git cannot make smart diffs of binary data the way it can with text.
Please find another solution to storing wheels for tests.
$ git clone git@github.com:pypa/pipenv.git .
Cloning into 'pipenv'...
remote: Enumerating objects: 30627, done.
remote: Total 30627 (delta 0), reused 0 (delta 0), pack-reused 30627
Receiving objects: 100% (30627/30627), 224.03 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (20898/20898), done.
$ du -sh .
462M .
$ du -sh tests/pypi
214M tests/pypi
Issue Analytics
- Created 4 years ago
- Comments: 5 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Short answer: LFS will help, especially in the long run.
The problem with wheels (and any non-text data) directly in a git repo is this:
Git cannot track the changes. Whenever a binary file changes by even a single bit, git treats it as a completely different file and stores both versions, old and new, in their entirety, not just the delta. Meaning if you have a 25 MB wheel in your repo and commit a new version of the same wheel that is 26 MB, the repo will now be 51 MB, even though little actually changed between the two versions of the wheel. That's why the pipenv repo is currently 562 MB in size, even though all wheels combined in the latest commit are just 214 MB. The difference is older or deleted wheels in historic commits.
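The blob-duplication effect described above is easy to reproduce in a throwaway repo (a sketch; the file name and sizes are illustrative, not from the pipenv repo):

```shell
# Demonstration: git stores every version of a binary file as a full new
# blob, so the repo grows by roughly the whole file size on each change.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo

# Commit a 1 MiB fake "wheel" of incompressible random bytes.
head -c 1048576 /dev/urandom > pkg-1.0-py3-none-any.whl
git add . && git commit -qm "add wheel, v1"

# Append a single byte and commit again: git keeps BOTH full blobs.
printf 'X' >> pkg-1.0-py3-none-any.whl
git add . && git commit -qm "change one byte"

# The object store now holds ~2 MiB for a file that changed by 1 byte.
git count-objects -vH
```

Packfile delta compression can later shrink *similar* binaries during `git gc`, but wheels are zip archives, so even a tiny source change reshuffles the compressed bytes and deltas rarely help.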
Git with LFS stores just pointers to the files and fetches the actual content as needed; the pointers are tiny. The problem is that you're still stuck with the historic commits, because you can't (or shouldn't) rewrite commit history. LFS will just prevent things from getting worse, and make it possible to "delete" files that are no longer needed without them continuing to clutter the repo's history and making it ever larger as time progresses. NEVER commit binary data to git repositories, kids!
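Concretely, adopting LFS for the test wheels would be a one-time setup along these lines (a sketch; it assumes the git-lfs extension is installed, and the paths and hash are illustrative):

```shell
# One-time setup (requires git-lfs to be installed):
git lfs install
git lfs track "tests/pypi/**/*.whl"   # writes the rule below to .gitattributes
git add .gitattributes tests/pypi
git commit -m "Track test wheels with git LFS"

# The rule written to .gitattributes:
#   tests/pypi/**/*.whl filter=lfs diff=lfs merge=lfs -text

# From then on, what git itself stores per wheel is a ~130-byte pointer:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:4d7a21...   (illustrative hash)
#   size 26214400
```

The real bytes live in separate LFS storage and are downloaded on checkout, so cloning the repo no longer pulls every historical wheel.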
Currently we are using submodules to store the PyPI artifacts; I think this issue can be closed now.
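The submodule approach works because the main repo records only a 20-byte commit pointer (a "gitlink") instead of the artifact files themselves. A self-contained sketch using a local stand-in repo (the repo names and paths are hypothetical, not pipenv's actual setup):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in "artifacts" repo (in real life: a separate repo on GitHub).
git init -q artifacts && (cd artifacts \
  && git config user.email demo@example.com && git config user.name demo \
  && echo data > dummy.whl && git add . && git commit -qm "artifacts")

# The main repo references it as a submodule instead of storing the files.
git init -q main && cd main
git config user.email demo@example.com && git config user.name demo
git -c protocol.file.allow=always submodule add -q ../artifacts tests/pypi
git commit -qm "Store pypi test artifacts in a submodule"

# The superproject stores only a commit pointer, mode 160000:
git ls-tree HEAD tests/pypi
```

All churn in the artifacts (adding, updating, deleting wheels) then bloats only the submodule's history, which clones of the main repo can skip entirely.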