Pipenv git repository too large due to stored wheels

I noticed that the pipenv git repository takes a long time to clone because of its size. This is because a number of wheels (binary data) are stored directly in the repo rather than in git LFS or some other external storage.

The problem will only grow over time: whenever new versions of the wheels are committed, the old ones remain part of the repo's history, and git cannot make smart diffs of binary data the way it can with text.

Please find another solution for storing the wheels used in tests.

$ git clone git@github.com:pypa/pipenv.git .
Cloning into 'pipenv'...
remote: Enumerating objects: 30627, done.
remote: Total 30627 (delta 0), reused 0 (delta 0), pack-reused 30627
Receiving objects: 100% (30627/30627), 224.03 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (20898/20898), done.
$ du -sh .
462M	.
$ du -sh tests/pypi
214M	tests/pypi
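
To reproduce this kind of measurement on other checkouts, a minimal Python sketch along the following lines could be used. The function name `tree_size` is hypothetical, and the numbers it returns are apparent file sizes in bytes, so they will not match `du` exactly (`du` reports disk usage in blocks).

```python
import os


def tree_size(root, suffix=None):
    """Sum the sizes (in bytes) of regular files under root.

    If suffix is given (for example ".whl"), only files whose names
    end with that suffix are counted. Symbolic links are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if suffix is None or name.endswith(suffix):
                total += os.path.getsize(path)
    return total
```

Calling `tree_size("tests/pypi", ".whl")` and `tree_size(".")` from the root of a checkout would give a rough idea of how much of the working tree consists of wheels.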

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
con-f-use commented, Apr 10, 2019

Short answer: LFS will help, especially in the long run.

The problem with wheels (and any non-text data) directly in a git repo is this:

Git cannot track the changes efficiently. Whenever a binary file changes by even a single bit, git treats it as a completely different file and stores both versions, old and new, in their entirety rather than just the difference. So if you have a 25 MB wheel in your repo and commit a new version of the same wheel that is 26 MB, the repo grows to 51 MB, even though little actually changed between the two versions of the wheel. That’s why the pipenv repo is currently 462 MB in size, even though all wheels combined in the latest commit are just 214 MB. The difference comes from older or deleted wheels in historic commits.
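
The point about a one-bit change can be illustrated with a small Python sketch. It computes object IDs the way git does for blobs in a SHA-1 repository (SHA-1 over a `blob <size>\0` header followed by the content). The payload is a made-up stand-in for a wheel, and the helper names are hypothetical. Note that git may still try delta compression when it packs objects, but already-compressed archives such as wheels tend to delta poorly.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID git assigns to a blob with this content
    (SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of the last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 1])


# Stand-in for a wheel: 1000 zero bytes (an assumption for illustration).
old_wheel = bytes(1000)
new_wheel = flip_lowest_bit(old_wheel)

# The two versions differ by one bit but are two unrelated objects to git.
print(git_blob_id(old_wheel))
print(git_blob_id(new_wheel))

# As loose objects, the history holds both versions in full.
print(len(old_wheel) + len(new_wheel), "bytes of blob content in history")
```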

Git with LFS stores just links to the files and fetches the contents as necessary. The links are tiny. The problem is that you’re now stuck with the historic commits, because you can’t (or at least shouldn’t) rewrite commit history. LFS will just prevent things from getting worse, and it makes it possible to “delete” historic files that are no longer needed without them cluttering the repo’s history and making it larger and larger as time progresses. NEVER commit binary data to git repositories, kids!
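
To give a sense of how small those links are, here is a minimal Python sketch that builds the text of a Git LFS pointer file for a given payload, following the three-line pointer layout from the Git LFS specification (version, SHA-256 object ID, size). The function name `lfs_pointer` is hypothetical, and in practice the `git lfs` tooling generates these pointers itself.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content.

    Only this small text file is committed to the repository; the
    content itself lives in LFS storage and is fetched on demand.
    """
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


# Stand-in for a wheel: one million zero bytes (an assumption for illustration).
payload = bytes(1_000_000)
pointer = lfs_pointer(payload)
print(pointer)
print(len(pointer.encode("utf-8")), "bytes committed instead of", len(payload))
```

Whatever the size of the wheel, the pointer stays well under 200 bytes, which is why replacing a wheel in LFS adds almost nothing to the repository's history.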

0 reactions
frostming commented, Jul 15, 2019

We are currently using submodules to store the PyPI artifacts, so I think this issue can be closed now.
